Search | arXiv e-print repository

Understanding Stakeholders' Perceptions and Needs Across the LLM Supply Chain

Authors: Agathe Balayn, Lorenzo Corti, Fanny Rancourt, Fabio Casati, Ujwal Gadiraju

Abstract: Explainability and transparency of AI systems are undeniably important, leading to several research studies and tools addressing them. Existing works fall short of accounting for the diverse stakeholders of the AI supply chain who may differ in their needs and consideration of the facets of explainability and transparency. In this paper, we argue for the need to revisit the inquiries of these vital constructs in the context of LLMs. To this end, we report on a qualitative study with 71 different stakeholders, where we explore the prevalent perceptions and needs around these concepts. This study not only confirms the importance of exploring the "who" in XAI and transparency for LLMs, but also reflects on best practices to do so while surfacing the often forgotten stakeholders and their information needs. Our insights suggest that researchers and practitioners should simultaneously clarify the "who" in considerations of explainability and transparency, the "what" in the information needs, and "why" they are needed to ensure responsible design and development across the LLM supply chain.

Submitted 25 May, 2024; originally announced May 2024.

Comments: Paper accepted at the HCXAI workshop, co-located with CHI'24

arXiv:2405.16310 [pdf, other]

An Empirical Exploration of Trust Dynamics in LLM Supply Chains

Authors: Agathe Balayn, Mireia Yurrita, Fanny Rancourt, Fabio Casati, Ujwal Gadiraju

Abstract: With the widespread proliferation of AI systems, trust in AI is an important and timely topic to navigate. Researchers so far have largely employed a myopic view of this relationship. In particular, a limited number of relevant trustors (e.g., end-users) and trustees (i.e., AI systems) have been considered, and empirical explorations have remained in laboratory settings, potentially overlooking factors that impact human-AI relationships in the real world. In this paper, we argue for broadening the scope of studies addressing 'trust in AI' by accounting for the complex and dynamic supply chains that AI systems result from. AI supply chains entail various technical artifacts that diverse individuals, organizations, and stakeholders interact with, in a variety of ways. We present insights from an in-situ, empirical study of LLM supply chains. Our work reveals additional types of trustors and trustees and new factors impacting their trust relationships. These relationships were found to be central to the development and adoption of LLMs, but they can also be the terrain for uncalibrated trust and reliance on untrustworthy LLMs. Based on these findings, we discuss the implications for research on 'trust in AI'. We highlight new research opportunities and challenges concerning the appropriate study of inter-actor relationships across the supply chain and the development of calibrated trust and meaningful reliance behaviors. We also question the meaning of building trust in the LLM supply chain.

Submitted 25 May, 2024; originally announced May 2024.

Comments: Paper accepted at the TREW workshop co-located with CHI'24

arXiv:2307.11806 [pdf, other]

doi 10.1145/3600211.3604655

How do you feel? Measuring User-Perceived Value for Rejecting Machine Decisions in Hate Speech Detection

Authors: Philippe Lammerts, Philip Lippmann, Yen-Chia Hsu, Fabio Casati, Jie Yang

Abstract: Hate speech moderation remains a challenging task for social media platforms. Human-AI collaborative systems offer the potential to combine the strengths of humans' reliability and the scalability of machine learning to tackle this issue effectively. While methods for task handover in human-AI collaboration exist that consider the costs of incorrect predictions, insufficient attention has been paid to accurately estimating these costs. In this work, we propose a value-sensitive rejection mechanism that automatically rejects machine decisions for human moderation based on users' value perceptions regarding machine decisions. We conduct a crowdsourced survey study with 160 participants to evaluate their perception of correct and incorrect machine decisions in the domain of hate speech detection, as well as occurrences where the system rejects making a prediction. Here, we introduce Magnitude Estimation, an unbounded scale, as the preferred method for measuring user (dis)agreement with machine decisions. Our results show that Magnitude Estimation can provide a reliable measurement of participants' perception of machine decisions. By integrating user-perceived value into human-AI collaboration, we further show that it can guide us in 1) determining when to accept or reject machine decisions to obtain the optimal total value a model can deliver and 2) selecting better classification models as compared to the more widely used target of model accuracy.

Submitted 21 July, 2023; originally announced July 2023.

Comments: To appear at AIES '23. Philippe Lammerts, Philip Lippmann, Yen-Chia Hsu, Fabio Casati, and Jie Yang. 2023. How do you feel? Measuring User-Perceived Value for Rejecting Machine Decisions in Hate Speech Detection. In AAAI/ACM Conference on AI, Ethics, and Society (AIES '23), August 8-10, 2023, Montreal, QC, Canada. ACM, New York, NY, USA. 11 pages

arXiv:2306.01771 [pdf, other]

ProcessGPT: Transforming Business Process Management with Generative Artificial Intelligence

Authors: Amin Beheshti, Jian Yang, Quan Z. Sheng, Boualem Benatallah, Fabio Casati, Schahram Dustdar, Hamid Reza Motahari Nezhad, Xuyun Zhang, Shan Xue

Abstract: Generative Pre-trained Transformer (GPT) is a state-of-the-art machine learning model capable of generating human-like text through natural language processing (NLP). GPT is trained on massive amounts of text data and uses deep learning techniques to learn patterns and relationships within the data, enabling it to generate coherent and contextually appropriate text. This position paper proposes using GPT technology to generate new process models when/if needed. We introduce ProcessGPT as a new technology that has the potential to enhance decision-making in data-centric and knowledge-intensive processes. ProcessGPT can be designed by training a generative pre-trained transformer model on a large dataset of business process data. This model can then be fine-tuned on specific process domains and trained to generate process flows and make decisions based on context and user input. The model can be integrated with NLP and machine learning techniques to provide insights and recommendations for process improvement. Furthermore, the model can automate repetitive tasks and improve process efficiency while enabling knowledge workers to communicate analysis findings, supporting evidence, and make decisions. ProcessGPT can revolutionize business process management (BPM) by offering a powerful tool for process augmentation, automation and improvement. Finally, we demonstrate how ProcessGPT can be a powerful tool for augmenting data engineers in maintaining data ecosystem processes within large bank organizations. Our scenario highlights the potential of this approach to improve efficiency, reduce costs, and enhance the quality of business operations through the automation of data-centric and knowledge-intensive processes. These results underscore the promise of ProcessGPT as a transformative technology for organizations looking to improve their process workflows.

Submitted 28 May, 2023; originally announced June 2023.

Comments: Accepted in: 2023 IEEE International Conference on Web Services (ICWS); Corresponding author: Prof. Amin Beheshti

arXiv:2209.15157 [pdf, other]

Rethinking and Recomputing the Value of Machine Learning Models

Authors: Burcu Sayin, Jie Yang, Xinyue Chen, Andrea Passerini, Fabio Casati

Abstract: In this paper, we argue that the prevailing approach to training and evaluating machine learning models often fails to consider their real-world application within organizational or societal contexts, where they are intended to create beneficial value for people. We propose a shift in perspective, redefining model assessment and selection to emphasize integration into workflows that combine machine predictions with human expertise, particularly in scenarios requiring human intervention for low-confidence predictions. Traditional metrics like accuracy and f-score fail to capture the beneficial value of models in such hybrid settings. To address this, we introduce a simple yet theoretically sound "value" metric that incorporates task-specific costs for correct predictions, errors, and rejections, offering a practical framework for real-world evaluation. Through extensive experiments, we show that existing metrics fail to capture real-world needs, often leading to suboptimal choices in terms of value when used to rank classifiers. Furthermore, we emphasize the critical role of calibration in determining model value, showing that simple, well-calibrated models can often outperform more complex models that are challenging to calibrate.

Submitted 23 April, 2025; v1 submitted 29 September, 2022; originally announced September 2022.

Comments: Accepted at the Journal of Artificial Intelligence Review
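
Illustrative note on the entry above: the abstract describes a "value" metric that weighs correct predictions, errors, and rejections by task-specific costs, but does not give its formula. The sketch below is only one plausible reading, assuming a per-item payoff averaged over all items and a confidence threshold below which predictions are handed to a human. The function name `selective_value` and the default payoffs (`v_correct`, `v_error`, `v_reject`) are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def selective_value(
    confidences: Sequence[float],
    predictions: Sequence[int],
    labels: Sequence[int],
    threshold: float,
    v_correct: float = 1.0,   # hypothetical payoff for an accepted, correct prediction
    v_error: float = -4.0,    # hypothetical payoff (a cost) for an accepted, wrong prediction
    v_reject: float = 0.0,    # hypothetical payoff for deferring the item to a human
) -> float:
    """Average per-item value of a classifier that rejects low-confidence predictions."""
    if not (len(confidences) == len(predictions) == len(labels)):
        raise ValueError("inputs must have the same length")
    if len(labels) == 0:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for conf, pred, label in zip(confidences, predictions, labels):
        if conf < threshold:
            total += v_reject      # prediction rejected, item goes to a human
        elif pred == label:
            total += v_correct     # prediction accepted and correct
        else:
            total += v_error       # prediction accepted but wrong
    return total / len(labels)


if __name__ == "__main__":
    confs = [0.9, 0.6, 0.8, 0.4]
    preds = [1, 0, 1, 0]
    gold = [1, 1, 0, 0]
    # Compare accepting everything against rejecting anything below 0.85.
    print(selective_value(confs, preds, gold, threshold=0.0))
    print(selective_value(confs, preds, gold, threshold=0.85))
```

Under these assumed payoffs, a model whose confident predictions are wrong can score lower than one that rejects more often, which is the kind of ranking difference from plain accuracy that the abstract points to.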

arXiv:2112.06775 [pdf, other]

On the Value of ML Models

Authors: Fabio Casati, Pierre-André Noël, Jie Yang

Abstract: We argue that, when establishing and benchmarking Machine Learning (ML) models, the research community should favour evaluation metrics that better capture the value delivered by their model in practical applications. For a specific class of use cases -- selective classification -- we show that not only can it be simple enough to do, but that it has important consequences and provides insights into what to look for in a "good" ML model.

Submitted 13 December, 2021; originally announced December 2021.

Comments: Poster presentation at Workshop on Human and Machine Decisions at NeurIPS 2021 (WHMD 2021). https://sites.google.com/view/whmd2021

MSC Class: 68Q32; 68T05 ACM Class: I.2.6

Journal ref: Fabio Casati, Pierre-André Noël and Jie Yang (2021, December 14). On the Value of ML Models [Poster presentation]. Workshop on Human and Machine Decisions, NeurIPS 2021, virtual. https://sites.google.com/view/whmd2021

arXiv:2111.06736 [pdf, other]

The Science of Rejection: A Research Area for Human Computation

Authors: Burcu Sayin, Jie Yang, Andrea Passerini, Fabio Casati

Abstract: We motivate why the science of learning to reject model predictions is central to ML, and why human computation has a lead role in this effort.

Submitted 11 November, 2021; originally announced November 2021.

Comments: To appear in the Proceedings of The 9th AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2021)

MSC Class: machine learning; human in the loop

arXiv:2109.09420 [pdf, other]

Crowdsourcing Diverse Paraphrases for Training Task-oriented Bots

Authors: Jorge Ramírez, Auday Berro, Marcos Baez, Boualem Benatallah, Fabio Casati

Abstract: A prominent approach to build datasets for training task-oriented bots is crowd-based paraphrasing. Current approaches, however, assume the crowd would naturally provide diverse paraphrases or focus only on lexical diversity. In this WiP we addressed an overlooked aspect of diversity, introducing an approach for guiding the crowdsourcing process towards paraphrases that are syntactically diverse.

Submitted 20 September, 2021; originally announced September 2021.

Comments: HCOMP 2021 Works-in-progress & Demonstrations

arXiv:2107.13519 [pdf, other]

On the state of reporting in crowdsourcing experiments and a checklist to aid current practices

Authors: Jorge Ramírez, Burcu Sayin, Marcos Baez, Fabio Casati, Luca Cernuzzi, Boualem Benatallah, Gianluca Demartini

Abstract: Crowdsourcing is being increasingly adopted as a platform to run studies with human subjects. Running a crowdsourcing experiment involves several choices and strategies to successfully port an experimental design into an otherwise uncontrolled research environment, e.g., sampling crowd workers, mapping experimental conditions to micro-tasks, or ensuring quality contributions. While several guidelines inform researchers in these choices, guidance on how and what to report from crowdsourcing experiments has been largely overlooked. If under-reported, implementation choices constitute variability sources that can affect the experiment's reproducibility and prevent a fair assessment of research outcomes. In this paper, we examine the current state of reporting of crowdsourcing experiments and offer guidance to address associated reporting issues. We start by identifying sensible implementation choices, relying on existing literature and interviews with experts, to then extensively analyze the reporting of 171 crowdsourcing experiments. Informed by this process, we propose a checklist for reporting crowdsourcing experiments.

Submitted 9 September, 2021; v1 submitted 28 July, 2021; originally announced July 2021.

Comments: Accepted to CSCW 2021

arXiv:2101.08854 [pdf, other]

Active Hybrid Classification

Authors: Evgeny Krivosheev, Fabio Casati, Alessandro Bozzon

Abstract: Hybrid crowd-machine classifiers can achieve superior performance by combining the cost-effectiveness of automatic classification with the accuracy of human judgment. This paper shows how crowd and machines can support each other in tackling classification problems. Specifically, we propose an architecture that orchestrates active learning and crowd classification and combines them in a virtuous cycle. We show that when the pool of items to classify is finite we face a learning vs. exploitation trade-off in hybrid classification, as we need to balance crowd tasks optimized for creating a training dataset with tasks optimized for classifying items in the pool. We define the problem, propose a set of heuristics and evaluate the approach on three real-world datasets with different characteristics in terms of machine and crowd classification performance, showing that our active hybrid approach significantly outperforms baselines.

Submitted 21 January, 2021; originally announced January 2021.

arXiv:2011.03969 [pdf]

doi 10.1109/MIC.2020.3037151

Chatbots as conversational healthcare services

Authors: Mlađan Jovanović, Marcos Baez, Fabio Casati

Abstract: Chatbots are emerging as a promising platform for accessing and delivering healthcare services. The evidence is in the growing number of publicly available chatbots aiming at taking an active role in the provision of prevention, diagnosis, and treatment services. This article takes a closer look at how these emerging chatbots address design aspects relevant to healthcare service provision, emphasizing the Human-AI interaction aspects and the transparency in AI automation and decision making.

Submitted 8 November, 2020; originally announced November 2020.

arXiv:2011.02891 [pdf, other]

On the impact of predicate complexity in crowdsourced classification tasks

Authors: Jorge Ramírez, Marcos Baez, Fabio Casati, Luca Cernuzzi, Boualem Benatallah, Ekaterina A. Taran, Veronika A. Malanina

Abstract: This paper explores and offers guidance on a specific and relevant problem in task design for crowdsourcing: how to formulate a complex question used to classify a set of items. In micro-task markets, classification is still among the most popular tasks. We situate our work in the context of information retrieval and multi-predicate classification, i.e., classifying a set of items based on a set of conditions. Our experiments cover a wide range of tasks and domains, and also consider crowd workers alone and in tandem with machine learning classifiers. We provide empirical evidence into how the resulting classification performance is affected by different predicate formulation strategies, emphasizing the importance of predicate formulation as a task design dimension in crowdsourcing.

Submitted 17 November, 2020; v1 submitted 5 November, 2020; originally announced November 2020.

arXiv:2011.02804 [pdf, other]

Challenges and strategies for running controlled crowdsourcing experiments

Authors: Jorge Ramírez, Marcos Baez, Fabio Casati, Luca Cernuzzi, Boualem Benatallah

Abstract: This paper reports on the challenges and lessons we learned while running controlled experiments in crowdsourcing platforms. Crowdsourcing is becoming an attractive technique to engage a diverse and large pool of subjects in experimental research, allowing researchers to achieve levels of scale and completion times that would otherwise not be feasible in lab settings. However, the scale and flexibility come at the cost of multiple and sometimes unknown sources of bias and confounding factors that arise from technical limitations of crowdsourcing platforms and from the challenges of running controlled experiments in the "wild". In this paper, we take our experience in running systematic evaluations of task design as a motivating example to explore, describe, and quantify the potential impact of running uncontrolled crowdsourcing experiments and derive possible coping strategies. Among the challenges identified, we can mention sampling bias, controlling the assignment of subjects to experimental conditions, learning effects, and reliability of crowdsourcing results. According to our empirical studies, the impact of potential biases and confounding factors can amount to a 38% loss in the utility of the data collected in uncontrolled settings, and it can significantly change the outcome of experiments. These issues ultimately inspired us to implement CrowdHub, a system that sits on top of major crowdsourcing platforms and allows researchers and practitioners to run controlled crowdsourcing projects.

Submitted 5 November, 2020; originally announced November 2020.

arXiv:2009.03101 [pdf]

doi 10.1109/MIC.2020.3024605

Chatbot integration in few patterns

Authors: Marcos Baez, Florian Daniel, Fabio Casati, Boualem Benatallah

Abstract: Chatbots are software agents that are able to interact with humans in natural language. Their intuitive interaction paradigm is expected to significantly reshape the software landscape of tomorrow, while already today chatbots are invading a multitude of scenarios and contexts. This article takes a developer's perspective, identifies a set of architectural patterns that capture different chatbot integration scenarios, and reviews state-of-the-art development aids.

Submitted 18 September, 2020; v1 submitted 7 September, 2020; originally announced September 2020.

Comments: prior version was an incomplete early draft; current version includes changes in references, appropriate acknowledgement; and minor revisions

arXiv:2001.06543 [pdf, other]

Siamese Graph Neural Networks for Data Integration

Authors: Evgeny Krivosheev, Mattia Atzeni, Katsiaryna Mirylenka, Paolo Scotton, Fabio Casati

Abstract: Data integration has been studied extensively for decades and approached from different angles. However, this domain still remains largely rule-driven and lacks universal automation. Recent development in machine learning and in particular deep learning has opened the way to more general and more efficient solutions to data integration problems. In this work, we propose a general approach to modeling and integrating entities from structured data, such as relational databases, as well as unstructured sources, such as free text from news articles. Our approach is designed to explicitly model and leverage relations between entities, thereby using all available information and preserving as much context as possible. This is achieved by combining siamese and graph neural networks to propagate information between connected entities and support high scalability. We evaluate our method on the task of integrating data about business entities, and we demonstrate that it outperforms standard rule-based systems, as well as other deep learning approaches that do not use graph-based representations.

Submitted 17 January, 2020; originally announced January 2020.

arXiv:1909.02800 [pdf, other]

CrowdHub: Extending crowdsourcing platforms for the controlled evaluation of tasks designs

Authors: Jorge Ramírez, Simone Degiacomi, Davide Zanella, Marcos Baez, Fabio Casati, Boualem Benatallah

Abstract: We present CrowdHub, a tool for running systematic evaluations of task designs on top of crowdsourcing platforms. The goal is to support the evaluation process, avoiding potential experimental biases that, according to our empirical studies, can amount to a 38% loss in the utility of the collected dataset in uncontrolled settings. Using CrowdHub, researchers can map their experimental design and automate the complex process of managing task execution over time while controlling for returning workers and crowd demographics, thus reducing bias, increasing utility of collected data, and making more efficient use of a limited pool of subjects.

Submitted 10 September, 2019; v1 submitted 6 September, 2019; originally announced September 2019.

arXiv:1909.02780 [pdf, other]

Understanding the Impact of Text Highlighting in Crowdsourcing Tasks

Authors: Jorge Ramírez, Marcos Baez, Fabio Casati, Boualem Benatallah

Abstract: Text classification is one of the most common goals of machine learning (ML) projects, and also one of the most frequent human intelligence tasks in crowdsourcing platforms. ML has mixed success in such tasks depending on the nature of the problem, while crowd-based classification has proven to be surprisingly effective, but can be expensive. Recently, hybrid text classification algorithms, combining human computation and machine learning, have been proposed to improve accuracy and reduce costs. One way to do so is to have ML highlight or emphasize portions of text that it believes to be more relevant to the decision. Humans can then rely only on this text or read the entire text if the highlighted information is insufficient. In this paper, we investigate if and under what conditions highlighting selected parts of the text can (or cannot) improve classification cost and/or accuracy, and in general how it affects the process and outcome of the human intelligence tasks. We study this through a series of crowdsourcing experiments running over different datasets and with task designs imposing different cognitive demands. Our findings suggest that highlighting is effective in reducing classification effort but does not improve accuracy - and in fact, low-quality highlighting can decrease it.

Submitted 6 September, 2019; originally announced September 2019.

arXiv:1904.00714 [pdf, other]

Combining Crowd and Machines for Multi-predicate Item Screening

Authors: Evgeny Krivosheev, Fabio Casati, Marcos Baez, Boualem Benatallah

Abstract: This paper discusses how crowd and machine classifiers can be efficiently combined to screen items that satisfy a set of predicates. We show that this is a recurring problem in many domains, present machine-human (hybrid) algorithms that screen items efficiently and estimate the gain over human-only or machine-only screening in terms of performance and cost. We further show how, given a new classification problem and a set of classifiers of unknown accuracy for the problem at hand, we can identify how to manage the cost-accuracy trade-off by progressively determining if we should spend budget to obtain test data (to assess the accuracy of the given classifiers), or to train an ensemble of classifiers, or whether we should leverage the existing machine classifiers with the crowd, and in this case how to efficiently combine them based on their estimated characteristics to obtain the classification. We demonstrate that the techniques we propose obtain significant cost/accuracy improvements with respect to the leading classification algorithms.

Submitted 1 April, 2019; originally announced April 2019.

Comments: Please cite the CSCW2018 version of this paper:@article{krivosheev2018combining, title={Combining Crowd and Machines for Multi-predicate Item Screening}, author={Krivosheev, Evgeny and Casati, Fabio and Baez, Marcos and Benatallah, Boualem}, journal={Proceedings of the ACM on Human-Computer Interaction}, volume={2}, number={CSCW}, pages={97}, year={2018}, publisher={ACM} }

arXiv:1901.04194 [pdf]

doi 10.1007/978-981-13-3693-5_17

Technologies for promoting social participation in later life

Authors: Marcos Baez, Radoslaw Nielek, Fabio Casati, Adam Wierzbicki

Abstract: Social participation is known to bring great benefits to the health and well-being of people as they age. From being in contact with others to engaging in group activities, keeping socially active can help slow down the effects of age-related declines, reduce risks of loneliness and social isolation and even mortality in old age. There are unfortunately a variety of barriers that make it difficult for older adults to engage in social activities on a regular basis. In this chapter, we give an overview of the challenges to social participation and discuss how technology can help overcome these barriers and promote participation in social activities. We examine two particular research threads and designs, exploring ways in which technology can support co-located and virtual participation: i) an application that motivates the virtual participation in group training programs, and ii) a location-based game that supports co-located intergenerational ICT training classes. We discuss the effectiveness and limitations of various design choices in the two use cases and outline the lessons learned.

Submitted 14 January, 2019; originally announced January 2019.

arXiv:1806.03249 [pdf, other]

doi 10.1145/3197391.3205426

Design Challenges for Reconnecting in Later Life: A Qualitative Study

Authors: Francisco Ibarra, Grzegorz Kowalik, Marcos Baez, Radosław Nielek, Norma Lau, Luca Cernuzzi, Fabio Casati

Abstract: Friendships and social interactions are renowned contributors to wellbeing. As such, keeping a healthy number of relationships becomes very important as people age and the size of their social network tends to decrease. In this paper, we take a step back and explore reconnection (finding out about or re-contacting old friends), an emerging topic due to the increased use of computer-mediated technology by older adults to maintain friendships and form new ones. We report on our findings from semi-structured interviews with 28 individuals from Costa Rica and Poland. The interviews aimed to explore whether there is a wish to reconnect, and the challenges encountered by older adults to reconnect. We contribute with design considerations for tools allowing older adults to reconnect, discussing opportunities for technology.

Submitted 31 May, 2018; originally announced June 2018.

ACM Class: H.5.m

arXiv:1806.02291 [pdf, other]

doi 10.1145/3197391.3205424

Designing for Co-located and Virtual Social Interactions in Residential Care

Authors: Francisco Ibarra, Marcos Baez, Francesca Fiore, Fabio Casati

Abstract: In this paper we explore the feasibility and design challenges in supporting co-located and virtual social interactions in residential care by building on the practice of reminiscence. Motivated by the challenges of social interaction in this context, we first explore the feasibility of a reminiscence-based social interaction tool designed to stimulate conversation in residential care with different stakeholders. Then, we explore the design challenges in supporting an assisting role in co-located reminiscence sessions, by running pilot studies with a technology probe. Our findings point to the feasibility of the tool and the willingness of stakeholders to contribute in the process, although with some skepticism about virtual interactions. The reminiscence sessions showed that compromises are needed when designing for both story collection and conversation stimulation, evidencing specific design areas where further exploration is needed.

Submitted 31 May, 2018; originally announced June 2018.

ACM Class: H.5.m

arXiv:1805.12376 [pdf, other]

CrowdRev: A platform for Crowd-based Screening of Literature Reviews

Authors: Jorge Ramirez, Evgeny Krivosheev, Marcos Baez, Fabio Casati, Boualem Benatallah

Abstract: In this paper and demo we present a crowd and crowd+AI based system, called CrowdRev, supporting the screening phase of literature reviews and achieving the same quality as author classification at a fraction of the cost, and near-instantly. CrowdRev makes it easy for authors to leverage the crowd, and ensures that no money is wasted even in the face of difficult papers or criteria: if the system detects that the task is too hard for the crowd, it just gives up trying (for that paper, or for that criterion, or altogether), without wasting money and never compromising on quality.

Submitted 31 May, 2018; originally announced May 2018.

arXiv:1805.12346 [pdf, other]

Crowdsourcing for Reminiscence Chatbot Design

Authors: Svetlana Nikitina, Florian Daniel, Marcos Baez, Fabio Casati

Abstract: In this work-in-progress paper we discuss the challenges in identifying effective and scalable crowd-based strategies for designing content, conversation logic, and meaningful metrics for a reminiscence chatbot targeted at older adults. We formalize the problem and outline the main research questions that drive the research agenda in chatbot design for reminiscence and for relational agents for older adults in general.

Submitted 31 May, 2018; originally announced May 2018.

arXiv:1803.09814 [pdf, other]

Crowd-based Multi-Predicate Screening of Papers in Literature Reviews

Authors: Evgeny Krivosheev, Fabio Casati, Boualem Benatallah

Abstract: Systematic literature reviews (SLRs) are one of the most common and useful forms of scientific research and publication. Tens of thousands of SLRs are published each year, and this rate is growing across all fields of science. Performing an accurate, complete and unbiased SLR is however a difficult and expensive endeavor. This is true in general for all phases of a literature review, and in particular for the paper screening phase, where authors filter a set of potentially in-scope papers based on a number of exclusion criteria. To address the problem, in recent years the research community has begun to explore the use of the crowd to allow for a faster, accurate, cheaper and unbiased screening of papers. Initial results show that crowdsourcing can be effective, even for relatively complex reviews. In this paper we derive and analyze a set of strategies for crowd-based screening, and show that an adaptive strategy, that continuously re-assesses the statistical properties of the problem to minimize the number of votes needed to take decisions for each paper, significantly outperforms a number of non-adaptive approaches in terms of cost and accuracy. We validate both applicability and results of the approach through a set of crowdsourcing experiments, and discuss properties of the problem and algorithms that we believe to be generally of interest for classification problems where items are classified via a series of successive tests (as it often happens in medicine).

Submitted 21 March, 2018; originally announced March 2018.

Comments: Please cite the www2018 version of this paper: @inproceedings{krivosheev2018, title={Crowd-based Multi-Predicate Screening of Papers in Literature Reviews}, author={Evgeny Krivosheev, Fabio Casati and Boualem Benatallah}, year={2018}, organization={International World Wide Web Conferences Steering Committee} }

arXiv:1803.07947 [pdf, other]

Crowd-Machine Collaboration for Item Screening

Authors: Evgeny Krivosheev, Bahareh Harandizadeh, Fabio Casati, Boualem Benatallah

Abstract: In this paper we describe how crowd and machine classifiers can be efficiently combined to screen items that satisfy a set of predicates. We show that this is a recurring problem in many domains, present machine-human (hybrid) algorithms that screen items efficiently and estimate the gain over human-only or machine-only screening in terms of performance and cost.

Submitted 21 March, 2018; originally announced March 2018.

arXiv:1802.04100 [pdf, ps, other]

doi 10.1145/3183428.3183439

Agile development for vulnerable populations: lessons learned and recommendations

Authors: Marcos Baez, Fabio Casati

Abstract: In this paper we draw attention to the challenges of managing software projects for vulnerable populations, i.e., people potentially exposed to harm or not capable of protecting their own interests. The focus on human aspects, and particularly, the inclusion of human-centered approaches, has been a popular topic in the software engineering community. We argue, however, that current literature provides little understanding and guidance on how to approach these types of scenarios. Here, we shed some light on the topic by reporting on our experiences in developing innovative solutions for the residential care scenario, outlining potential issues and recommendations.

Submitted 25 January, 2018; originally announced February 2018.

arXiv:1711.05410 [pdf, other]

Programming Bots by Synthesizing Natural Language Expressions into API Invocations

Authors: Shayan Zamanirad, Boualem Benatallah, Moshe Chai Barukh, Fabio Casati, Carlos Rodriguez

Abstract: At present, bots are still in their preliminary stages of development. Many are relatively simple, or developed ad-hoc for a very specific use-case. For this reason, they are typically programmed manually, or utilize machine-learning classifiers to interpret a fixed set of user utterances. In reality, real world conversations with humans require support for dynamically capturing users' expressions. Moreover, bots will derive immeasurable value by programming them to invoke APIs for their results. Today, within the Web and Mobile development community, complex applications are being strung together with a few lines of code -- all made possible by APIs. Yet, developers today are not as empowered to program bots in much the same way. To overcome this, we introduce BotBase, a bot programming platform that dynamically synthesizes natural language user expressions into API invocations. Our solution is two faceted: Firstly, we construct an API knowledge graph to encode and evolve APIs; secondly, leveraging the above we apply techniques in NLP, ML and Entity Recognition to perform the required synthesis from natural language user expressions into API calls.

Submitted 15 November, 2017; originally announced November 2017.

Comments: The paper is published at ASE 2017 (The 32nd IEEE/ACM International Conference on Automated Software Engineering)

Journal ref: Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), 2017, 832-837

arXiv:1709.05168 [pdf, other]

Crowdsourcing Paper Screening in Systematic Literature Reviews

Authors: Evgeny Krivosheev, Fabio Casati, Valentina Caforio, Boualem Benatallah

Abstract: Literature reviews allow scientists to stand on the shoulders of giants, showing promising directions, summarizing progress, and pointing out existing challenges in research. At the same time conducting a systematic literature review is a laborious and consequently expensive process. In the last decade, there have been a few studies on crowdsourcing in literature reviews. This paper explores the feasibility of crowdsourcing for facilitating the literature review process in terms of results, time and effort, as well as to identify which crowdsourcing strategies provide the best results based on the budget available. In particular we focus on the screening phase of the literature review process and we contribute and assess methods for identifying the size of tests, labels required per paper, and classification functions as well as methods to split the crowdsourcing process in phases to improve results. Finally, we present our findings based on experiments run on Crowdflower.

Submitted 15 September, 2017; originally announced September 2017.

arXiv:1703.06317 [pdf, other]

Designing for older adults: review of touchscreen design guidelines

Authors: Leysan Nurgalieva, Juan Jose Jara Laconich, Marcos Baez, Fabio Casati, Maurizio Marchese

Abstract: The distinct abilities of older adults to interact with computers have motivated a wide range of contributions in the form of design guidelines for making technologies usable and accessible for the elderly population. However, despite the growing effort by the research community, the adoption of guidelines by developers and designers has been scant or not properly translated into more accessible interaction systems. In this paper we explore this issue by reporting on the qualitative outcomes of a systematic review of 204 research-derived design guidelines for touchscreen applications. We report first on the different definitions of "elderly" and assess the reliability, organization and accessibility of the guidelines. Then we present our early attempt at facilitating the reporting and access of such guidelines to researchers and practitioners.

Submitted 18 March, 2017; originally announced March 2017.

ACM Class: H.5.m

arXiv:1701.07607 [pdf, ps, other]

Understanding how Software Can Support the Needs of Family Caregivers for Patients with Severe Conditions

Authors: Angela di Fiore, Francesco Ceschel, Francesca Fiore, Marcos Baez, Fabio Casati, Giampaolo Armellin

Abstract: In this paper, we report an extensive analysis that we performed in two scenarios where the care relation between doctor and patients is mediated by the relatives of the patients: Pediatric Palliative Care (PPC) and Nursing Homes (NH). When the patients are children or very old adults at the end of life, the provision of care often involves a family caregiver as the main point of contact for the health service. PPC and NH are characterized by emotional complexity, since incurable diseases expose the family caregivers to heavy careload and human distress. In this paper, we discuss our findings with a novel perspective, focusing on: information, coordination and social challenges that arise by dealing with such contexts; the existing technology as it is appropriated today to cope with them; and what we, as software researchers, can do to develop the right solutions.

Submitted 26 January, 2017; originally announced January 2017.

arXiv:1612.02686 [pdf]

Effects of online group exercises for older adults on physical, psychological and social wellbeing: a pilot trial

Authors: Marcos Baez, Iman Khaghani Far, Francisco Ibarra, Michela Ferron, Daniele Didino, Fabio Casati

Abstract: Background. There are many factors that can make group exercises a challenging setting for older adults. A major one in the elderly population is the difference in the level of skills. In this paper we report on the physical, psychological and social wellbeing outcomes of a novel virtual gym that enables online group-exercises in older adults with different levels of skills. Methods. A total of 37 older adults (65-87 years old) followed a personalized exercise program based on the OTAGO program for fall prevention, for a period of eight weeks. Participants could join online group exercises using a tablet-based application. Participants were assigned either to a Control group (individual training) or Social group (online group-exercising). Pre- and post- measurements were taken to analyze the physical, psychological and social wellbeing outcomes. The study received ethical approval from the CREATE-NET Ethics Committee on ICT Research Involving Human Beings (Application N. 2014-001). Results. There were improvements in both the Social and Control groups in terms of physical outcomes. Interestingly though, while in the Control group fitter individuals tended to adhere more to the training, this was not the case for the Social group, where the initial level had no effect on adherence. For psychological and social wellbeing outcomes there were improvements in both groups, regardless of the application used. Conclusion. Group exercising in a virtual gym can be effective in motivating and enabling individuals who are less fit to train as much as fitter individuals. This not only indicates the feasibility of training together despite differences in physical skills but also suggests that online exercise can reduce the effect of skills on adherence in a social context. Longer term interventions with more participants are instead recommended to assess impacts on wellbeing.

Submitted 8 December, 2016; originally announced December 2016.

arXiv:1609.05334 [pdf, other]

doi 10.1109/CTS.2016.0024

What makes people bond?: A study on social interactions and common life points on Facebook

Authors: Emanuel Sanchiz, Francisco Ibarra, Svetlana Nikitina, Marcos Baez, Fabio Casati

Abstract: In this paper we aim at understanding if and how, by analysing people's profile and historical data (such as data available on Facebook profiles and interactions, or collected explicitly) we can motivate two persons to interact and eventually create long-term bonds. We do this by exploring the relationship between connectedness, social interactions and common life points on Facebook. The results are of particular importance for the development of technology that aims at reducing social isolation for people with fewer chances to interact, such as older adults.

Submitted 17 September, 2016; originally announced September 2016.

arXiv:1609.05329 [pdf, other]

doi 10.1109/CTS.2016.0098

Online Group-exercises for Older Adults of Different Physical Abilities

Authors: Marcos Baez, Francisco Ibarra, Iman Khaghani Far, Michela Ferron, Fabio Casati

Abstract: In this paper we describe the design and validation of a virtual fitness environment aiming at keeping older adults physically and socially active. We target particularly older adults who are socially more isolated, physically less active, and with fewer chances of training in a gym. The virtual fitness environment, namely Gymcentral, was designed to enable and motivate older adults to follow personalised exercises from home, with a (heterogeneous) group of remote friends and under the remote supervision of a Coach. We take the training activity as an opportunity to create social interactions, by complementing training features with social instruments. Finally, we report on the feasibility and effectiveness of the virtual environment, as well as its effects on the usage and social interactions, from an intervention study in Trento, Italy.

Submitted 17 September, 2016; originally announced September 2016.

arXiv:1607.01752 [pdf, other]

CrowdCafe - Mobile Crowdsourcing Platform

Authors: Pavel Kucherbaev, Azad Abad, Stefano Tranquillini, Florian Daniel, Maurizio Marchese, Fabio Casati

Abstract: In this paper we present a mobile crowdsourcing platform CrowdCafe, where people can perform microtasks using their smartphones while they ride a bus, travel by train, stand in a queue or wait for an appointment. These microtasks are executed in exchange for rewards provided by local stores, such as coffee, desserts and bus tickets. We present the concept, the implementation and the evaluation by conducting a study with 52 participants, having 1108 tasks completed.

Submitted 6 July, 2016; originally announced July 2016.

Comments: Was published before as a part of the phd thesis by Pavel Kucherbaev http://eprints-phd.biblio.unitn.it/1716/

arXiv:1603.03349 [pdf, other]

Personalized Persuasion for Social Interactions in Nursing Homes

Authors: Marcos Baez, Chiara Dalpiaz, Fatbardha Hoxha, Alessia Tovo, Valentina Caforio, Fabio Casati

Abstract: This paper presents our preliminary investigation and approach towards a mixed physical-virtual technology for stimulating social interactions among and with older adults in nursing homes. We report on a set of surveys, apps and focus groups aiming at understanding the different motivations and obstacles in promoting social interactions in institutionalised care. We then present our approach to address some of the key themes found, e.g., the technological disparity, lack of conversation topics and opportunities to interact.

Submitted 9 March, 2016; originally announced March 2016.

Showing 1–35 of 35 results for author: Casati, F