-
Detecting Student Intent for Chat-Based Intelligent Tutoring Systems
Authors:
Ella Cutler,
Zachary Levonian,
S. Thomas Christie
Abstract:
Chat interfaces for intelligent tutoring systems (ITSs) enable interactivity and flexibility. However, when students interact with chat interfaces, they expect dialogue-driven navigation from the system and can express frustration and disinterest if this is not provided. Intent detection systems help students navigate within an ITS, but detecting students' intent during open-ended dialogue is challenging. We designed an intent detection system in a chatbot ITS, classifying a student's intent between continuing the current lesson or switching to a new lesson. We explore the utility of four machine learning approaches for this task - including both conventional classification approaches and fine-tuned large language models - finding that using an intent classifier introduces trade-offs around implementation cost, accuracy, and prediction time. We argue that implementing intent detection in chat interfaces can reduce frustration and support student learning.
Submitted 20 February, 2025;
originally announced February 2025.
-
Safe Generative Chats in a WhatsApp Intelligent Tutoring System
Authors:
Zachary Levonian,
Owen Henkel
Abstract:
Large language models (LLMs) are flexible, personalizable, and available, which makes their use within Intelligent Tutoring Systems (ITSs) appealing. However, that flexibility creates risks: inaccuracies, harmful content, and non-curricular material. Ethically deploying LLM-backed ITSs requires designing safeguards that ensure positive experiences for students. We describe the design of a conversational system integrated into an ITS, and our experience evaluating its safety with red-teaming, an in-classroom usability test, and field deployment. We present empirical data from more than 8,000 student conversations with this system, finding that GPT-3.5 rarely generates inappropriate messages. Comparatively more common are inappropriate messages from students, which prompts us to reason about safeguarding as a content moderation and classroom management problem. The student interaction behaviors we observe provide implications for designers - to focus on student inputs as a content moderation problem - and implications for researchers - to focus on subtle forms of bad content.
Submitted 5 July, 2024;
originally announced July 2024.
-
ORES-Inspect: A technology probe for machine learning audits on enwiki
Authors:
Zachary Levonian,
Lauren Hagen,
Lu Li,
Jada Lilleboe,
Solvejg Wastvedt,
Aaron Halfaker,
Loren Terveen
Abstract:
Auditing the machine learning (ML) models used on Wikipedia is important for ensuring that vandalism-detection processes remain fair and effective. However, conducting audits is challenging because stakeholders have diverse priorities and assembling evidence for a model's [in]efficacy is technically complex. We designed an interface to enable editors to learn about and audit the performance of the ORES edit quality model. ORES-Inspect is an open-source web tool and a provocative technology probe for researching how editors think about auditing the many ML models used on Wikipedia. We describe the design of ORES-Inspect and our plans for further research with this system.
Submitted 12 June, 2024;
originally announced June 2024.
-
Retrieval-augmented Generation to Improve Math Question-Answering: Trade-offs Between Groundedness and Human Preference
Authors:
Zachary Levonian,
Chenglu Li,
Wangda Zhu,
Anoushka Gade,
Owen Henkel,
Millie-Ellen Postle,
Wanli Xing
Abstract:
For middle-school math students, interactive question-answering (QA) with tutors is an effective way to learn. The flexibility and emergent capabilities of generative large language models (LLMs) have led to a surge of interest in automating portions of the tutoring process - including interactive QA to support conceptual discussion of mathematical concepts. However, LLM responses to math questions can be incorrect or mismatched to the educational context - such as being misaligned with a school's curriculum. One potential solution is retrieval-augmented generation (RAG), which involves incorporating a vetted external knowledge source in the LLM prompt to increase response quality. In this paper, we designed prompts that retrieve and use content from a high-quality open-source math textbook to generate responses to real student questions. We evaluate the efficacy of this RAG system for middle-school algebra and geometry QA by administering a multi-condition survey, finding that humans prefer responses generated using RAG, but not when responses are too grounded in the textbook content. We argue that while RAG is able to improve response quality, designers of math QA systems must consider trade-offs between generating responses preferred by students and responses closely matched to specific educational resources.
Submitted 10 November, 2023; v1 submitted 4 October, 2023;
originally announced October 2023.
-
"Thoughts & Prayers'' or ":Heart Reaction: & :Prayer Reaction:'': How the Release of New Reactions on CaringBridge Reshapes Supportive Communication During Health Crises
Authors:
C. Estelle Smith,
Hannah Miller Hillberg,
Zachary Levonian
Abstract:
Following Facebook's introduction of the "Like" in 2009, CaringBridge (a nonprofit health journaling platform) implemented a "Heart" symbol as a single-click reaction affordance in 2012. In 2016, Facebook expanded its Like into a set of emotion-based reactions. In 2021, CaringBridge likewise added three new reactions: "Prayer", "Happy", and "Sad." Through user surveys ($N=808$) and interviews ($N=13$), we evaluated this product launch. Unlike Likes on mainstream social media, CaringBridge's single-click Heart was consistently interpreted as a simple, meaningful expression of acknowledgement and support. Although most users accepted the new reactions, the product launch transformed user perceptions of the feature and ignited major disagreement regarding the meanings and functions of reactions in the high-stakes context of health crises. Some users found the new reactions useful, convenient, and helpful in reducing caregiver burden; others felt they caused emotional harm by stripping communication of meaningful expression and authentic care. Overall, these results surface tensions for small social media platforms that need to survive amidst giants, and highlight crucial trade-offs between the cognitive effort, meaningfulness, and efficiency of different forms of Computer-Mediated Communication (CMC). Our work provides three contributions to support researchers and designers in navigating these tensions: (1) empirical knowledge of how users perceived the reactions launch on CaringBridge; (2) design implications for improving health-focused CMC; and (3) concrete questions to guide future research into reactions and health-focused CMC.
Submitted 14 April, 2023;
originally announced April 2023.
-
Peer Recommendation Interventions for Health-related Social Support: a Feasibility Assessment
Authors:
Zachary Levonian,
Matthew Zent,
Ngan Nguyen,
Matthew McNamara,
Loren Terveen,
Svetlana Yarosh
Abstract:
Online health communities (OHCs) offer the promise of connecting with supportive peers. Forming these connections first requires finding relevant peers - a process that can be time-consuming. Peer recommendation systems are a computational approach to make finding peers easier during a health journey. By encouraging OHC users to alter their online social networks, peer recommendations could increase available support. But these benefits are hypothetical and based on mixed, observational evidence. To experimentally evaluate the effect of peer recommendations, we conceptualize these systems as health interventions designed to increase specific beneficial connection behaviors. In this paper, we designed a peer recommendation intervention to increase two behaviors: reading about peer experiences and interacting with peers. We conducted an initial feasibility assessment of this intervention through a 12-week field study in which 79 users of CaringBridge received weekly peer recommendations via email. Our results support the usefulness of and demand for peer recommendation and suggest benefits to evaluating larger peer recommendation interventions. Our contributions include practical guidance on the development and evaluation of peer recommendation interventions for OHCs.
Submitted 24 January, 2025; v1 submitted 11 September, 2022;
originally announced September 2022.
-
Patterns of Patient and Caregiver Mutual Support Connections in an Online Health Community
Authors:
Zachary Levonian,
Marco Dow,
Drew Erikson,
Sourojit Ghosh,
Hannah Miller Hillberg,
Saumik Narayanan,
Loren Terveen,
Svetlana Yarosh
Abstract:
Online health communities offer the promise of support benefits to users, in particular because these communities enable users to find peers with similar experiences. Building mutually supportive connections between peers is a key motivation for using online health communities. However, a user's role in a community may influence the formation of peer connections. In this work, we study patterns of peer connections between two structural health roles: patient and non-professional caregiver. We examine user behavior in an online health community where finding peers is not explicitly supported. This context lets us use social network analysis methods to explore the growth of such connections in the wild and identify users' peer communication preferences. We investigated how connections between peers were initiated, finding that initiations are more likely between two authors who have the same role and who are close within the broader communication network. Relationships are also more likely to form and be more interactive when authors have the same role. Our results have implications for the design of systems supporting peer communication, e.g. peer-to-peer recommendation systems.
Submitted 10 September, 2020; v1 submitted 31 July, 2020;
originally announced July 2020.
-
"I Cannot Do All of This Alone": Exploring Instrumental and Prayer Support in Online Health Communities
Authors:
C. Estelle Smith,
Zachary Levonian,
Haiwei Ma,
Robert Giaquinto,
Gemma Lein-Mcdonough,
Zixuan Li,
Susan O'Conner-Von,
Svetlana Yarosh
Abstract:
Online Health Communities (OHCs) are known to provide substantial emotional and informational support to patients and family caregivers facing life-threatening diagnoses like cancer and other illnesses, injuries, or chronic conditions. Yet little work explores how OHCs facilitate other vital forms of social support, especially instrumental support. We partner with CaringBridge.org - a prominent OHC for journaling about health crises - to complete a two-phase study focused on instrumental support. Phase one involves a content analysis of 641 CaringBridge updates. Phase two is a survey of 991 CaringBridge users. Results show that patients and family caregivers diverge from their support networks in their preferences for specific instrumental support types. Furthermore, "prayer support" emerged as the most prominent support category across both phases. We discuss design implications to accommodate divergent preferences and to expand the instrumental support network. We also discuss the need for future work to empower family caregivers and to support spirituality.
Submitted 24 May, 2020;
originally announced May 2020.