-
KIRETT -- A wearable device to support rescue operations using artificial intelligence to improve first aid
Authors:
Johannes Zenkert,
Christian Weber,
Mubaris Nadeem,
Lisa Bender,
Madjid Fathi,
Abu Shad Ahammed,
Aniebiet Micheal Ezekiel,
Roman Obermaisser,
Maximilian Bradford
Abstract:
This short paper presents first steps in the scientific part of the KIRETT project, which aims to improve first aid during rescue operations using a wearable device. The wearable is used for computer-aided situation recognition by means of artificial intelligence. It provides contextual recommendations for actions and operations to rescue personnel and is intended to minimize damage to patients due to incorrect treatment, as well as increase the probability of survival. The paper describes a first overview of research approaches within the project.
Submitted 29 September, 2025;
originally announced September 2025.
-
The Impact of Background Speech on Interruption Detection in Collaborative Groups
Authors:
Mariah Bradford,
Nikhil Krishnaswamy,
Nathaniel Blanchard
Abstract:
Interruption plays a crucial role in collaborative learning, shaping group interactions and influencing knowledge construction. AI-driven support can assist teachers in monitoring these interactions. However, most previous work on interruption detection and interpretation has been conducted in single-conversation environments with relatively clean audio. AI agents deployed in classrooms for collaborative learning within small groups will need to contend with multiple concurrent conversations -- in this context, overlapping speech will be ubiquitous, and interruptions will need to be identified in other ways. In this work, we analyze interruption detection in single-conversation and multi-group dialogue settings. We then create a state-of-the-art method for interruption identification that is robust to overlapping speech, and thus could be deployed in classrooms. Further, our work highlights meaningful linguistic and prosodic information about how interruptions manifest in collaborative group interactions. Our investigation also paves the way for future works to account for the influence of overlapping speech from multiple groups when tracking group dialog.
Submitted 9 July, 2025;
originally announced July 2025.
-
Dude, where's my utterance? Evaluating the effects of automatic segmentation and transcription on CPS detection
Authors:
Videep Venkatesha,
Mariah Bradford,
Nathaniel Blanchard
Abstract:
Collaborative Problem-Solving (CPS) markers capture key aspects of effective teamwork, such as staying on task, avoiding interruptions, and generating constructive ideas. An AI system that reliably detects these markers could help teachers identify when a group is struggling or demonstrating productive collaboration. Such a system requires an automated pipeline composed of multiple components. In this work, we evaluate how CPS detection is impacted by automating two critical components: transcription and speech segmentation. On the public Weights Task Dataset (WTD), we find CPS detection performance with automated transcription and segmentation methods is comparable to human-segmented and manually transcribed data; however, we find the automated segmentation methods reduce the number of utterances by 26.5%, impacting the granularity of the data. We discuss the implications for developing AI-driven tools that support collaborative learning in classrooms.
Submitted 6 July, 2025;
originally announced July 2025.
-
An Exploration of Internal States in Collaborative Problem Solving
Authors:
Sifatul Anindho,
Videep Venkatesha,
Mariah Bradford,
Anne M. Cleary,
Nathaniel Blanchard
Abstract:
Collaborative problem solving (CPS) is a complex cognitive, social, and emotional process that is increasingly prevalent in educational and professional settings. This study investigates the emotional states of individuals during CPS using a mixed-methods approach. Teams of four first completed a novel CPS task. Immediately after, each individual was placed in an isolated room where they reviewed the video of their group performing the task and self-reported their internal experiences throughout the task. We performed a linguistic analysis of these internal monologues, providing insights into the range of emotions individuals experience during CPS. Our analysis showed distinct patterns in language use, including characteristic unigrams and bigrams, key words and phrases, emotion labels, and semantic similarity between emotion-related words.
Submitted 2 July, 2025;
originally announced July 2025.
-
The Amazon Nova Family of Models: Technical Report and Model Card
Authors:
Amazon AGI,
Aaron Langford,
Aayush Shah,
Abhanshu Gupta,
Abhimanyu Bhatter,
Abhinav Goyal,
Abhinav Mathur,
Abhinav Mohanty,
Abhishek Kumar,
Abhishek Sethi,
Abi Komma,
Abner Pena,
Achin Jain,
Adam Kunysz,
Adam Opyrchal,
Adarsh Singh,
Aditya Rawal,
Adok Achar Budihal Prasad,
Adrià de Gispert,
Agnika Kumar,
Aishwarya Aryamane,
Ajay Nair,
Akilan M,
Akshaya Iyengar,
Akshaya Vishnu Kudlu Shanbhogue, et al. (761 additional authors not shown)
Abstract:
We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents and text. Amazon Nova Micro is a text-only model that delivers our lowest-latency responses at very low cost. Amazon Nova Canvas is an image generation model that creates professional grade images with rich customization controls. Amazon Nova Reel is a video generation model offering high-quality outputs, customization, and motion control. Our models were built responsibly and with a commitment to customer trust, security, and reliability. We report benchmarking results for core capabilities, agentic performance, long context, functional adaptation, runtime performance, and human evaluation.
Submitted 17 March, 2025;
originally announced June 2025.
-
TRACE: Real-Time Multimodal Common Ground Tracking in Situated Collaborative Dialogues
Authors:
Hannah VanderHoeven,
Brady Bhalla,
Ibrahim Khebour,
Austin Youngren,
Videep Venkatesha,
Mariah Bradford,
Jack Fitzgerald,
Carlos Mabrey,
Jingxuan Tu,
Yifan Zhu,
Kenneth Lai,
Changsoo Jung,
James Pustejovsky,
Nikhil Krishnaswamy
Abstract:
We present TRACE, a novel system for live *common ground* tracking in situated collaborative tasks. With a focus on fast, real-time performance, TRACE tracks the speech, actions, gestures, and visual attention of participants, uses these multimodal inputs to determine the set of task-relevant propositions that have been raised as the dialogue progresses, and tracks the group's epistemic position and beliefs toward them as the task unfolds. Amid increased interest in AI systems that can mediate collaborations, TRACE represents an important step forward for agents that can engage with multiparty, multimodal discourse.
Submitted 12 March, 2025;
originally announced March 2025.
-
Speech Is Not Enough: Interpreting Nonverbal Indicators of Common Knowledge and Engagement
Authors:
Derek Palmer,
Yifan Zhu,
Kenneth Lai,
Hannah VanderHoeven,
Mariah Bradford,
Ibrahim Khebour,
Carlos Mabrey,
Jack Fitzgerald,
Nikhil Krishnaswamy,
Martha Palmer,
James Pustejovsky
Abstract:
Our goal is to develop an AI Partner that can provide support for group problem solving and social dynamics. In multi-party working group environments, multimodal analytics is crucial for identifying non-verbal interactions of group members. In conjunction with their verbal participation, this creates a holistic understanding of collaboration and engagement that provides necessary context for the AI Partner. In this demo, we illustrate our present capabilities at detecting and tracking nonverbal behavior in student task-oriented interactions in the classroom, and the implications for tracking common ground and engagement.
Submitted 7 December, 2024;
originally announced December 2024.
-
Any Other Thoughts, Hedgehog? Linking Deliberation Chains in Collaborative Dialogues
Authors:
Abhijnan Nath,
Videep Venkatesha,
Mariah Bradford,
Avyakta Chelle,
Austin Youngren,
Carlos Mabrey,
Nathaniel Blanchard,
Nikhil Krishnaswamy
Abstract:
Question-asking in collaborative dialogue has long been established as key to knowledge construction, both in internal and collaborative problem solving. In this work, we examine probing questions in collaborative dialogues: questions that explicitly elicit responses from the speaker's interlocutors. Specifically, we focus on modeling the causal relations that lead directly from utterances earlier in the dialogue to the emergence of the probing question. We model these relations using a novel graph-based framework of deliberation chains, and reframe the problem of constructing such chains as a coreference-style clustering problem. Our framework jointly models probing and causal utterances and the links between them, and we evaluate on two challenging collaborative task datasets: the Weights Task and DeliData. Our results demonstrate the effectiveness of our theoretically-grounded approach compared to both baselines and stronger coreference approaches, and establish a standard of performance in this novel task.
Submitted 25 October, 2024;
originally announced October 2024.
-
Common Ground Tracking in Multimodal Dialogue
Authors:
Ibrahim Khebour,
Kenneth Lai,
Mariah Bradford,
Yifan Zhu,
Richard Brutti,
Christopher Tam,
Jingxuan Tu,
Benjamin Ibarra,
Nathaniel Blanchard,
Nikhil Krishnaswamy,
James Pustejovsky
Abstract:
Within Dialogue Modeling research in AI and NLP, considerable attention has been spent on "dialogue state tracking" (DST), which is the ability to update the representations of the speaker's needs at each turn in the dialogue by taking into account the past dialogue moves and history. Less studied but just as important to dialogue modeling, however, is "common ground tracking" (CGT), which identifies the shared belief space held by all of the participants in a task-oriented dialogue: the task-relevant propositions all participants accept as true. In this paper we present a method for automatically identifying the current set of shared beliefs and "questions under discussion" (QUDs) of a group with a shared goal. We annotate a dataset of multimodal interactions in a shared physical space with speech transcriptions, prosodic features, gestures, actions, and facets of collaboration, and operationalize these features for use in a deep neural model to predict moves toward construction of common ground. Model outputs cascade into a set of formal closure rules derived from situated evidence and belief axioms and update operations. We empirically assess the contribution of each feature type toward successful construction of common ground relative to ground truth, establishing a benchmark in this novel, challenging task.
Submitted 25 March, 2024;
originally announced March 2024.
-
How Good is Automatic Segmentation as a Multimodal Discourse Annotation Aid?
Authors:
Corbyn Terpstra,
Ibrahim Khebour,
Mariah Bradford,
Brett Wisniewski,
Nikhil Krishnaswamy,
Nathaniel Blanchard
Abstract:
Collaborative problem solving (CPS) in teams is tightly coupled with the creation of shared meaning between participants in a situated, collaborative task. In this work, we assess the quality of different utterance segmentation techniques as an aid in annotating CPS. We (1) manually transcribe utterances in a dataset of triads collaboratively solving a problem involving dialogue and physical object manipulation, (2) annotate collaborative moves according to these gold-standard transcripts, and then (3) apply these annotations to utterances that have been automatically segmented using toolkits from Google and OpenAI's Whisper. We show that the oracle utterances have minimal correspondence to automatically segmented speech, and that automatically segmented speech using different segmentation methods is also inconsistent. We also show that annotating automatically segmented speech has distinct implications compared with annotating oracle utterances--since most annotation schemes are designed for oracle cases, when annotating automatically-segmented utterances, annotators must invoke other information to make arbitrary judgments which other annotators may not replicate. We conclude with a discussion of how future annotation specs can account for these needs.
Submitted 26 May, 2023;
originally announced May 2023.
-
To What Degree Can Language Borders Be Blurred In BERT-based Multilingual Spoken Language Understanding?
Authors:
Quynh Do,
Judith Gaspers,
Tobias Roding,
Melanie Bradford
Abstract:
This paper addresses the question as to what degree a BERT-based multilingual Spoken Language Understanding (SLU) model can transfer knowledge across languages. Through experiments we will show that, although it works substantially well even on distant language groups, there is still a gap to the ideal multilingual performance. In addition, we propose a novel BERT-based adversarial model architecture to learn language-shared and language-specific representations for multilingual SLU. Our experimental results prove that the proposed model is capable of narrowing the gap to the ideal multilingual performance.
Submitted 10 November, 2020;
originally announced November 2020.