-
Multimodal Assessment of Classroom Discourse Quality: A Text-Centered Attention-Based Multi-Task Learning Approach
Authors:
Ruikun Hou,
Babette Bühler,
Tim Fütterer,
Efe Bozkir,
Peter Gerjets,
Ulrich Trautwein,
Enkelejda Kasneci
Abstract:
Classroom discourse is an essential vehicle through which teaching and learning take place. Assessing different characteristics of discursive practices and linking them to student learning achievement enhances the understanding of teaching quality. Traditional assessments rely on manual coding of classroom observation protocols, which is time-consuming and costly. Despite many studies utilizing AI techniques to analyze classroom discourse at the utterance level, investigations into the evaluation of discursive practices throughout an entire lesson segment remain limited. To address this gap, our study proposes a novel text-centered multimodal fusion architecture to assess the quality of three discourse components grounded in the Global Teaching InSights (GTI) observation protocol: Nature of Discourse, Questioning, and Explanations. First, we employ attention mechanisms to capture inter- and intra-modal interactions from transcript, audio, and video streams. Second, a multi-task learning approach is adopted to jointly predict the quality scores of the three components. Third, we formulate the task as an ordinal classification problem to account for rating level order. The effectiveness of these designed elements is demonstrated through an ablation study on the GTI Germany dataset containing 92 videotaped math lessons. Our results highlight the dominant role of text modality in approaching this task. Integrating acoustic features enhances the model's consistency with human ratings, achieving an overall Quadratic Weighted Kappa score of 0.384, comparable to human inter-rater reliability (0.326). Our study lays the groundwork for the future development of automated discourse quality assessment to support teacher professional development through timely feedback on multidimensional discourse practices.
Submitted 12 May, 2025;
originally announced May 2025.
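The abstract above reports agreement with human raters as a Quadratic Weighted Kappa (QWK) score. As a minimal sketch of how that metric is commonly computed, assuming integer rating levels coded 0 to `num_levels - 1` (the function name, argument names, and coding scheme are illustrative assumptions, not taken from the paper):

```python
def quadratic_weighted_kappa(rater_a, rater_b, num_levels):
    """Agreement between two lists of ordinal ratings coded 0..num_levels-1.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("rating lists must be non-empty and of equal length")
    n = len(rater_a)

    # Observed confusion matrix: observed[i][j] counts items rated i by A and j by B.
    observed = [[0] * num_levels for _ in range(num_levels)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1

    # Marginal histograms of each rater's scores.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(num_levels)) for j in range(num_levels)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_levels):
        for j in range(num_levels):
            # Quadratic penalty: disagreements cost more the further apart the levels are.
            weight = (i - j) ** 2 / (num_levels - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count under chance agreement (independent raters).
            weighted_expected += weight * hist_a[i] * hist_b[j] / n

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when both raters use a single level")
    return 1.0 - weighted_observed / weighted_expected
```

A value of 1 indicates perfect agreement, 0 indicates chance-level agreement, and negative values indicate systematic disagreement. The quadratic weights are what make the metric suit ordinal rating scales, since a one-level miss is penalized far less than a three-level miss.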
-
Automated Assessment of Encouragement and Warmth in Classrooms Leveraging Multimodal Emotional Features and ChatGPT
Authors:
Ruikun Hou,
Tim Fütterer,
Babette Bühler,
Efe Bozkir,
Peter Gerjets,
Ulrich Trautwein,
Enkelejda Kasneci
Abstract:
Classroom observation protocols standardize the assessment of teaching effectiveness and facilitate comprehension of classroom interactions. Whereas these protocols offer teachers specific feedback on their teaching practices, the manual coding by human raters is resource-intensive and often unreliable. This has sparked interest in developing AI-driven, cost-effective methods for automating such holistic coding. Our work explores a multimodal approach to automatically estimating encouragement and warmth in classrooms, a key component of the Global Teaching InSights (GTI) study's observation protocol. To this end, we employed facial and speech emotion recognition with sentiment analysis to extract interpretable features from video, audio, and transcript data. The prediction task involved both classification and regression methods. Additionally, in light of recent large language models' remarkable text annotation capabilities, we evaluated ChatGPT's zero-shot performance on this scoring task based on transcripts. We demonstrated our approach on the GTI dataset, comprising 367 16-minute video segments from 92 authentic lesson recordings. The inferences of GPT-4 and the best-trained model yielded correlations of r = .341 and r = .441 with human ratings, respectively. Combining estimates from both models through averaging, an ensemble approach achieved a correlation of r = .513, comparable to human inter-rater reliability. Our model explanation analysis indicated that text sentiment features were the primary contributors to the trained model's decisions. Moreover, GPT-4 could deliver logical and concrete reasoning as potential teacher guidelines. Our findings provide insights into using advanced, multimodal techniques for automated classroom observation, aiming to foster teacher training through frequent and valuable feedback.
Submitted 1 April, 2024;
originally announced April 2024.
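The abstract above describes averaging the estimates of two models and comparing the result with human ratings via Pearson's r. A minimal sketch of that evaluation step follows; the function names and the toy scores are made up for illustration and are not data or code from the paper:

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(spread_x * spread_y)


def average_ensemble(pred_a, pred_b):
    """Combine two models' estimates by taking their element-wise mean."""
    return [(a + b) / 2 for a, b in zip(pred_a, pred_b)]


# Toy example with invented scores (NOT values from the GTI dataset).
human = [1, 2, 3, 4]
model_a = [1, 3, 2, 4]   # stands in for a trained model's estimates
model_b = [2, 1, 4, 3]   # stands in for an LLM's zero-shot estimates
ensemble = average_ensemble(model_a, model_b)

for name, preds in [("model_a", model_a), ("model_b", model_b), ("ensemble", ensemble)]:
    print(f"{name}: r = {pearson_r(human, preds):.3f}")
```

In this toy case the two models err on different items, so averaging cancels part of their errors and the ensemble correlates more strongly with the reference ratings than either model alone. That gain is not guaranteed in general; it depends on the models' errors being at least partly uncorrelated.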
-
WikiPulse - A News-Portal Based on Wikipedia
Authors:
Tobias Futterer,
Peter A. Gloor,
Tushar Malhotra,
Harrison Mfula,
Karsten Packmohr,
Stefan Schultheiss
Abstract:
More and more user-generated content is complementing conventional journalism. While we don't think that CNN or the New York Times and their professional journalists will disappear anytime soon, formidable competition is emerging through humble Wikipedia editors. In earlier work (Becker 2012), we found that entertainment and sports news appeared on average about two hours earlier on Wikipedia than on CNN and Reuters online. In this project we build a news-reader that automatically identifies late-breaking news among the most recent Wikipedia articles and then displays it on a dedicated Web site.
Submitted 5 August, 2013;
originally announced August 2013.