-
Scalable and consistent few-shot classification of survey responses using text embeddings
Authors:
Jonas Timmann Mjaaland,
Markus Fleten Kreutzer,
Halvor Tyseng,
Rebeckah K. Fussell,
Gina Passante,
N. G. Holmes,
Anders Malthe-Sørenssen,
Tor Ole B. Odden
Abstract:
Qualitative analysis of open-ended survey responses is a commonly-used research method in the social sciences, but traditional coding approaches are often time-consuming and prone to inconsistency. Existing solutions from Natural Language Processing such as supervised classifiers, topic modeling techniques, and generative large language models have limited applicability in qualitative analysis, since they demand extensive labeled data, disrupt established qualitative workflows, and/or yield variable results. In this paper, we introduce a text embedding-based classification framework that requires only a handful of examples per category and fits well with standard qualitative workflows. When benchmarked against human analysis of a conceptual physics survey consisting of 2899 open-ended responses, our framework achieves a Cohen's Kappa ranging from 0.74 to 0.83 as compared to expert human coders in an exhaustive coding scheme. We further show how performance of this framework improves with fine-tuning of the text embedding model, and how the method can be used to audit previously-analyzed datasets. These findings demonstrate that text embedding-assisted coding can flexibly scale to thousands of responses without sacrificing interpretability, opening avenues for deductive qualitative analysis at scale.
Submitted 27 August, 2025;
originally announced August 2025.
-
Comparing large language models for supervised analysis of students' lab notes
Authors:
Rebeckah K. Fussell,
Megan Flynn,
Anil Damle,
Michael F. J. Fox,
N. G. Holmes
Abstract:
Recent advancements in large language models (LLMs) hold significant promise in improving physics education research that uses machine learning. In this study, we compare the application of various models to perform large-scale analysis of written text grounded in a physics education research classification problem: identifying skills in students' typed lab notes through sentence-level labeling. Specifically, we use training data to fine-tune two different LLMs, BERT and LLaMA, and compare the performance of these models to both a traditional bag of words approach and a few-shot LLM (without fine-tuning). We evaluate the models based on their resource use, performance metrics, and research outcomes when identifying skills in lab notes. We find that higher-resource models often, but not necessarily, perform better than lower-resource models. We also find that all models estimate similar trends in research outcomes, although the absolute values of the estimated measurements are not always within uncertainties of each other. We use the results to discuss relevant considerations for education researchers seeking to select a model type to use as a classifier.
Submitted 24 February, 2025; v1 submitted 13 December, 2024;
originally announced December 2024.
-
A method to assess trustworthiness of machine coding at scale
Authors:
Rebeckah K. Fussell,
Emily M. Stump,
N. G. Holmes
Abstract:
Physics education researchers are interested in using the tools of machine learning and natural language processing to make quantitative claims from natural language and text data, such as open-ended responses to survey questions. The aspiration is that this form of machine coding may be more efficient and consistent than human coding, allowing much larger and broader data sets to be analyzed than is practical with human coders. Existing work that uses these tools, however, does not investigate norms that allow for trustworthy quantitative claims without full reliance on cross-checking with human coding, which defeats the purpose of using these automated tools. Here we propose a four-part method for making such claims with supervised natural language processing: evaluating a trained model, calculating statistical uncertainty, calculating systematic uncertainty from the trained algorithm, and calculating systematic uncertainty from novel data sources. We provide evidence for this method using data from two distinct short response survey questions with two distinct coding schemes. We also provide a real-world example of using these practices to machine code a data set unseen by human coders. We offer recommendations to guide physics education researchers who may use machine-coding methods in the future.
Submitted 7 November, 2023; v1 submitted 3 October, 2023;
originally announced October 2023.
-
Instructing nontraditional physics labs: Toward responsiveness to student epistemic framing
Authors:
Meagan Sundstrom,
Rebeckah K. Fussell,
Anna McLean Phillips,
Mark Akubo,
Scott E. Allen,
David Hammer,
Rachel E. Scherr,
N. G. Holmes
Abstract:
Research on nontraditional laboratory (lab) activities in physics shows that students often expect to verify predetermined results, as takes place in traditional activities. This understanding of what is taking place, or epistemic framing, may impact their behaviors in the lab, either productively or unproductively. In this paper, we present an analysis of student epistemic framing in a nontraditional lab to understand how instructional context, specifically instructor behaviors, may shape student framing. We present video data from a lab section taught by an experienced teaching assistant (TA), with 19 students working in seven groups. We argue that student framing in this lab is evidenced by whether or not students articulate experimental predictions and by the extent to which they take up opportunities to construct knowledge (epistemic agency). We show that the TA's attempts to shift student frames generally succeed with respect to experimental predictions but are less successful with respect to epistemic agency. In part, we suggest, the success of the TA's attempts reflects whether and how they are responsive to students' current framing. This work offers evidence that instructors can shift students' frames in nontraditional labs, while also illuminating the complexities of both student framing and the role of the instructor in shifting that framing in this context.
Submitted 28 April, 2023;
originally announced April 2023.