-
Classification of Quality Characteristics in Online User Feedback using Linguistic Analysis, Crowdsourcing and LLMs
Authors:
Eduard C. Groen,
Fabiano Dalpiaz,
Martijn van Vliet,
Boris Winter,
Joerg Doerr,
Sjaak Brinkkemper
Abstract:
Software qualities such as usability or reliability are among the strongest determinants of mobile app user satisfaction and constitute a significant portion of online user feedback on software products, making it a valuable source of quality-related feedback to guide the development process. The abundance of online user feedback warrants the automated identification of quality characteristics, but the online user feedback's heterogeneity and the lack of appropriate training corpora limit the applicability of supervised machine learning. We therefore investigate the viability of three approaches that could be effective in low-data settings: language patterns (LPs) based on quality-related keywords, instructions for crowdsourced micro-tasks, and large language model (LLM) prompts. We determined the feasibility of each approach and then compared their accuracy. For the complex multiclass classification of quality characteristics, the LP-based approach achieved a varied precision (0.38-0.92) depending on the quality characteristic, and low recall; crowdsourcing achieved the best average accuracy in two consecutive phases (0.63, 0.72), which could be matched by the best-performing LLM condition (0.66) and a prediction based on the LLMs' majority vote (0.68). Our findings show that in this low-data setting, the two approaches that use crowdsourcing or LLMs instead of involving experts achieve accurate classifications, while the LP-based approach has only limited potential. The promise of crowdsourcing and LLMs in this context might even extend to building training corpora.
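The majority-vote prediction mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the label strings, and the toy inputs below are all hypothetical, and the tie-breaking rule (first label seen wins) is an assumption, since the abstract does not say how ties were handled.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label among several classifiers' outputs.

    Assumption: ties resolve to the label that appears first in `labels`,
    which is how Counter.most_common orders elements with equal counts.
    """
    if not labels:
        raise ValueError("need at least one label")
    return Counter(labels).most_common(1)[0][0]


def accuracy(predicted, gold):
    """Fraction of items whose predicted label equals the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Hypothetical toy input: one inner list per feedback item,
# one label per (hypothetical) LLM. Not data from the paper.
votes_per_item = [
    ["usability", "usability", "reliability"],
    ["reliability", "performance", "reliability"],
    ["performance", "usability", "reliability"],  # three-way tie
]
gold_labels = ["usability", "reliability", "usability"]

predictions = [majority_vote(votes) for votes in votes_per_item]
score = accuracy(predictions, gold_labels)
```

An odd number of voters reduces ties in the two-class case but does not eliminate them in a multiclass setting, so an explicit tie-breaking rule is worth stating in any real evaluation.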
Submitted 13 June, 2025;
originally announced June 2025.
-
Guidelines for data analysis scripts
Authors:
Marijn van Vliet
Abstract:
Unorganized heaps of analysis code are a growing liability as data analysis pipelines are getting longer and more complicated. This is worrying, as neuroscience papers are getting retracted due to programmer error. In this paper, some guidelines are presented that help keep analysis code well organized, easy to understand and convenient to work with:
1. Each analysis step is one script
2. A script either processes a single recording, or aggregates across recordings, never both
3. One master script to run the entire analysis
4. Save all intermediate results
5. Visualize all intermediate results
6. Each parameter and filename is defined only once
7. Distinguish files that are part of the official pipeline from other scripts
In addition to discussing the reasoning behind each guideline, an example analysis pipeline is presented as a case study to see how each guideline translates into code.
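Guideline 6 (each parameter and filename is defined only once) could be sketched as a single shared configuration module that every analysis script imports. This is only an illustration under stated assumptions, not the case study from the paper: the directory layout, subject identifiers, parameter values, and function names are all hypothetical.

```python
from pathlib import Path

# Hypothetical parameters, defined once and imported by every script.
DATA_DIR = Path("data")
SUBJECTS = ["sub01", "sub02"]
BANDPASS_HZ = (1.0, 40.0)


def filtered_fname(subject):
    """Filename of the per-recording intermediate result for one subject."""
    return DATA_DIR / subject / f"{subject}_filtered.npy"


def grand_average_fname():
    """Filename of the result that aggregates across all recordings."""
    return DATA_DIR / "grand_average.npy"


# A per-recording script and an aggregation script would both call the
# functions above instead of building paths themselves.
all_filtered = [filtered_fname(s) for s in SUBJECTS]
```

Because every path is built by one function, renaming a file or moving the data directory is a one-line change rather than a search through every script.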
Submitted 9 August, 2019; v1 submitted 12 April, 2019;
originally announced April 2019.
-
Data Intensive High Energy Physics Analysis in a Distributed Cloud
Authors:
R. J. Sobie,
A. Agarwal,
M. Anderson,
P. Armstrong,
K. Fransham,
I. Gable,
D. Harris,
C. Leavett-Brown,
M. Paterson,
D. Penfold-Brown,
M. Vliet,
A. Charbonneau,
R. Impey,
W. Podaima
Abstract:
We show that distributed Infrastructure-as-a-Service (IaaS) compute clouds can be effectively used for the analysis of high energy physics data. We have designed a distributed cloud system that works with any application using large input data sets requiring a high throughput computing environment. The system uses IaaS-enabled science and commercial clusters in Canada and the United States. We describe the process in which a user prepares an analysis virtual machine (VM) and submits batch jobs to a central scheduler. The system boots the user-specific VM on one of the IaaS clouds, runs the jobs and returns the output to the user. The user application accesses a central database for calibration data during the execution of the application. Similarly, the data is located in a central location and streamed by the running application. The system can easily run one hundred simultaneous jobs in an efficient manner and should scale to many hundreds and possibly thousands of user jobs.
Submitted 1 January, 2011;
originally announced January 2011.