-
A Controlled Experiment on the Impact of Intrusion Detection False Alarm Rate on Analyst Performance
Authors:
Lucas Layman,
William Roden
Abstract:
Organizations use intrusion detection systems (IDSes) to identify harmful activity among millions of computer network events. Cybersecurity analysts review IDS alarms to verify whether malicious activity occurred and to take remedial action. However, IDS systems exhibit high false alarm rates. This study examines the impact of IDS false alarm rate on human analyst sensitivity (probability of detec…
▽ More
Organizations use intrusion detection systems (IDSes) to identify harmful activity among millions of computer network events. Cybersecurity analysts review IDS alarms to verify whether malicious activity occurred and to take remedial action. However, IDS systems exhibit high false alarm rates. This study examines the impact of IDS false alarm rate on human analyst sensitivity (probability of detection), precision (positive predictive value), and time on task when evaluating IDS alarms. A controlled experiment was conducted with participants divided into two treatment groups, 50% IDS false alarm rate and 86% false alarm rate, who classified whether simulated IDS alarms were true or false alarms. Results show statistically significant differences in precision and time on task. The median values for the 86% false alarm rate group were 47% lower precision and 40% slower time on task than the 50% false alarm rate group. No significant difference in analyst sensitivity was observed.
△ Less
Submitted 13 July, 2023;
originally announced July 2023.
-
Toward Predicting Success and Failure in CS2: A Mixed-Method Analysis
Authors:
Lucas Layman,
Yang Song,
Curry Guinn
Abstract:
Factors driving success and failure in CS1 are the subject of much study but less so for CS2. This paper investigates the transition from CS1 to CS2 in search of leading indicators of success in CS2. Both CS1 and CS2 at the University of North Carolina Wilmington (UNCW) are taught in Python with annual enrollments of 300 and 150 respectively. In this paper, we report on the following research ques…
▽ More
Factors driving success and failure in CS1 are the subject of much study but less so for CS2. This paper investigates the transition from CS1 to CS2 in search of leading indicators of success in CS2. Both CS1 and CS2 at the University of North Carolina Wilmington (UNCW) are taught in Python with annual enrollments of 300 and 150 respectively. In this paper, we report on the following research questions: 1) Are CS1 grades indicators of CS2 grades? 2) Does a quantitative relationship exist between CS2 course grade and a modified version of the SCS1 concept inventory? 3) What are the most challenging aspects of CS2, and how well does CS1 prepare students for CS2 from the student's perspective? We provide a quantitative analysis of 2300 CS1 and CS2 course grades from 2013--2019. In Spring 2019, we administered a modified version of the SCS1 concept inventory to 44 students in the first week of CS2. Further, 69 students completed an exit questionnaire at the conclusion of CS2 to gain qualitative student feedback on their challenges in CS2 and on how well CS1 prepared them for CS2. We find that 56% of students' grades were lower in CS2 than CS1, 18% improved their grades, and 26% earned the same grade. Of the changes, 62% were within one grade point. We find a statistically significant correlation between the modified SCS1 score and CS2 grade points. Students identify linked lists and class/object concepts among the most challenging. Student feedback on CS2 challenges and the adequacy of their CS1 preparations identify possible avenues for improving the CS1-CS2 transition.
△ Less
Submitted 24 February, 2020;
originally announced February 2020.
-
Cry Wolf: Toward an Experimentation Platform and Dataset for Human Factors in Cyber Security Analysis
Authors:
William Roden,
Lucas Layman
Abstract:
Computer network defense is a partnership between automated systems and human cyber security analysts. The system behaviors, for example raising a high proportion of false alarms, likely impact cyber analyst performance. Experimentation in the analyst-system domain is challenging due to lack of access to security experts, the usability of attack datasets, and the training required to use security…
▽ More
Computer network defense is a partnership between automated systems and human cyber security analysts. The system behaviors, for example raising a high proportion of false alarms, likely impact cyber analyst performance. Experimentation in the analyst-system domain is challenging due to lack of access to security experts, the usability of attack datasets, and the training required to use security analysis tools. This paper describes Cry Wolf, an open source web application for user studies of cyber security analysis tasks. This paper also provides an open-access dataset of 73 true and false Intrusion Detection System (IDS) alarms derived from real-world examples of "impossible travel" scenarios. Cry Wolf and the impossible travel dataset were used in an experiment on the impact of IDS false alarm rate on analysts' abilities to correctly classify IDS alerts as true or false alarms. Results from that experiment are used to evaluate the quality of the dataset using difficulty and discrimination index measures drawn from classical test theory. Many alerts in the dataset provide good discrimination for participants' overall task performance.
△ Less
Submitted 24 February, 2020;
originally announced February 2020.
-
Are Delayed Issues Harder to Resolve? Revisiting Cost-to-Fix of Defects throughout the Lifecycle
Authors:
Tim Menzies,
William Nichols,
Forrest Shull,
Lucas Layman
Abstract:
Many practitioners and academics believe in a delayed issue effect (DIE); i.e. the longer an issue lingers in the system, the more effort it requires to resolve. This belief is often used to justify major investments in new development processes that promise to retire more issues sooner.
This paper tests for the delayed issue effect in 171 software projects conducted around the world in the peri…
▽ More
Many practitioners and academics believe in a delayed issue effect (DIE); i.e. the longer an issue lingers in the system, the more effort it requires to resolve. This belief is often used to justify major investments in new development processes that promise to retire more issues sooner.
This paper tests for the delayed issue effect in 171 software projects conducted around the world in the period from 2006--2014. To the best of our knowledge, this is the largest study yet published on this effect. We found no evidence for the delayed issue effect; i.e. the effort to resolve issues in a later phase was not consistently or substantially greater than when issues were resolved soon after their introduction.
This paper documents the above study and explores reasons for this mismatch between this common rule of thumb and empirical data. In summary, DIE is not some constant across all projects. Rather, DIE might be an historical relic that occurs intermittently only in certain kinds of projects. This is a significant result since it predicts that new development processes that promise to faster retire more issues will not have a guaranteed return on investment (depending on the context where applied), and that a long-held truth in software engineering should not be considered a global truism.
△ Less
Submitted 15 September, 2016;
originally announced September 2016.
-
Less is More: Minimizing Code Reorganization using XTREE
Authors:
Rahul Krishna,
Tim Menzies,
Lucas Layman
Abstract:
Context: Developers use bad code smells to guide code reorganization. Yet developers, text books, tools, and researchers disagree on which bad smells are important. Objective: To evaluate the likelihood that a code reorganization to address bad code smells will yield improvement in the defect-proneness of the code. Method: We introduce XTREE, a tool that analyzes a historical log of defects seen p…
▽ More
Context: Developers use bad code smells to guide code reorganization. Yet developers, text books, tools, and researchers disagree on which bad smells are important. Objective: To evaluate the likelihood that a code reorganization to address bad code smells will yield improvement in the defect-proneness of the code. Method: We introduce XTREE, a tool that analyzes a historical log of defects seen previously in the code and generates a set of useful code changes. Any bad smell that requires changes outside of that set can be deprioritized (since there is no historical evidence that the bad smell causes any problems). Evaluation: We evaluate XTREE's recommendations for bad smell improvement against recommendations from previous work (Shatnawi, Alves, and Borges) using multiple data sets of code metrics and defect counts. Results: Code modules that are changed in response to XTREE's recommendations contain significantly fewer defects than recommendations from previous studies. Further, XTREE endorses changes to very few code metrics, and the bad smell recommendations (learned from previous studies) are not universal to all software projects. Conclusion: Before undertaking a code reorganization based on a bad smell report, use a tool like XTREE to check and ignore any such operations that are useless; i.e. ones which lack evidence in the historical record that it is useful to make that change. Note that this use case applies to both manual code reorganizations proposed by developers as well as those conducted by automatic methods. This recommendation assumes that there is an historical record. If none exists, then the results of this paper could be used as a guide.
△ Less
Submitted 14 May, 2017; v1 submitted 12 September, 2016;
originally announced September 2016.