-
Fine-Tuning LLMs for Report Summarization: Analysis on Supervised and Unsupervised Data
Authors:
Swati Rallapalli,
Shannon Gallagher,
Andrew O. Mellinger,
Jasmine Ratchford,
Anusha Sinha,
Tyler Brooks,
William R. Nichols,
Nick Winski,
Bryan Brown
Abstract:
We study the efficacy of fine-tuning Large Language Models (LLMs) for the specific task of report (government archives, news, intelligence reports) summarization. While this topic is being very actively researched - our specific application set-up faces two challenges: (i) ground-truth summaries maybe unavailable (e.g., for government archives), and (ii) availability of limited compute power - the…
▽ More
We study the efficacy of fine-tuning Large Language Models (LLMs) for the specific task of report (government archives, news, intelligence reports) summarization. While this topic is being very actively researched - our specific application set-up faces two challenges: (i) ground-truth summaries maybe unavailable (e.g., for government archives), and (ii) availability of limited compute power - the sensitive nature of the application requires that computation is performed on-premise and for most of our experiments we use one or two A100 GPU cards. Under this set-up we conduct experiments to answer the following questions. First, given that fine-tuning the LLMs can be resource intensive, is it feasible to fine-tune them for improved report summarization capabilities on-premise? Second, what are the metrics we could leverage to assess the quality of these summaries? We conduct experiments on two different fine-tuning approaches in parallel and our findings reveal interesting trends regarding the utility of fine-tuning LLMs. Specifically, we find that in many cases, fine-tuning helps improve summary quality and in other cases it helps by reducing the number of invalid or garbage summaries.
△ Less
Submitted 10 March, 2025;
originally announced March 2025.
-
The CESAW dataset: a conversation
Authors:
Derek M. Jones,
William R. Nichols
Abstract:
An analysis of the 61,817 tasks performed by developers working on 45 projects, implemented using Team Software Process, is documented via a conversation between a data analyst and the person who collected, compiled, and originally analyzed the data. Five projects were safety critical, containing a total of 28,899 tasks.
Projects were broken down using a Work Breakdown Structure to create a hier…
▽ More
An analysis of the 61,817 tasks performed by developers working on 45 projects, implemented using Team Software Process, is documented via a conversation between a data analyst and the person who collected, compiled, and originally analyzed the data. Five projects were safety critical, containing a total of 28,899 tasks.
Projects were broken down using a Work Breakdown Structure to create a hierarchical organization, with tasks at the leaf nodes. The WBS information enables task organization within a project to be investigated, e.g., how related tasks are sequenced together. Task data includes: kind of task, anonymous developer id, start/end time/date, as well as interruption and break times; a total of 203,621 time facts.
Task effort estimation accuracy was found to be influenced by factors such as the person making the estimate, the project involved, and the propensity to use round numbers.
△ Less
Submitted 7 June, 2021;
originally announced June 2021.
-
Assessing Practitioner Beliefs about Software Engineering
Authors:
N. C. Shrikanth,
William Nichols,
Fahmid Morshed Fahid,
Tim Menzies
Abstract:
Software engineering is a highly dynamic discipline. Hence, as times change, so too might our beliefs about core processes in this field. This paper checks some five beliefs that originated in the past decades that comment on the relationships between (i) developer productivity; (ii) software quality and (iii) years of developer experience. Using data collected from 1,356 developers in the period…
▽ More
Software engineering is a highly dynamic discipline. Hence, as times change, so too might our beliefs about core processes in this field. This paper checks some five beliefs that originated in the past decades that comment on the relationships between (i) developer productivity; (ii) software quality and (iii) years of developer experience. Using data collected from 1,356 developers in the period 1995 to 2006, we found support for only one of the five beliefs titled "Quality entails productivity". We found no clear support for four other beliefs based on programming languages and software developers. However, from the sporadic evidence of the four other beliefs we learned that a narrow scope could delude practitioners in misinterpreting certain effects to hold in their day to day work. Lastly, through an aggregated view of assessing the five beliefs, we find programming languages act as a confounding factor for developer productivity and software quality. Thus the overall message of this work is that it is both important and possible to revisit old beliefs in SE. Researchers and practitioners should routinely retest old beliefs.
△ Less
Submitted 24 May, 2021; v1 submitted 9 June, 2020;
originally announced June 2020.
-
The Cost and Benefits of Static Analysis During Development
Authors:
William R. Nichols Jr
Abstract:
Without quantitative data, deciding whether and how to use static analysis in a development workflow is a matter of expert opinion and guesswork rather than an engineering trade-off. Moreover, relevant data collected under real-world conditions is scarce. Important but unknown quantitative parameters include, but are not limited to, the effort to apply the techniques, the effectiveness of removing…
▽ More
Without quantitative data, deciding whether and how to use static analysis in a development workflow is a matter of expert opinion and guesswork rather than an engineering trade-off. Moreover, relevant data collected under real-world conditions is scarce. Important but unknown quantitative parameters include, but are not limited to, the effort to apply the techniques, the effectiveness of removing defects, where in the workflow the analysis should be applied, and how static analysis interacts with other quality techniques. This study examined the detailed development process data 35 industrial development projects that included static analysis and that were also instrumented with the Team Software Process. We collected data project plans, logs of effort, defect, and size and post mortem reports and analyzed performance of their development activities to populate a parameterized performance model. We compared effort and defect levels with and without static analysis using a planning model that includes feedback for defect removal effectiveness and fix effort. We found evidence that using each tool developers found and removed defects at a higher rate than alternative removal techniques. Moreover, the early and inexpensive removal reduced not only final defect density but also total development effort. The contributions of this paper include real-world benchmarks of process data from projects using static analysis tools, a demonstration of a cost-effectiveness analysis using this data, and a recommendation these tools were consistently cost effective operationally.
△ Less
Submitted 5 March, 2020;
originally announced March 2020.
-
Are Delayed Issues Harder to Resolve? Revisiting Cost-to-Fix of Defects throughout the Lifecycle
Authors:
Tim Menzies,
William Nichols,
Forrest Shull,
Lucas Layman
Abstract:
Many practitioners and academics believe in a delayed issue effect (DIE); i.e. the longer an issue lingers in the system, the more effort it requires to resolve. This belief is often used to justify major investments in new development processes that promise to retire more issues sooner.
This paper tests for the delayed issue effect in 171 software projects conducted around the world in the peri…
▽ More
Many practitioners and academics believe in a delayed issue effect (DIE); i.e. the longer an issue lingers in the system, the more effort it requires to resolve. This belief is often used to justify major investments in new development processes that promise to retire more issues sooner.
This paper tests for the delayed issue effect in 171 software projects conducted around the world in the period from 2006--2014. To the best of our knowledge, this is the largest study yet published on this effect. We found no evidence for the delayed issue effect; i.e. the effort to resolve issues in a later phase was not consistently or substantially greater than when issues were resolved soon after their introduction.
This paper documents the above study and explores reasons for this mismatch between this common rule of thumb and empirical data. In summary, DIE is not some constant across all projects. Rather, DIE might be an historical relic that occurs intermittently only in certain kinds of projects. This is a significant result since it predicts that new development processes that promise to faster retire more issues will not have a guaranteed return on investment (depending on the context where applied), and that a long-held truth in software engineering should not be considered a global truism.
△ Less
Submitted 15 September, 2016;
originally announced September 2016.