Search | arXiv e-print repository

Large Language Model-Augmented Auto-Delineation of Treatment Target Volume in Radiation Therapy

Authors: Praveenbalaji Rajendran, Yong Yang, Thomas R. Niedermayr, Michael Gensheimer, Beth Beadle, Quynh-Thu Le, Lei Xing, Xianjin Dai

Abstract: Radiation therapy (RT) is one of the most effective treatments for cancer, and its success relies on the accurate delineation of targets. However, target delineation is a comprehensive medical decision that currently relies purely on manual processes by human experts. Manual delineation is time-consuming, laborious, and subject to interobserver variations. Although advances in artificial intelligence (AI) techniques have significantly enhanced the auto-contouring of normal tissues, accurate delineation of RT target volumes remains a challenge. In this study, we propose a visual language model-based RT target volume auto-delineation network termed Radformer. The Radformer utilizes a hierarchical vision transformer as the backbone and incorporates large language models to extract text-rich features from clinical data. We introduce a visual language attention module (VLAM) for integrating visual and linguistic features for language-aware visual encoding (LAVE). The Radformer has been evaluated on a dataset comprising 2985 patients with head-and-neck cancer who underwent RT. Metrics, including the Dice similarity coefficient (DSC), intersection over union (IOU), and 95th percentile Hausdorff distance (HD95), were used to evaluate the performance of the model quantitatively. Our results demonstrate that the Radformer has superior segmentation performance compared to other state-of-the-art models, validating its potential for adoption in RT practice.

Submitted 9 July, 2024; originally announced July 2024.
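The abstract names DSC, IOU, and HD95 but does not specify how they were computed. A minimal NumPy sketch for binary masks, assuming distances in voxel units (no physical spacing), boundaries taken as foreground voxels with at least one background face-neighbor, and HD95 taken as the larger of the two directed 95th-percentile boundary distances; other HD95 conventions exist. The function names are illustrative, not Radformer's code.

```python
import numpy as np


def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(2.0 * np.logical_and(pred, gt).sum() / denom)


def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union |A ∩ B| / |A ∪ B| for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def _boundary(mask: np.ndarray) -> np.ndarray:
    """Foreground voxels with at least one background face-neighbor."""
    padded = np.pad(mask, 1, constant_values=False)
    core = tuple(slice(1, -1) for _ in range(mask.ndim))
    interior = np.ones_like(mask, dtype=bool)
    for axis in range(mask.ndim):
        for shift in (-1, 1):
            # The False padding keeps np.roll's wrap-around outside the core region.
            interior &= np.roll(padded, shift, axis=axis)[core]
    return mask & ~interior


def hd95(pred: np.ndarray, gt: np.ndarray) -> float:
    """95th-percentile Hausdorff distance between mask boundaries (voxel units)."""
    a = np.argwhere(_boundary(pred.astype(bool)))
    b = np.argwhere(_boundary(gt.astype(bool)))
    if len(a) == 0 or len(b) == 0:
        return float("inf")  # undefined when either mask is empty
    # Brute-force pairwise distances: fine for small masks, O(|a|*|b|) memory.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return float(max(np.percentile(d.min(axis=1), 95),
                     np.percentile(d.min(axis=0), 95)))


# Toy example: a 4x4 square and the same square shifted by one column.
gt = np.zeros((8, 8), dtype=bool)
gt[2:6, 2:6] = True
pred = np.zeros((8, 8), dtype=bool)
pred[2:6, 3:7] = True
print(dice(pred, gt))  # 0.75
print(iou(pred, gt))   # 0.6
print(hd95(pred, gt))  # 1.0
```

For full 3D CT volumes, the brute-force distance matrix becomes large; a distance-transform-based implementation would be the usual choice there.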

arXiv:1903.05432

doi 10.1145/3319008.3319021

Is the Stack Distance Between Test Case and Method Correlated With Test Effectiveness?

Authors: Rainer Niedermayr, Stefan Wagner

Abstract: Mutation testing is a means to assess the effectiveness of a test suite, and its outcome is considered more meaningful than code coverage metrics. However, despite several optimizations, mutation testing requires significant computational effort and has not been widely adopted in industry. Therefore, in this paper we study whether test effectiveness can be approximated using a more lightweight approach. We hypothesize that a test case is more likely to detect faults in methods that are close to the test case on the call stack than in methods that the test case accesses indirectly through many other methods. Based on this hypothesis, we propose the minimal stack distance between test case and method as a new test measure, which expresses how close any test case comes to a given method, and study its correlation with test effectiveness. We conducted an empirical study with 21 open-source projects, which comprise in total 1.8 million LOC, and show that a correlation exists between stack distance and test effectiveness. The correlation reaches a strength of up to 0.58. We further show that a classifier using the minimal stack distance along with additional easily computable measures can predict the mutation testing result of a method with 92.9% precision and 93.4% recall. Hence, such a classifier can be considered as a lightweight alternative to mutation testing or as a less costly step preceding it.

Submitted 13 March, 2019; originally announced March 2019.

Comments: EASE 2019

ACM Class: D.2.5

Journal ref: 2019 ACM 23rd International Conference on Evaluation and Assessment in Software Engineering (EASE)
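The abstract defines the minimal stack distance only informally. A minimal sketch, assuming the call relationships have been recorded as a caller-to-callee graph (for example, from call stacks observed while running the tests) and that a method called directly by a test has distance 1. A multi-source breadth-first search started from all test cases at once yields the smallest number of call edges from any test to the method. The graph format, method names, and function name are hypothetical.

```python
from collections import deque
from typing import Optional


def min_stack_distance(graph: dict[str, list[str]],
                       tests: list[str],
                       method: str) -> Optional[int]:
    """Fewest call edges from any test case to `method`, or None if unreached."""
    dist = {test: 0 for test in tests}  # every test is a BFS source at depth 0
    queue = deque(tests)
    while queue:
        node = queue.popleft()
        if node == method:
            return dist[node]  # BFS visits nodes in order of increasing depth
        for callee in graph.get(node, []):
            if callee not in dist:
                dist[callee] = dist[node] + 1
                queue.append(callee)
    return None  # no test reaches the method at all


calls = {
    "testSave": ["Service.run"],
    "Service.run": ["Repo.save", "Util.format"],
    "Repo.save": ["Util.format"],
    "testFormat": ["Util.format"],
}
print(min_stack_distance(calls, ["testSave", "testFormat"], "Repo.save"))    # 2
print(min_stack_distance(calls, ["testSave", "testFormat"], "Util.format"))  # 1
```

Searching from all tests simultaneously avoids one traversal per test, since only the minimum over tests is needed.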

arXiv:1811.00820

doi 10.7717/peerj-cs.187

Too Trivial To Test? An Inverse View on Defect Prediction to Identify Methods with Low Fault Risk

Authors: Rainer Niedermayr, Tobias Röhm, Stefan Wagner

Abstract: Background. Test resources are usually limited, and therefore it is often not possible to completely test an application before a release. To cope with the problem of scarce resources, development teams can apply defect prediction to identify fault-prone code regions. However, defect prediction tends to have low precision in cross-project prediction scenarios. Aims. We take an inverse view on defect prediction and aim to identify methods that can be deferred when testing because they contain hardly any faults due to their code being "trivial". We expect that characteristics of such methods might be project-independent, so that our approach could improve cross-project predictions. Method. We compute code metrics and apply association rule mining to create rules for identifying methods with low fault risk. We conduct an empirical study to assess our approach with six Java open-source projects containing precise fault data at the method level. Results. Our results show that inverse defect prediction can identify approximately 32–44% of a project's methods as having a low fault risk; on average, they are about six times less likely to contain a fault than other methods. In cross-project predictions with larger, more diversified training sets, identified methods are even eleven times less likely to contain a fault. Conclusions. Inverse defect prediction supports the efficient allocation of test resources by identifying methods that can be treated with less priority in testing activities, and it is well suited to cross-project prediction scenarios.

Submitted 2 November, 2018; originally announced November 2018.

Comments: Submitted to PeerJ CS

ACM Class: D.2.5; D.2.8

Journal ref: PeerJ Computer Science 5:e187, 2019
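The study derives its rules with association rule mining; the mining step itself is not shown here. The sketch below only evaluates one candidate rule of the form "metric conditions → not faulty" on per-method records: its support, its confidence, and how many times less often the flagged methods are faulty than the remaining ones (the kind of "n times less likely" figure reported in the abstract). The metric names, the example rule's thresholds, and the toy records are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass
class MethodRecord:
    sloc: int      # source lines of code (hypothetical metric choice)
    calls: int     # number of method invocations in the body (hypothetical)
    faulty: bool   # ground truth from method-level fault data


def evaluate_rule(records: list[MethodRecord],
                  antecedent: Callable[[MethodRecord], bool]) -> dict[str, float]:
    """Support/confidence of `antecedent -> not faulty` plus a fault-rate ratio."""
    flagged = [r for r in records if antecedent(r)]
    others = [r for r in records if not antecedent(r)]
    clean_flagged = sum(not r.faulty for r in flagged)
    rate_flagged = sum(r.faulty for r in flagged) / len(flagged) if flagged else 0.0
    rate_others = sum(r.faulty for r in others) / len(others) if others else 0.0
    return {
        "share_flagged": len(flagged) / len(records),
        "support": clean_flagged / len(records),  # P(antecedent and not faulty)
        "confidence": clean_flagged / len(flagged) if flagged else 0.0,  # P(not faulty | antecedent)
        # How many times less likely flagged methods are to be faulty than the rest.
        "risk_ratio": rate_others / rate_flagged if rate_flagged else math.inf,
    }


# Toy data: four "trivial" methods (one faulty) and six others (three faulty).
records = (
    [MethodRecord(2, 0, False)] * 3 + [MethodRecord(2, 0, True)]
    + [MethodRecord(20, 4, False)] * 3 + [MethodRecord(20, 4, True)] * 3
)
print(evaluate_rule(records, lambda m: m.sloc <= 3 and m.calls == 0))
# {'share_flagged': 0.4, 'support': 0.3, 'confidence': 0.75, 'risk_ratio': 2.0}
```

A rule miner would generate many such antecedents and keep those whose support and confidence exceed chosen thresholds.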

arXiv:1805.01132

doi 10.1145/3183440.3195022

Poster: Identification of Methods with Low Fault Risk

Authors: Rainer Niedermayr, Tobias Röhm, Stefan Wagner

Abstract: Test resources are usually limited, and therefore it is often not possible to completely test an application before a release. Testers therefore need to focus their activities on the relevant code regions. In this paper, we introduce an inverse defect prediction approach to identify methods that contain hardly any faults. We applied our approach to six Java open-source projects and show that, on average, 31.6% of a project's methods have a low fault risk; together, they contain on average only 5.8% of all faults. Furthermore, the results suggest that, unlike defect prediction, our approach can also be applied in cross-project prediction scenarios. Inverse defect prediction can therefore help prioritize untested code areas and guide testers to increase the fault detection probability.

Submitted 3 May, 2018; originally announced May 2018.

Comments: ICSE 2018 Poster Track

ACM Class: D.2.5; D.2.8

Journal ref: 2018 IEEE/ACM International Conference on Software Engineering Companion (ICSE Companion)

arXiv:1804.07599

doi 10.1109/WETSoM.2017..1

Ticket Coverage: Putting Test Coverage into Context

Authors: Jakob Rott, Rainer Niedermayr, Elmar Juergens, Dennis Pagano

Abstract: There is no metric that determines how well the implementation of a ticket has been tested. As a consequence, code changed within the context of a ticket might unintentionally remain untested and get into production. This is a major problem, because changed code is more fault-prone than unchanged code. In this paper, we introduce the metric ticket coverage, which puts test coverage into the context of tickets. For each ticket, it determines the ratio of changed methods covered by automated or manual tests. We conducted an empirical study on an industrial system consisting of 650k lines of Java code and show that ticket coverage brings transparency into the test state of tickets and reveals relevant test gaps.

Submitted 20 April, 2018; originally announced April 2018.

Comments: WETSoM 2017

Report number: WETSoM 2017-08-32-

ACM Class: D.2.5

Journal ref: 2017 IEEE/ACM 8th Workshop on Emerging Trends in Software Metrics (WETSoM)
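The abstract defines ticket coverage as the ratio of a ticket's changed methods that are covered by automated or manual tests. A minimal sketch, assuming the changed methods of a ticket and the methods covered by each kind of test have already been collected as sets of method identifiers (the identifier format and function name are hypothetical). The uncovered remainder corresponds to the test gaps the metric is meant to reveal.

```python
from typing import Optional


def ticket_coverage(changed: set[str],
                    automated: set[str],
                    manual: set[str]) -> tuple[Optional[float], set[str]]:
    """Return (coverage ratio, uncovered changed methods) for one ticket."""
    if not changed:
        return None, set()  # ratio is undefined if the ticket changed no methods
    covered = changed & (automated | manual)  # covered by either kind of test
    return len(covered) / len(changed), changed - covered


ratio, gaps = ticket_coverage(
    changed={"Order.total", "Order.tax", "Cart.add", "Cart.clear"},
    automated={"Order.total", "Invoice.print"},
    manual={"Cart.add"},
)
print(ratio)         # 0.5
print(sorted(gaps))  # ['Cart.clear', 'Order.tax']
```

Coverage of methods the ticket did not touch (here `Invoice.print`) is deliberately ignored, which is what distinguishes this metric from overall test coverage.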

arXiv:1611.07163

doi 10.1145/2896941.2896944

Will My Tests Tell Me If I Break This Code?

Authors: Rainer Niedermayr, Elmar Juergens, Stefan Wagner

Abstract: Automated tests play an important role in software evolution because they can rapidly detect faults introduced during changes. In practice, code-coverage metrics are often used as criteria to evaluate the effectiveness of test suites with a focus on regression faults. However, code coverage only expresses which portion of a system has been executed by tests, not how effective the tests actually are at detecting regression faults. Our goal was to evaluate the validity of code coverage as a measure of test effectiveness. To do so, we conducted an empirical study in which we applied an extreme mutation testing approach to analyze the tests of open-source projects written in Java. We assessed the ratio of pseudo-tested methods (those tested in such a way that faults would not be detected) to all covered methods and judged their impact on the software project. The results show that the ratio of pseudo-tested methods is acceptable for unit tests but not for system tests (which execute large portions of the whole system). Therefore, we conclude that the coverage metric is a valid effectiveness indicator only for unit tests.

Submitted 22 November, 2016; originally announced November 2016.

Comments: 7 pages, 3 figures

Journal ref: Proceedings of the International Workshop on Continuous Software Evolution and Delivery (CSED '16). ACM, 2016
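The abstract describes extreme mutation only at a high level: a method's entire body is removed (a non-void method instead returns a default value) and the tests are re-run; if no test fails, the method is pseudo-tested. A minimal Python sketch, assuming implementations are looked up through a mutable registry dictionary so that one can be swapped out for the mutant. The study itself targets Java, and the registry, function names, and default-return choice here are hypothetical.

```python
from typing import Any, Callable


def is_pseudo_tested(registry: dict[str, Callable[..., Any]], name: str,
                     tests: list[Callable[[], None]], default: Any = None) -> bool:
    """True if no test fails after the body of registry[name] is removed."""
    original = registry[name]
    registry[name] = lambda *args, **kwargs: default  # extreme mutant: empty body
    try:
        for test in tests:
            try:
                test()
            except Exception:  # a failing assertion or a crash detects the mutant
                return False
        return True
    finally:
        registry[name] = original  # always restore the real implementation


messages: list[str] = []
impl: dict[str, Callable[..., Any]] = {
    "add": lambda a, b: a + b,
    "log": lambda msg: messages.append(msg),
}


def test_add() -> None:
    assert impl["add"](2, 3) == 5


def test_log() -> None:
    impl["log"]("started")  # executes (covers) log, but never checks `messages`


tests = [test_add, test_log]
pseudo = [name for name in impl if is_pseudo_tested(impl, name, tests)]
print(pseudo)                   # ['log']
print(len(pseudo) / len(impl))  # 0.5, the pseudo-tested ratio among covered methods
```

Both methods are covered by the tests, yet only `add` is checked in a way that detects the removed body, which is the gap between coverage and effectiveness the study measures.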

Showing 1–6 of 6 results for author: Niedermayr, R