-
Large Language Model-Augmented Auto-Delineation of Treatment Target Volume in Radiation Therapy
Authors:
Praveenbalaji Rajendran,
Yong Yang,
Thomas R. Niedermayr,
Michael Gensheimer,
Beth Beadle,
Quynh-Thu Le,
Lei Xing,
Xianjin Dai
Abstract:
Radiation therapy (RT) is one of the most effective treatments for cancer, and its success relies on the accurate delineation of targets. However, target delineation is a comprehensive medical decision that currently relies purely on manual processes by human experts. Manual delineation is time-consuming, laborious, and subject to interobserver variation. Although advances in artificial intelligence (AI) techniques have significantly enhanced the auto-contouring of normal tissues, accurate delineation of RT target volumes remains a challenge. In this study, we propose a visual language model-based RT target volume auto-delineation network termed Radformer. Radformer uses a hierarchical vision transformer as its backbone and incorporates large language models to extract text-rich features from clinical data. We introduce a visual language attention module (VLAM) that integrates visual and linguistic features for language-aware visual encoding (LAVE). Radformer was evaluated on a dataset of 2985 patients with head-and-neck cancer who underwent RT. The Dice similarity coefficient (DSC), intersection over union (IoU), and 95th percentile Hausdorff distance (HD95) were used to evaluate the model's performance quantitatively. Our results demonstrate that Radformer achieves superior segmentation performance compared to other state-of-the-art models, validating its potential for adoption in RT practice.
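The three evaluation metrics named above can be sketched on binary masks represented as sets of voxel coordinates. This is a minimal illustration, not the Radformer evaluation code: the set-of-voxels representation, the helper names, the convention for two empty masks, and the HD95 variant (the larger of the two directed 95th-percentile distances, computed over all voxels in index units rather than over surface voxels in millimetres) are assumptions. Published HD95 implementations differ in exactly these choices.

```python
import math
from typing import Iterable, List, Set, Tuple

Voxel = Tuple[int, int, int]


def dice(pred: Set[Voxel], ref: Set[Voxel]) -> float:
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    if not pred and not ref:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * len(pred & ref) / (len(pred) + len(ref))


def iou(pred: Set[Voxel], ref: Set[Voxel]) -> float:
    """Intersection over union (Jaccard index): |A ∩ B| / |A ∪ B|."""
    union = pred | ref
    if not union:
        return 1.0  # same assumed convention as dice()
    return len(pred & ref) / len(union)


def _percentile(values: List[float], q: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def _directed(src: Iterable[Voxel], dst: Set[Voxel]) -> List[float]:
    """Distance from each voxel in src to its nearest voxel in dst (brute force)."""
    return [min(math.dist(p, r) for r in dst) for p in src]


def hd95(pred: Set[Voxel], ref: Set[Voxel]) -> float:
    """95th-percentile Hausdorff distance in voxel units; both masks must be non-empty."""
    return max(_percentile(_directed(pred, ref), 95),
               _percentile(_directed(ref, pred), 95))


if __name__ == "__main__":
    pred_mask = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
    ref_mask = {(0, 0, 1), (0, 1, 0), (1, 0, 0)}
    print(dice(pred_mask, ref_mask), iou(pred_mask, ref_mask), hd95(pred_mask, ref_mask))
```

For the toy masks, the prediction and reference share two of four distinct voxels, so IoU is 0.5 and DSC is 2/3, consistent with the identity DSC = 2·IoU / (1 + IoU). The brute-force nearest-neighbour search is quadratic and only suitable for illustration.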
Submitted 9 July, 2024;
originally announced July 2024.
-
Is the Stack Distance Between Test Case and Method Correlated With Test Effectiveness?
Authors:
Rainer Niedermayr,
Stefan Wagner
Abstract:
Mutation testing is a means to assess the effectiveness of a test suite, and its outcome is considered more meaningful than code coverage metrics. However, despite several optimizations, mutation testing requires significant computational effort and has not been widely adopted in industry. Therefore, we study in this paper whether test effectiveness can be approximated using a more light-weight approach. We hypothesize that a test case is more likely to detect faults in methods that are close to it on the call stack than in methods that it accesses only indirectly through many other methods. Based on this hypothesis, we propose the minimal stack distance between test case and method as a new test measure, which expresses how close any test case comes to a given method, and study its correlation with test effectiveness. We conducted an empirical study with 21 open-source projects, comprising 1.8 million LOC in total, and show that a correlation exists between stack distance and test effectiveness. The correlation reaches a strength of up to 0.58. We further show that a classifier using the minimal stack distance along with additional easily computable measures can predict the mutation testing result of a method with 92.9% precision and 93.4% recall. Hence, such a classifier can be considered as a light-weight alternative to mutation testing or as a less costly step preceding it.
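The minimal stack distance can be sketched from recorded call stacks: for every stack in which a test-case frame appears, each deeper frame is reachable from that test at a distance equal to the difference in depth, and the measure keeps the minimum over all tests and stacks. This is an illustration under assumptions, not the authors' tooling. The stack format (fully qualified method names, outermost frame first), the convention that a method called directly by a test has distance 1, and all method and test names are hypothetical, and collecting the stacks (for example, by instrumenting test execution) is not shown.

```python
from typing import Dict, Iterable, List, Set


def minimal_stack_distances(stacks: Iterable[List[str]], tests: Set[str]) -> Dict[str, int]:
    """Map each non-test method to the smallest number of call levels separating it
    from a test-case frame, over all recorded call stacks.

    Each stack lists frames outermost-first, so stack[j] was (transitively) called
    by stack[i] for i < j. A method invoked directly by a test gets distance 1
    (assumed convention). Methods never reached from a test are absent.
    """
    best: Dict[str, int] = {}
    for stack in stacks:
        for i, frame in enumerate(stack):
            if frame not in tests:
                continue
            # Every deeper frame is reachable from this test at distance j - i.
            for j in range(i + 1, len(stack)):
                method = stack[j]
                if method in tests:
                    continue
                distance = j - i
                best[method] = min(best.get(method, distance), distance)
    return best


if __name__ == "__main__":
    recorded = [
        ["FooTest.testSave", "Foo.save", "Db.write", "Io.flush"],
        ["DbTest.testWrite", "Db.write"],
    ]
    print(minimal_stack_distances(recorded, {"FooTest.testSave", "DbTest.testWrite"}))
```

In the example, Db.write gets distance 1 because DbTest.testWrite calls it directly, even though FooTest.testSave reaches it only at depth 2. Taking the minimum over all tests is what lets the measure express how close any test comes to a method.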
Submitted 13 March, 2019;
originally announced March 2019.
-
Too Trivial To Test? An Inverse View on Defect Prediction to Identify Methods with Low Fault Risk
Authors:
Rainer Niedermayr,
Tobias Röhm,
Stefan Wagner
Abstract:
Background. Test resources are usually limited, and therefore it is often not possible to test an application completely before a release. To cope with the problem of scarce resources, development teams can apply defect prediction to identify fault-prone code regions. However, defect prediction tends to have low precision in cross-project prediction scenarios.
Aims. We take an inverse view on defect prediction and aim to identify methods that can be deferred when testing because they contain hardly any faults due to their code being "trivial". We expect that characteristics of such methods might be project-independent, so that our approach could improve cross-project predictions.
Method. We compute code metrics and apply association rule mining to create rules for identifying methods with low fault risk. We conduct an empirical study to assess our approach with six Java open-source projects containing precise fault data at the method level.
Results. Our results show that inverse defect prediction can identify approximately 32–44% of a project's methods as having low fault risk; on average, these methods are about six times less likely to contain a fault than other methods. In cross-project predictions with larger, more diverse training sets, the identified methods are even eleven times less likely to contain a fault.
Conclusions. Inverse defect prediction supports the efficient allocation of test resources by identifying methods that can be treated with less priority in testing activities and is well applicable in cross-project prediction scenarios.
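The Method and Results steps above score rules whose consequent is "low fault risk" and compare fault likelihood between flagged and other methods. The sketch below shows, under assumptions, how one such rule's support and confidence and the "N times less likely to contain a fault" comparison can be computed. It does not perform the association rule mining itself (an algorithm such as Apriori would generate the candidate antecedents); the metric names, the rule trivial, and the toy data are hypothetical and are not taken from the study.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class MethodMetrics:
    sloc: int          # source lines of code (hypothetical metric)
    max_nesting: int   # deepest control-flow nesting (hypothetical metric)
    faulty: bool       # ground truth from method-level fault data


Rule = Callable[[MethodMetrics], bool]


def support_confidence(methods: List[MethodMetrics], antecedent: Rule) -> Tuple[float, float]:
    """Support and confidence of the rule `antecedent => not faulty`."""
    matched = [m for m in methods if antecedent(m)]
    matched_clean = [m for m in matched if not m.faulty]
    support = len(matched_clean) / len(methods)
    confidence = len(matched_clean) / len(matched) if matched else 0.0
    return support, confidence


def fault_rate(methods: List[MethodMetrics]) -> float:
    """Share of methods that contain at least one fault."""
    return sum(m.faulty for m in methods) / len(methods)


def risk_ratio(methods: List[MethodMetrics], antecedent: Rule) -> float:
    """How many times less likely flagged methods are to contain a fault than the rest."""
    flagged = [m for m in methods if antecedent(m)]
    others = [m for m in methods if not antecedent(m)]
    if not flagged or not others:
        raise ValueError("the rule must flag some, but not all, methods")
    flagged_rate = fault_rate(flagged)
    return math.inf if flagged_rate == 0 else fault_rate(others) / flagged_rate


def trivial(m: MethodMetrics) -> bool:
    # Hypothetical hand-written rule, not one mined in the study.
    return m.sloc <= 3 and m.max_nesting == 0
```

On a toy set of ten methods where the rule flags five, one of them faulty, against four faulty methods among the remaining five, support is 0.4, confidence is 0.8, and the flagged methods are four times less likely to be faulty.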
Submitted 2 November, 2018;
originally announced November 2018.
-
Poster: Identification of Methods with Low Fault Risk
Authors:
Rainer Niedermayr,
Tobias Röhm,
Stefan Wagner
Abstract:
Test resources are usually limited, and therefore it is often not possible to test an application completely before a release. Testers thus need to focus their activities on the relevant code regions. In this paper, we introduce an inverse defect prediction approach to identify methods that contain hardly any faults. We applied our approach to six Java open-source projects and show that, on average, 31.6% of a project's methods have a low fault risk; together, these methods contain on average only 5.8% of all faults. Furthermore, the results suggest that, unlike defect prediction, our approach can also be applied in cross-project prediction scenarios. Therefore, inverse defect prediction can help prioritize untested code areas and guide testers to increase the fault detection probability.
Submitted 3 May, 2018;
originally announced May 2018.
-
Ticket Coverage: Putting Test Coverage into Context
Authors:
Jakob Rott,
Rainer Niedermayr,
Elmar Juergens,
Dennis Pagano
Abstract:
There is no metric that determines how well the implementation of a ticket has been tested. As a consequence, code changed within the context of a ticket might unintentionally remain untested and get into production. This is a major problem, because changed code is more fault-prone than unchanged code. In this paper, we introduce the metric ticket coverage, which puts test coverage into the context of tickets. For each ticket, it determines the ratio of changed methods that are covered by automated or manual tests. We conducted an empirical study on an industrial system consisting of 650k lines of Java code and show that ticket coverage brings transparency to the test state of tickets and reveals relevant test gaps.
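The per-ticket ratio described above reduces to set arithmetic. This is a minimal illustration, assuming that changed methods have already been attributed to tickets (for example, via ticket IDs in commit messages) and that the covered set contains only methods executed by a test after their most recent change, so that coverage of an older version does not count. The function name, ticket IDs, and method names are hypothetical.

```python
from typing import Dict, Set


def ticket_coverage(changed_by_ticket: Dict[str, Set[str]], covered: Set[str]) -> Dict[str, float]:
    """Ratio of changed methods covered by tests, per ticket.

    changed_by_ticket: ticket id -> methods changed in commits linked to that ticket.
    covered: methods executed by automated or manual tests since their last change
             (assumed to be pre-filtered).
    Tickets with no changed methods are omitted because the ratio is undefined.
    """
    return {
        ticket: len(methods & covered) / len(methods)
        for ticket, methods in changed_by_ticket.items()
        if methods
    }


if __name__ == "__main__":
    changes = {
        "TICKET-1": {"Order.total", "Order.addItem", "Cart.clear", "Cart.merge"},
        "TICKET-2": {"Invoice.render"},
    }
    tested = {"Order.total", "Cart.clear", "Cart.merge"}
    for ticket, ratio in sorted(ticket_coverage(changes, tested).items()):
        print(f"{ticket}: {ratio:.0%}")
```

A ratio below 1 points at a test gap: in the example, Order.addItem was changed for TICKET-1 but not executed by any test, and the only change made for TICKET-2 is untested.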
Submitted 20 April, 2018;
originally announced April 2018.
-
Will My Tests Tell Me If I Break This Code?
Authors:
Rainer Niedermayr,
Elmar Juergens,
Stefan Wagner
Abstract:
Automated tests play an important role in software evolution because they can rapidly detect faults introduced during changes. In practice, code-coverage metrics are often used as criteria to evaluate the effectiveness of test suites, with a focus on regression faults. However, code coverage only expresses which portion of a system has been executed by tests, not how effective the tests actually are at detecting regression faults. Our goal was to evaluate the validity of code coverage as a measure of test effectiveness. To do so, we conducted an empirical study in which we applied an extreme mutation testing approach to analyze the tests of open-source projects written in Java. We assessed the ratio of pseudo-tested methods (methods tested in such a way that faults would not be detected) to all covered methods and judged their impact on the software project. The results show that the ratio of pseudo-tested methods is acceptable for unit tests but not for system tests (which execute large portions of the whole system). Therefore, we conclude that the coverage metric is a valid effectiveness indicator only for unit tests.
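The idea of extreme mutation and pseudo-tested methods can be illustrated in a few lines of Python, even though the study analyzed Java projects. In this sketch the whole body of each covered method is replaced by a single `return default`, and a method is reported as pseudo-tested when every test still passes. In the Java setting, void methods typically have their body emptied while methods with a return value are made to return a constant; the single default value here is a simplification. The class, the check_* tests, and the helper names are hypothetical, and the list of covered methods is assumed to come from a coverage tool.

```python
from typing import Callable, List


class Calculator:
    """Hypothetical code under test."""

    def add(self, a: int, b: int) -> int:
        return a + b

    def log(self, msg: str) -> None:
        self.last_message = msg  # side effect that no test checks


def check_add() -> None:
    assert Calculator().add(2, 3) == 5


def check_log_runs() -> None:
    Calculator().log("hello")  # executes log() but asserts nothing


TESTS: List[Callable[[], None]] = [check_add, check_log_runs]


def _passes(test: Callable[[], None]) -> bool:
    try:
        test()
        return True
    except Exception:  # a failed assertion or any other error counts as detection
        return False


def pseudo_tested(cls: type, covered: List[str],
                  tests: List[Callable[[], None]], default: object = None) -> List[str]:
    """Covered methods (defined directly on cls) whose tests all pass after the body is removed."""
    found = []
    for name in covered:
        original = cls.__dict__[name]
        # Extreme mutation: replace the whole method body with 'return default'.
        setattr(cls, name, lambda self, *args, **kwargs: default)
        try:
            if all(_passes(t) for t in tests):
                found.append(name)
        finally:
            setattr(cls, name, original)  # restore the original method
    return found


if __name__ == "__main__":
    covered_methods = ["add", "log"]
    pseudo = pseudo_tested(Calculator, covered_methods, TESTS)
    print(pseudo, len(pseudo) / len(covered_methods))
```

Here log() is pseudo-tested: check_log_runs covers it, yet removing its logic goes unnoticed, so the ratio of pseudo-tested to covered methods is 0.5 even though coverage is complete.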
Submitted 22 November, 2016;
originally announced November 2016.