
Showing 1–12 of 12 results for author: Steenhoek, B

  1. arXiv:2506.00750  [pdf, other]

    cs.SE cs.AI

    CodeSense: a Real-World Benchmark and Dataset for Code Semantic Reasoning

    Authors: Monoshi Kumar Roy, Simin Chen, Benjamin Steenhoek, Jinjun Peng, Gail Kaiser, Baishakhi Ray, Wei Le

    Abstract: Understanding and reasoning about code semantics is essential for enhancing code LLMs' abilities to solve real-world software engineering (SE) tasks. Although several code reasoning benchmarks exist, most rely on synthetic datasets or educational coding problems and focus on coarse-grained reasoning tasks such as input/output prediction, limiting their effectiveness in evaluating LLMs in practical…

    Submitted 31 May, 2025; originally announced June 2025.

  2. arXiv:2412.14308   

    cs.SE cs.LG

    Reinforcement Learning from Automatic Feedback for High-Quality Unit Test Generation

    Authors: Benjamin Steenhoek, Michele Tufano, Neel Sundaresan, Alexey Svyatkovskiy

    Abstract: Software testing is a crucial but time-consuming aspect of software development, and recently, Large Language Models (LLMs) have gained popularity for automated test case generation. However, because LLMs are trained on vast amounts of open-source code, they often generate test cases that do not adhere to best practices and may even contain test smells (anti-patterns). To address this issue, we pr…

    Submitted 6 January, 2025; v1 submitted 18 December, 2024; originally announced December 2024.

    Comments: This work was intended as a replacement of arXiv:2310.02368 and any subsequent updates will appear there

  3. arXiv:2412.14306  [pdf, other]

    cs.SE cs.CR cs.LG

    Closing the Gap: A User Study on the Real-world Usefulness of AI-powered Vulnerability Detection & Repair in the IDE

    Authors: Benjamin Steenhoek, Kalpathy Sivaraman, Renata Saldivar Gonzalez, Yevhen Mohylevskyy, Roshanak Zilouchian Moghaddam, Wei Le

    Abstract: This paper presents the first empirical study of a vulnerability detection and fix tool with professional software developers on real projects that they own. We implemented DeepVulGuard, an IDE-integrated tool based on state-of-the-art detection and fix models, and show that it has promising performance on benchmarks of historic vulnerability data. DeepVulGuard scans code for vulnerabilities (incl…

    Submitted 25 April, 2025; v1 submitted 18 December, 2024; originally announced December 2024.

    Comments: Accepted to ICSE 2025 research track. Camera-ready version with grant reference number fixed in acknowledgments

  4. arXiv:2403.17218  [pdf, other]

    cs.SE cs.CR cs.LG

    To Err is Machine: Vulnerability Detection Challenges LLM Reasoning

    Authors: Benjamin Steenhoek, Md Mahbubur Rahman, Monoshi Kumar Roy, Mirza Sanjida Alam, Hengbo Tong, Swarna Das, Earl T. Barr, Wei Le

    Abstract: In this paper, we present a challenging code reasoning task: vulnerability detection. Large Language Models (LLMs) have shown promising results in natural-language and math reasoning, but state-of-the-art (SOTA) models reported only 54.5% Balanced Accuracy in our vulnerability detection evaluation, even those models pre-trained on large amounts of source code. Our error analysis on LLM responses s…

    Submitted 7 January, 2025; v1 submitted 25 March, 2024; originally announced March 2024.

  5. arXiv:2311.04109  [pdf, other]

    cs.LG cs.CR

    Do Language Models Learn Semantics of Code? A Case Study in Vulnerability Detection

    Authors: Benjamin Steenhoek, Md Mahbubur Rahman, Shaila Sharmin, Wei Le

    Abstract: Recently, pretrained language models have shown state-of-the-art performance on the vulnerability detection task. These models are pretrained on a large corpus of source code, then fine-tuned on a smaller supervised vulnerability dataset. Due to the different training objectives and the performance of the models, it is interesting to consider whether the models have learned the semantics of code r…

    Submitted 7 November, 2023; originally announced November 2023.

  6. arXiv:2310.02368  [pdf, other]

    cs.SE cs.LG

    Reinforcement Learning from Automatic Feedback for High-Quality Unit Test Generation

    Authors: Benjamin Steenhoek, Michele Tufano, Neel Sundaresan, Alexey Svyatkovskiy

    Abstract: Software testing is a crucial aspect of software development, and the creation of high-quality tests that adhere to best practices is essential for effective maintenance. Recently, Large Language Models (LLMs) have gained popularity for code generation, including the automated creation of test cases. However, these LLMs are often trained on vast amounts of publicly available code, which may includ…

    Submitted 6 January, 2025; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: Accepted to DeepTest 2025 (ICSE Workshop). Previously this version appeared as arXiv:2412.14308 which was submitted as a new work by accident

  7. arXiv:2309.11004  [pdf, other]

    cs.SE

    Reproducing Failures in Fault Signatures

    Authors: Ashwin Kallingal Joshy, Benjamin Steenhoek, Xiuyuan Guo, Wei Le

    Abstract: Software often fails in the field, however reproducing and debugging field failures is very challenging: the failure-inducing input may be missing, and the program setup can be complicated and hard to reproduce by the developers. In this paper, we propose to generate fault signatures from the failure locations and the original source code to reproduce the faults in small executable programs. We sa…

    Submitted 19 September, 2023; originally announced September 2023.

  8. arXiv:2306.07487  [pdf, other]

    cs.SE

    TRACED: Execution-aware Pre-training for Source Code

    Authors: Yangruibo Ding, Ben Steenhoek, Kexin Pei, Gail Kaiser, Wei Le, Baishakhi Ray

    Abstract: Most existing pre-trained language models for source code focus on learning the static code text, typically augmented with static code structures (abstract syntax tree, dependency graphs, etc.). However, program semantics will not be fully exposed before the real execution. Without an understanding of the program execution, statically pre-trained models fail to comprehensively capture the dynamic…

    Submitted 12 June, 2023; originally announced June 2023.

    Comments: Accepted by ICSE 2024 (Early Cycle). Camera-ready is in preparation

  9. arXiv:2305.02515  [pdf, other]

    cs.SE

    A Study of Static Warning Cascading Tools (Experience Paper)

    Authors: Xiuyuan Guo, Ashwin Kallingal Joshy, Benjamin Steenhoek, Wei Le, Lori Flynn

    Abstract: Static analysis is widely used for software assurance. However, static analysis tools can report an overwhelming number of warnings, many of which are false positives. Applying static analysis to a new version, a large number of warnings can be only relevant to the old version. Inspecting these warnings is a waste of time and can prevent developers from finding the new bugs in the new version. In…

    Submitted 3 May, 2023; originally announced May 2023.

    Comments: 11 pages (including references), 12 figures

  10. arXiv:2212.08109  [pdf, other]

    cs.SE cs.CR cs.LG

    An Empirical Study of Deep Learning Models for Vulnerability Detection

    Authors: Benjamin Steenhoek, Md Mahbubur Rahman, Richard Jiles, Wei Le

    Abstract: Deep learning (DL) models of code have recently reported great progress for vulnerability detection. In some cases, DL-based models have outperformed static analysis tools. Although many great models have been proposed, we do not yet have a good understanding of these models. This limits the further advancement of model robustness, debugging, and deployment for the vulnerability detection. In this…

    Submitted 12 February, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

    Comments: 12 pages, 14 figures. Accepted at ICSE 2023. Camera-ready version

  11. arXiv:2212.08108  [pdf, other]

    cs.SE cs.CR cs.LG

    Dataflow Analysis-Inspired Deep Learning for Efficient Vulnerability Detection

    Authors: Benjamin Steenhoek, Hongyang Gao, Wei Le

    Abstract: Deep learning-based vulnerability detection has shown great performance and, in some studies, outperformed static analysis tools. However, the highest-performing approaches use token-based transformer models, which are not the most efficient to capture code semantics required for vulnerability detection. Classical program analysis techniques such as dataflow analysis can detect many types of bugs…

    Submitted 1 October, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

    Comments: Accepted at ICSE 2024 (Early Cycle). Camera-ready version

  12. Validating Static Warnings via Testing Code Fragments

    Authors: Ashwin Kallingal Joshy, Xueyuan Chen, Benjamin Steenhoek, Wei Le

    Abstract: Static analysis is an important approach for finding bugs and vulnerabilities in software. However, inspecting and confirming static warnings are challenging and time-consuming. In this paper, we present a novel solution that automatically generates test cases based on static warnings to validate true and false positives. We designed a syntactic patching algorithm that can generate syntactically v…

    Submitted 28 June, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: In Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis July 11 to 17, 2021, Denmark. 13 pages

    ACM Class: D.2.5; D.2.4; F.3.1