-
eyeballvul: a future-proof benchmark for vulnerability detection in the wild
Abstract: Long contexts of recent LLMs have enabled a new use case: asking models to find security vulnerabilities in entire codebases. To evaluate model performance on this task, we introduce eyeballvul: a benchmark designed to test the vulnerability detection capabilities of language models at scale, that is sourced and updated weekly from the stream of published vulnerabilities in open-source repositorie…
Submitted 13 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.
Comments: Due to a bug in the litellm library (we haven't tracked exactly which one, but probably at least https://github.com/BerriAI/litellm/commit/2452753e084e8134c0c484b32c63fb5f2950c5ba), our Gemini 1.5 Pro inference costs were incorrect. We've updated the relevant plot (Fig 7) and its interpretation (both Gemini 1.5 Pro and Claude 3.5 Sonnet stand out from the other models, not just Gemini 1.5 Pro)
-
Neural Normalized Min-Sum Message-Passing vs. Viterbi Decoding for the CCSDS Line Product Code
Abstract: The Consultative Committee for Space Data Systems (CCSDS) 141.11-O-1 Line Product Code (LPC) provides a rare opportunity to compare maximum-likelihood decoding and message passing. The LPC considered in this paper is intended to serve as the inner code in conjunction with a (255,239) Reed Solomon (RS) code whose symbols are bytes of data. This paper represents the 141.11-O-1 LPC as a bipartite gra…
Submitted 15 November, 2021; originally announced November 2021.
Comments: This paper has been submitted to ICC 2022