Programming Languages
See recent articles
Showing new listings for Tuesday, 23 September 2025
- [1] arXiv:2509.16246 [pdf, html, other]
-
Title: VerilogMonkey: Exploring Parallel Scaling for Automated Verilog Code Generation with LLMsSubjects: Programming Languages (cs.PL); Hardware Architecture (cs.AR)
We present VerilogMonkey, an empirical study of parallel scaling for the under-explored task of automated Verilog generation. Parallel scaling improves LLM performance by sampling many outputs in parallel. Across multiple benchmarks and mainstream LLMs, we find that scaling to hundreds of samples is cost-effective in both time and money and, even without any additional enhancements such as post-training or agentic methods, surpasses prior results on LLM-based Verilog generation. We further dissect why parallel scaling delivers these gains and show how output randomness in LLMs affects its effectiveness.
- [2] arXiv:2509.16248 [pdf, html, other]
-
Title: GraphMend: Code Transformations for Fixing Graph Breaks in PyTorch 2Savini Kashmira, Jayanaka Dantanarayana, Thamirawaran Sathiyalogeswaran, Yichao Yuan, Nishil Talati, Krisztian Flautner, Lingjia Tang, Jason MarsSubjects: Programming Languages (cs.PL); Machine Learning (cs.LG); Software Engineering (cs.SE)
This paper presents GraphMend, a high-level compiler that eliminates FX graph breaks in PyTorch 2 programs. Although PyTorch 2 introduced TorchDynamo and TorchInductor to enable just-in-time graph compilation, unresolved dynamic control flow and unsupported Python constructs often fragment models into multiple FX graphs. These fragments force frequent fallbacks to eager mode, incur costly CPU-to-GPU synchronizations, and reduce optimization opportunities. GraphMend addresses this limitation by analyzing and transforming source code before execution. Built on the Jac compilation framework, GraphMend introduces two code transformations that remove graph breaks due to dynamic control flow and Python I/O functions. This design allows PyTorch's compilation pipeline to capture larger, uninterrupted FX graphs without requiring manual refactoring by developers. Evaluation across eight Hugging Face models shows that GraphMend removes all fixable graph breaks due to dynamic control flow and Python I/O functions, driving the break count to 0 in 6 models and reducing it from 5 to 2 in another model. On NVIDIA RTX 3090 and A40 GPUs, GraphMend achieves up to 75% latency reductions and up to 8% higher end-to-end throughput. These results demonstrate that high-level code transformation is an effective complement to PyTorch's dynamic JIT compilation pipeline, substantially improving both usability and performance.
- [3] arXiv:2509.17795 [pdf, html, other]
-
Title: Efficient Linearizability MonitoringSubjects: Programming Languages (cs.PL)
This paper revisits the fundamental problem of monitoring the linearizability of concurrent stacks, queues, sets, and multisets. Given a history of a library implementing one of these abstract data types, the monitoring problem is to answer whether the given history is linearizable. For stacks, queues, and (multi)sets, we present monitoring algorithms with complexities $\mathcal{O}(n^2)$, $\mathcal{O}(n\; log\, n)$, and $\mathcal{O}{(n)}$, respectively, where $n$ is the number of operations in the input history. For stacks and queues, our results hold under the standard assumption of {\it data-independence}, i.e., the behavior of the library is not sensitive to the actual values stored in the data structure. Past works to solve the same problems have cubic time complexity and (more seriously) have correctness issues: they either (i) lack correctness proofs or (ii) the suggested correctness proofs are erroneous (we present counter-examples), or (iii) have incorrect algorithms. Our improved complexity results rely on substantially different algorithms for which we provide detailed proofs of correctness. We have implemented our stack and queue algorithms in LiMo (Linearizability Monitor). We evaluate LiMo and compare it with the state-of-the-art tool Violin -- whose correctness proofs we have found errors in -- which checks for linearizability violations. Our experimental evaluation confirms that LiMo outperforms Violin regarding both efficiency and scalability.
New submissions (showing 3 of 3 entries)
- [4] arXiv:2509.16205 (cross-list from cs.ET) [pdf, html, other]
-
Title: A 200-Line Python Micro-Benchmark Suite for NISQ Circuit CompilersComments: 9 pages, 1 figure. Includes reproducibility instructions and code artifacts. Companion repository: this https URLSubjects: Emerging Technologies (cs.ET); Programming Languages (cs.PL)
We present this http URL, a compact (approx. 200 lines) Python script that automates the collection of key compiler metrics, i.e., gate depth, two-qubit-gate count, wall-clock compilation time, and memory footprint, across multiple open-source quantum circuit transpilers. The suite ships with six didactic circuits (3 to 8 qubits) implementing fundamental quantum algorithms and supports Qiskit, tket, Cirq, and the Qiskit-Braket provider; in this paper we showcase results for Qiskit 0.46 and Braket 1.16. The entire run completes in under three minutes on a laptop, emits a single CSV plus publisheable plot, and reproduces the figure here with one command. We release the code under the MIT licence to serve as a quick-start regression harness for NISQ compiler research.
- [5] arXiv:2509.16215 (cross-list from cs.LG) [pdf, html, other]
-
Title: Discovering Software Parallelization Points Using Deep Neural NetworksComments: 17 pages, 10 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Neural and Evolutionary Computing (cs.NE); Programming Languages (cs.PL); Software Engineering (cs.SE)
This study proposes a deep learning-based approach for discovering loops in programming code according to their potential for parallelization. Two genetic algorithm-based code generators were developed to produce two distinct types of code: (i) independent loops, which are parallelizable, and (ii) ambiguous loops, whose dependencies are unclear, making them impossible to define if the loop is parallelizable or not. The generated code snippets were tokenized and preprocessed to ensure a robust dataset. Two deep learning models - a Deep Neural Network (DNN) and a Convolutional Neural Network (CNN) - were implemented to perform the classification. Based on 30 independent runs, a robust statistical analysis was employed to verify the expected performance of both models, DNN and CNN. The CNN showed a slightly higher mean performance, but the two models had a similar variability. Experiments with varying dataset sizes highlighted the importance of data diversity for model performance. These results demonstrate the feasibility of using deep learning to automate the identification of parallelizable structures in code, offering a promising tool for software optimization and performance improvement.
- [6] arXiv:2509.16239 (cross-list from cs.LO) [pdf, html, other]
-
Title: Gödel Mirror: A Formal System For Contradiction-Driven RecursionComments: 10 pages. Preprint submitted to Logical Methods in Computer Science (LMCS)Subjects: Logic in Computer Science (cs.LO); Programming Languages (cs.PL)
We introduce the Gödel Mirror, a formal system defined in Lean 4 that treats contradiction as a control signal for recursive structural evolution.
Inspired by Gödelian self-reference, our system's operational semantics encode symbolic paradoxes as deterministic transitions. Unlike systems designed to guarantee normalization, the Gödel Mirror is a minimal and verifiable architecture that leverages a controlled, non-terminating loop as a productive feature.
Our Lean 4 mechanization proves that self-referential paradoxes are deterministically encapsulated and resolved into new structures without leading to logical explosion, yielding a paraconsistent inference loop: Paradox -> Encapsulate -> Reenter -> Node
We argue that this calculus opens a new class of symbolic systems in which contradiction is metabolized into structure, providing a formal basis for agents capable of resolving internal inconsistencies. - [7] arXiv:2509.16241 (cross-list from cs.CL) [pdf, html, other]
-
Title: REAMS: Reasoning Enhanced Algorithm for Maths SolvingSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Programming Languages (cs.PL)
The challenges of solving complex university-level mathematics problems, particularly those from MIT, and Columbia University courses, and selected tasks from the MATH dataset, remain a significant obstacle in the field of artificial intelligence. Conventional methods have consistently fallen short in this domain, highlighting the need for more advanced approaches. In this paper, we introduce a language-based solution that leverages zero-shot learning and mathematical reasoning to effectively solve, explain, and generate solutions for these advanced math problems. By integrating program synthesis, our method reduces reliance on large-scale training data while significantly improving problem-solving accuracy. Our approach achieves an accuracy of 90.15%, representing a substantial improvement over the previous benchmark of 81% and setting a new standard in automated mathematical problem-solving. These findings highlight the significant potential of advanced AI methodologies to address and overcome the challenges presented by some of the most complex mathematical courses and datasets.
- [8] arXiv:2509.16443 (cross-list from physics.app-ph) [pdf, html, other]
-
Title: LightCode: Compiling LLM Inference for Photonic-Electronic SystemsComments: 9 pages, 8 figuresSubjects: Applied Physics (physics.app-ph); Artificial Intelligence (cs.AI); Programming Languages (cs.PL)
The growing demand for low-latency, energy-efficient inference in large language models (LLMs) has catalyzed interest in heterogeneous architectures. While GPUs remain dominant, they are poorly suited for integration with emerging domain-specific accelerators like the Photonic Tensor Units (PTUs), which offer low-power, high-throughput linear computation. This motivates hybrid compilation strategies that combine photonic and electronic resources. We present LightCode, a compiler framework and simulator for mapping LLM inference workloads across hybrid photonic-electronic systems. LightCode introduces the Stacked Graph, an intermediate representation that encodes multiple hardware-specific realizations of each tensor operation. Hardware assignment is formulated as a constrained subgraph selection problem optimized for latency or energy under parametric cost models. We evaluate LightCode on the prefill stage of GPT-2 and Llama-7B showing that under our workload and hardware assumptions, (i) Photonic hardware reduced energy by up to 50% in our simulated workloads at maximum sequence length; (ii) multiplexing and assignment strategy yielded latency improvements exceeding 10x; and (iii) Optimizing for latency or energy resulted in distinct hardware mappings in our simulations. LightCode offers a module, foundational framework and simulator for compiling LLMs to emerging photonic accelerators.
- [9] arXiv:2509.16497 (cross-list from cs.ET) [pdf, html, other]
-
Title: PrediPrune: Reducing Verification Overhead in Souper with Machine Learning Driven PruningSubjects: Emerging Technologies (cs.ET); Programming Languages (cs.PL); Software Engineering (cs.SE)
Souper is a powerful enumerative superoptimizer that enhances the runtime performance of programs by optimizing LLVM intermediate representation (IR) code. However, its verification process, which relies on a computationally expensive SMT solver to validate optimization candidates, must explore a large search space. This large search space makes the verification process particularly expensive, increasing the burden to incorporate Souper into compilation tools. We propose PrediPrune, a stochastic candidate pruning strategy that effectively reduces the number of invalid candidates passed to the SMT solver. By utilizing machine learning techniques to predict the validity of candidates based on features extracted from the code, PrediPrune prunes unlikely candidates early, decreasing the verification workload. When combined with the state-of-the-art approach (Dataflow), PrediPrune decreases compilation time by 51% compared to the Baseline and by 12% compared to using only Dataflow, emphasizing the effectiveness of the combined approach that integrates a purely ML-based method (PrediPrune) with a purely non-ML based (Dataflow) method. Additionally, PrediPrune offers a flexible interface to trade-off compilation time and optimization opportunities, allowing end users to adjust the balance according to their needs.
- [10] arXiv:2509.17343 (cross-list from quant-ph) [pdf, other]
-
Title: Quantum Simulation Programming via TypingComments: Paper accepted to the Quantum Programming Languages (QPL) 2025 conference; available from: this https URLSubjects: Quantum Physics (quant-ph); Programming Languages (cs.PL)
Quantum simulations are designed to model quantum systems, and many compilation frameworks have been developed for executing such simulations on quantum computers. Most compilers leverage the capabilities of digital and analog quantum computers by representing quantum particle systems with Pauli strings or digital quantum circuits, making it challenging for users in physics, chemistry, and biology to program simulations effectively. QBLUE is proposed as the first programming language for describing the behaviors of quantum systems in terms of second quantization Hamiltonians. Within QBLUE, a novel type system is proposed to clearly define states across different quantum systems and treat quantum computers as quantum particle systems of specific types. The type system is compatible with the compilation of quantum simulations expressed in QBLUE for digital and analog quantum computers. With QBLUE, users can specify the desired quantum particle system and model the system on quantum computers.
Cross submissions (showing 7 of 7 entries)
- [11] arXiv:2502.12466 (replaced) [pdf, html, other]
-
Title: EquiBench: Benchmarking Large Language Models' Reasoning about Program Semantics via Equivalence CheckingAnjiang Wei, Jiannan Cao, Ran Li, Hongyu Chen, Yuhui Zhang, Ziheng Wang, Yuan Liu, Thiago S. F. X. Teixeira, Diyi Yang, Ke Wang, Alex AikenSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Programming Languages (cs.PL); Software Engineering (cs.SE)
As large language models (LLMs) become integral to code-related tasks, a central question emerges: Do LLMs truly understand program semantics? We introduce EquiBench, a new benchmark for evaluating LLMs through equivalence checking, i.e., determining whether two programs produce identical outputs for all possible inputs. Unlike prior code generation benchmarks, this task directly tests a model's ability to reason about program semantics. EquiBench consists of 2400 program pairs across four languages and six categories. These pairs are generated through program analysis, compiler scheduling, and superoptimization, ensuring high-confidence labels, nontrivial difficulty, and full automation. We evaluate 19 state-of-the-art LLMs and find that in the most challenging categories, the best accuracies are 63.8% and 76.2%, only modestly above the 50% random baseline. Further analysis reveals that models often rely on syntactic similarity rather than exhibiting robust reasoning about program semantics, highlighting current limitations. Our code and dataset are publicly available at this https URL
- [12] arXiv:2504.10369 (replaced) [pdf, html, other]
-
Title: SymRTLO: Enhancing RTL Code Optimization with LLMs and Neuron-Inspired Symbolic ReasoningYiting Wang, Wanghao Ye, Ping Guo, Yexiao He, Ziyao Wang, Bowei Tian, Shwai He, Guoheng Sun, Zheyu Shen, Sihan Chen, Ankur Srivastava, Qingfu Zhang, Gang Qu, Ang LiComments: NeurIPS 2025Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Programming Languages (cs.PL)
Optimizing Register Transfer Level (RTL) code is crucial for improving the power, performance, and area (PPA) of digital circuits in the early stages of synthesis. Manual rewriting, guided by synthesis feedback, can yield high-quality results but is time-consuming and error-prone. Most existing compiler-based approaches have difficulty handling complex design constraints. Large Language Model (LLM)-based methods have emerged as a promising alternative to address these challenges. However, LLM-based approaches often face difficulties in ensuring alignment between the generated code and the provided prompts. This paper presents SymRTLO, a novel neuron-symbolic RTL optimization framework that seamlessly integrates LLM-based code rewriting with symbolic reasoning techniques. Our method incorporates a retrieval-augmented generation (RAG) system of optimization rules and Abstract Syntax Tree (AST)-based templates, enabling LLM-based rewriting that maintains syntactic correctness while minimizing undesired circuit behaviors. A symbolic module is proposed for analyzing and optimizing finite state machine (FSM) logic, allowing fine-grained state merging and partial specification handling beyond the scope of pattern-based compilers. Furthermore, a fast verification pipeline, combining formal equivalence checks with test-driven validation, further reduces the complexity of verification. Experiments on the RTL-Rewriter benchmark with Synopsys Design Compiler and Yosys show that SymRTLO improves power, performance, and area (PPA) by up to 43.9%, 62.5%, and 51.1%, respectively, compared to the state-of-the-art methods.