Search | arXiv e-print repository

doi 10.4204/EPTCS.402.7

Embedding Differential Dynamic Logic in PVS

Authors: J. Tanner Slagel, Mariano Moscato, Lauren White, César A. Muñoz, Swee Balachandran, Aaron Dutle

Abstract: Differential dynamic logic (dL) is a formal framework for specifying and reasoning about hybrid systems, i.e., dynamical systems that exhibit both continuous and discrete behaviors. These kinds of systems arise in many safety- and mission-critical applications. This paper presents a formalization of dL in the Prototype Verification System (PVS) that includes the semantics of hybrid programs and d… ▽ More Differential dynamic logic (dL) is a formal framework for specifying and reasoning about hybrid systems, i.e., dynamical systems that exhibit both continuous and discrete behaviors. These kinds of systems arise in many safety- and mission-critical applications. This paper presents a formalization of dL in the Prototype Verification System (PVS) that includes the semantics of hybrid programs and dL's proof calculus. The formalization embeds dL into the PVS logic, resulting in a version of dL whose proof calculus is not only formally verified, but is also available for the verification of hybrid programs within PVS itself. This embedding, called Plaidypvs (Properly Assured Implementation of dL for Hybrid Program Verification and Specification), supports standard dL style proofs, but further leverages the capabilities of PVS to allow reasoning about entire classes of hybrid programs. The embedding also allows the user to import the well-established definitions and mathematical theories available in PVS. △ Less

Submitted 23 April, 2024; originally announced April 2024.

Comments: In Proceedings LSFA/HCVS 2023, arXiv:2404.13672

ACM Class: F.3.1

Journal ref: EPTCS 402, 2024, pp. 43-62

arXiv:2209.00188 [pdf, other]

Hermes: Accelerating Long-Latency Load Requests via Perceptron-Based Off-Chip Load Prediction

Authors: Rahul Bera, Konstantinos Kanellopoulos, Shankar Balachandran, David Novo, Ataberk Olgun, Mohammad Sadrosadati, Onur Mutlu

Abstract: Long-latency load requests continue to limit the performance of high-performance processors. To increase the latency tolerance of a processor, architects have primarily relied on two key techniques: sophisticated data prefetchers and large on-chip caches. In this work, we show that: 1) even a sophisticated state-of-the-art prefetcher can only predict half of the off-chip load requests on average a… ▽ More Long-latency load requests continue to limit the performance of high-performance processors. To increase the latency tolerance of a processor, architects have primarily relied on two key techniques: sophisticated data prefetchers and large on-chip caches. In this work, we show that: 1) even a sophisticated state-of-the-art prefetcher can only predict half of the off-chip load requests on average across a wide range of workloads, and 2) due to the increasing size and complexity of on-chip caches, a large fraction of the latency of an off-chip load request is spent accessing the on-chip cache hierarchy. The goal of this work is to accelerate off-chip load requests by removing the on-chip cache access latency from their critical path. To this end, we propose a new technique called Hermes, whose key idea is to: 1) accurately predict which load requests might go off-chip, and 2) speculatively fetch the data required by the predicted off-chip loads directly from the main memory, while also concurrently accessing the cache hierarchy for such loads. To enable Hermes, we develop a new lightweight, perceptron-based off-chip load prediction technique that learns to identify off-chip load requests using multiple program features (e.g., sequence of program counters). For every load request, the predictor observes a set of program features to predict whether or not the load would go off-chip. If the load is predicted to go off-chip, Hermes issues a speculative request directly to the memory controller once the load's physical address is generated. If the prediction is correct, the load eventually misses the cache hierarchy and waits for the ongoing speculative request to finish, thus hiding the on-chip cache hierarchy access latency from the critical path of the off-chip load. Our evaluation shows that Hermes significantly improves performance of a state-of-the-art baseline. We open-source Hermes. △ Less

Submitted 30 September, 2022; v1 submitted 31 August, 2022; originally announced September 2022.

Comments: To appear in 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2022

ACM Class: B.3.2; C.0

arXiv:2012.03745 [pdf, other]

doi 10.4204/EPTCS.329.3

From Requirements to Autonomous Flight: An Overview of the Monitoring ICAROUS Project

Authors: Aaron Dutle, César Muñoz, Esther Conrad, Alwyn Goodloe, Laura Titolo, Ivan Perez, Swee Balachandran, Dimitra Giannakopoulou, Anastasia Mavridou, Thomas Pressburger

Abstract: The Independent Configurable Architecture for Reliable Operations of Unmanned Systems (ICAROUS) is a software architecture incorporating a set of algorithms to enable autonomous operations of unmanned aircraft applications. This paper provides an overview of Monitoring ICAROUS, a project whose objective is to provide a formal approach to generating runtime monitors for autonomous systems from requ… ▽ More The Independent Configurable Architecture for Reliable Operations of Unmanned Systems (ICAROUS) is a software architecture incorporating a set of algorithms to enable autonomous operations of unmanned aircraft applications. This paper provides an overview of Monitoring ICAROUS, a project whose objective is to provide a formal approach to generating runtime monitors for autonomous systems from requirements written in a structured natural language. This approach integrates FRET, a formal requirement elicitation and authoring tool, and Copilot, a runtime verification framework. FRET is used to specify formal requirements in structured natural language. These requirements are translated into temporal logic formulae. Copilot is then used to generate executable runtime monitors from these temporal logic specifications. The generated monitors are directly integrated into ICAROUS to perform runtime verification during flight. △ Less

Submitted 2 December, 2020; originally announced December 2020.

Comments: In Proceedings FMAS 2020, arXiv:2012.01176

Journal ref: EPTCS 329, 2020, pp. 23-30

arXiv:2011.11695 [pdf, other]

Proximu$: Efficiently Scaling DNN Inference in Multi-core CPUs through Near-Cache Compute

Authors: Anant V. Nori, Rahul Bera, Shankar Balachandran, Joydeep Rakshit, Om J. Omer, Avishaii Abuhatzera, Belliappa Kuttanna, Sreenivas Subramoney

Abstract: Deep Neural Network (DNN) inference is emerging as the fundamental bedrock for a multitude of utilities and services. CPUs continue to scale up their raw compute capabilities for DNN inference along with mature high performance libraries to extract optimal performance. While general purpose CPUs offer unique attractive advantages for DNN inference at both datacenter and edge, they have primarily e… ▽ More Deep Neural Network (DNN) inference is emerging as the fundamental bedrock for a multitude of utilities and services. CPUs continue to scale up their raw compute capabilities for DNN inference along with mature high performance libraries to extract optimal performance. While general purpose CPUs offer unique attractive advantages for DNN inference at both datacenter and edge, they have primarily evolved to optimize single thread performance. For highly parallel, throughput-oriented DNN inference, this results in inefficiencies in both power and performance, impacting both raw performance scaling and overall performance/watt. We present Proximu$\$$, where we systematically tackle the root inefficiencies in power and performance scaling for CPU DNN inference. Performance scales efficiently by distributing light-weight tensor compute near all caches in a multi-level cache hierarchy. This maximizes the cumulative utilization of the existing bandwidth resources in the system and minimizes movement of data. Power is drastically reduced through simple ISA extensions that encode the structured, loop-y workload behavior. This enables a bulk offload of pre-decoded work, with loop unrolling in the light-weight near-cache units, effectively bypassing the power-hungry stages of the wide Out-of-Order (OOO) CPU pipeline. Across a number of DNN models, Proximu$\$$ achieves a 2.3x increase in convolution performance/watt with a 2x to 3.94x scaling in raw performance. Similarly, Proximu$\$$ achieves a 1.8x increase in inner-product performance/watt with 2.8x scaling in performance. With no changes to the programming model, no increase in cache capacity or bandwidth and minimal additional hardware, Proximu$\$$ enables unprecedented CPU efficiency gains while achieving similar performance to state-of-the-art Domain Specific Accelerators (DSA) for DNN inference in this AI era. △ Less

Submitted 2 December, 2020; v1 submitted 23 November, 2020; originally announced November 2020.

Comments: 18 pages, 21 figures

arXiv:1808.06074 [pdf, other]

Compiler Enhanced Scheduling for OpenMP for Heterogeneous Multiprocessors

Authors: Jyothi Krishna V S, Shankar Balachandran

Abstract: Scheduling in Asymmetric Multicore Processors (AMP), a special case of Heterogeneous Multiprocessors, is a widely studied topic. The scheduling techniques which are mostly runtime do not usually consider parallel programming pattern used in parallel programming frameworks like OpenMP. On the other hand, current compilers for these parallel programming platforms are hardware oblivious which prevent… ▽ More Scheduling in Asymmetric Multicore Processors (AMP), a special case of Heterogeneous Multiprocessors, is a widely studied topic. The scheduling techniques which are mostly runtime do not usually consider parallel programming pattern used in parallel programming frameworks like OpenMP. On the other hand, current compilers for these parallel programming platforms are hardware oblivious which prevent any compile-time optimization for platforms like big.LITTLE and has to completely rely on runtime optimization. In this paper, we propose a hardware-aware Compiler Enhanced Scheduling (CES) where the common compiler transformations are coupled with compiler added scheduling commands to take advantage of the hardware asymmetry and improve the runtime efficiency. We implement a compiler for OpenMP and demonstrate its efficiency in Samsung Exynos with big.LITTLE architecture. On an average, we see 18% reduction in runtime and 14% reduction in energy consumption in standard NPB and FSU benchmarks with CES across multiple frequencies and core configurations in big.LITTLE. △ Less

Submitted 18 August, 2018; originally announced August 2018.

Comments: 6 Pages, 4 figures, Presented in 2nd EEHCO (Energy Efficiency with Heterogenous Computing) Workshop in Prague 2016 (Part of HiPEAC event 2016)

arXiv:1112.1396 [pdf, ps, other]

A Study on Hierarchical Floorplans of Order k

Authors: Shankar Balachandran, Sajin Koroth

Abstract: A floorplan is a rectangular dissection which describes the relative placement of electronic modules on the chip. It is called a mosaic floorplan if there are no empty rooms or cross junctions in the rectangular dissection. We study a subclass of mosaic floorplans called hierarchical floorplans of order $k$ (abbreviated HFO-${k}$). A floorplan is HFO-$k$ if it can be obtained by starting with a si… ▽ More A floorplan is a rectangular dissection which describes the relative placement of electronic modules on the chip. It is called a mosaic floorplan if there are no empty rooms or cross junctions in the rectangular dissection. We study a subclass of mosaic floorplans called hierarchical floorplans of order $k$ (abbreviated HFO-${k}$). A floorplan is HFO-$k$ if it can be obtained by starting with a single rectangle and recursively embedding mosaic floorplans of at most $k$ rooms inside the rooms of intermediate floorplans. When $k=2$ this is exactly the class of slicing floorplans as the only distinct floorplans with two rooms are a room with a vertical slice and a room with a horizontal slice respectdeively. And embedding such a room is equivalent to slicing the parent room vertically/horizontally. In this paper we characterize permutations corresponding to the Abe-labeling of HFO-$k$ floorplans and also give an algorithm for identification of such permutations in linear time for any particular $k$. We give a recurrence relation for exact number of HFO-5 floorplans with $n$ rooms which can be easily extended to any $k$ also. Based on this recurrence we provide a polynomial time algorithm to generate the number of HFO-$k$ floorplans with $n$ rooms. Considering its application in VLSI design we also give moves on HFO-$k$ family of permutations for combinatorial optimization using simulated annealing etc. We also explore some interesting properties of Baxter permutations which have a bijective correspondence with mosaic floorplans. △ Less

Submitted 6 December, 2011; originally announced December 2011.

Comments: 33 pages, 13 figures

MSC Class: 68R05

arXiv:1112.1374 [pdf, other]

Sub-families of Baxter Permutations Based on Pattern Avoidance

Authors: Shankar Balachandran, Sajin Koroth

Abstract: Baxter permutations are a class of permutations which are in bijection with a class of floorplans that arise in chip design called mosaic floorplans. We study a subclass of mosaic floorplans called $HFO_k$ defined from mosaic floorplans by placing certain geometric restrictions. This naturally leads to studying a subclass of Baxter permutations. This subclass of Baxter permutations are characteriz… ▽ More Baxter permutations are a class of permutations which are in bijection with a class of floorplans that arise in chip design called mosaic floorplans. We study a subclass of mosaic floorplans called $HFO_k$ defined from mosaic floorplans by placing certain geometric restrictions. This naturally leads to studying a subclass of Baxter permutations. This subclass of Baxter permutations are characterized by pattern avoidance. We establish a bijection, between the subclass of floorplans we study and a subclass of Baxter permutations, based on the analogy between decomposition of a floorplan into smaller blocks and block decomposition of permutations. Apart from the characterization, we also answer combinatorial questions on these classes. We give an algebraic generating function (but without a closed form solution) for the number of permutations, an exponential lower bound on growth rate, and a linear time algorithm for deciding membership in each subclass. Based on the recurrence relation describing the class, we also give a polynomial time algorithm for enumeration. We finally prove that Baxter permutations are closed under inverse based on an argument inspired from the geometry of the corresponding mosaic floorplans. This proof also establishes that the subclass of Baxter permutations we study are also closed under inverse. Characterizing permutations instead of the corresponding floorplans can be helpful in reasoning about the solution space and in designing efficient algorithms for floorplanning. △ Less

Submitted 10 March, 2016; v1 submitted 6 December, 2011; originally announced December 2011.

MSC Class: 68R05

Showing 1–7 of 7 results for author: Balachandran, S