
An evaluation of a microprocessor with two independent hardware execution threads coupled through a shared cache

Abstract: We investigate the utility of augmenting a microprocessor with a single execution pipeline by adding a second copy of the execution pipeline in parallel with the existing one. The resulting dual-hardware-threaded microprocessor has two identical, independent, single-issue in-order execution pipelines (hardware threads) which share a common memory sub-system (consisting of instruction and data caches together with a memory management unit). From a design perspective, the assembly and verification of the dual threaded processor is simplified by the use of existing verified implementations of the execution pipeline and a memory unit. Because the memory unit is shared by the two hardware threads, the relative area overhead of adding the second hardware thread is 25% of the area of the existing single threaded processor. Using an FPGA implementation we evaluate the performance of the dual threaded processor relative to the single threaded one. On applications which can be parallelized, we observe speedups of 1.6X to 1.88X. For applications that are not parallelizable, the speedup is more modest. We also observe that the dual threaded processor performance is degraded on applications which generate large numbers of cache misses.

Submitted 28 May, 2023; originally announced May 2023.

arXiv:2108.00444 [pdf, other]

An efficient reverse-lookup table based strategy for solving the synonym and cache coherence problem in virtually indexed, virtually tagged caches

Authors: Madhav P. Desai, Aniket Deshmukh

Abstract: Virtually indexed and virtually tagged (VIVT) caches are an attractive option for micro-processor level-1 caches, because of their fast response time and because they are cheaper to implement than more complex caches such as virtually-indexed physical-tagged (VIPT) caches. The level-1 VIVT cache becomes even simpler to construct if it is implemented as a direct-mapped cache (VIVT-DM cache). However, VIVT and VIVT-DM caches have some drawbacks. When the number of sets in the cache is larger than the smallest page size, there is a possibility of synonyms (two or more virtual addresses mapped to the same physical address) existing in the cache. Further, maintenance of cache coherence across multiple processors requires a physical to virtual translation mechanism in the hardware. We describe a simple, efficient reverse lookup table based approach to address the synonym and the coherence problems in VIVT (both set associative and direct-mapped) caches. In particular, the proposed scheme does not disturb the critical memory access paths in a typical micro-processor, and requires a low overhead for its implementation. We have implemented and validated the scheme in the AJIT 32-bit microprocessor core (an implementation of the SPARC-V8 ISA) and the implementation uses approximately 2% of the gates and 5.3% of the memory bits in the processor core.

Submitted 1 August, 2021; originally announced August 2021.

Comments: 13 pages

arXiv:1706.03315 [pdf, other]

Neutron-induced strike: Study of multiple node charge collection in 14nm FinFETs

Authors: Nanditha P. Rao, Madhav P. Desai

Abstract: FinFETs have replaced the conventional bulk CMOS transistors in the sub-20nm technology. One of the key issues to consider is the vulnerability of FinFET based circuits to multiple node charge collection due to neutron-induced strikes. In this paper, we perform a device simulation based characterization study on representative layouts of 14nm bulk FinFETs in order to study the extent to which multiple transistors are affected. We find that multiple transistors do get affected and the impact can last up to five transistors away (~200nm). We show that the potential of source/drain regions in the neighborhood of the strike is a significant contributing factor. In the case of multi-fin FinFETs, the charge collected per fin is seen to reduce as the number of fins increases. Thus, smaller FinFETs are susceptible to high amounts of charge collection.

Submitted 11 June, 2017; originally announced June 2017.

Comments: 5 pages

arXiv:1612.08239 [pdf, ps, other]

Neutron induced strike: On the likelihood of multiple bit-flips in logic circuits

Authors: Nanditha P. Rao, Madhav P. Desai

Abstract: High energy particles from cosmic rays or packaging materials can generate a glitch or a current transient (single event transient or SET) in a logic circuit. This SET can eventually get captured in a register resulting in a flip of the register content, which is known as a soft error or single-event upset (SEU). A soft error is typically modeled as a probabilistic single bit-flip model. In developing such abstract fault models, an important issue to consider is the likelihood of multiple bit errors caused by particle strikes. The fact that an SET causes multiple flips is noted in the literature. We perform a characterization study of the impact of an SET on a logic circuit to quantify the extent to which an SET can cause multiple bit flips. We use post-layout circuit simulations and a Monte Carlo sampling scheme to get accurate bit-flip statistics. We perform our simulations on ISCAS'85, ISCAS'89 and ITC'99 benchmarks in 180nm and 65nm technologies. We find that a substantial fraction of SEU outcomes had multiple register flips. We further analyse the individual contributions of the strike on a register and the strike on a logic gate, to multiple flips. We find that, amongst the erroneous outcomes, the probability of multiple bit-flips for 'gate-strike' cases was substantial and went up to 50%, whereas that for 'register-strike' cases was just about 2%. This implies that, in principle, we can eliminate the flips due to register strikes using hardened flip-flop designs. However, in such designs, out of the remaining flips which will be due to gate strikes, a large fraction is likely to be multiple flips.

Submitted 15 June, 2017; v1 submitted 25 December, 2016; originally announced December 2016.

Comments: 9 pages

arXiv:1606.02900 [pdf, other]

On Continuous-space Embedding of Discrete-parameter Queueing Systems

Authors: Neha Karanjkar, Madhav P. Desai, Shalabh Bhatnagar

Abstract: Motivated by the problem of discrete-parameter simulation optimization (DPSO) of queueing systems, we consider the problem of embedding the discrete parameter space into a continuous one so that descent-based continuous-space methods could be directly applied for efficient optimization. We show that a randomization of the simulation model itself can be used to achieve such an embedding when the objective function is a long-run average measure. Unlike spatial interpolation, the computational cost of this embedding is independent of the number of parameters in the system, making the approach ideally suited to high-dimensional problems. We describe in detail the application of this technique to discrete-time queues for embedding queue capacities, number of servers and server-delay parameters into continuous space and empirically show that the technique can produce smooth interpolations of the objective function. Through an optimization case-study of a queueing network with $10^7$ design points, we demonstrate that existing continuous optimizers can be effectively applied over such an embedding to find good solutions.

Submitted 12 February, 2018; v1 submitted 9 June, 2016; originally announced June 2016.

Comments: Submitted to a journal and is under review
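
The randomization idea in the abstract above can be illustrated with a minimal sketch. It assumes one simple form of randomization: each time the simulation needs an integer parameter (for example a queue capacity), it draws one of the two neighbouring integers so that the expected value equals the continuous parameter. The function name `randomized_parameter` and the value 3.25 are hypothetical, and the paper's actual construction may differ in detail.

```python
import math
import random


def randomized_parameter(x, rng):
    """Draw an integer parameter value whose expectation equals the real value x."""
    low = math.floor(x)
    frac = x - low
    # Pick the upper neighbour with probability equal to the fractional part,
    # so the long-run average of the draws is low + frac = x.
    return low + 1 if rng.random() < frac else low


# Hypothetical continuous capacity of 3.25: draws are 3 or 4, averaging near 3.25.
rng = random.Random(0)
draws = [randomized_parameter(3.25, rng) for _ in range(100_000)]
mean_capacity = sum(draws) / len(draws)
```

At integer values of `x` the fractional part is zero, so the draw is deterministic and the sketch reduces to the original discrete-parameter model.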

arXiv:1411.2222 [pdf, other]

Optimization of Discrete-parameter Multiprocessor Systems using a Novel Ergodic Interpolation Technique

Authors: Neha V. Karanjkar, Madhav P. Desai

Abstract: Modern multi-core systems have a large number of design parameters, most of which are discrete-valued, and this number is likely to keep increasing as chip complexity rises. Further, the accurate evaluation of a potential design choice is computationally expensive because it requires detailed cycle-accurate system simulation. If the discrete parameter space can be embedded into a larger continuous parameter space, then continuous space techniques can, in principle, be applied to the system optimization problem. Such continuous space techniques often scale well with the number of parameters. We propose a novel technique for embedding the discrete parameter space into an extended continuous space so that continuous space techniques can be applied to the embedded problem using cycle accurate simulation for evaluating the objective function. This embedding is implemented using simulation-based ergodic interpolation, which, unlike spatial interpolation, produces the interpolated value within a single simulation run irrespective of the number of parameters. We have implemented this interpolation scheme in a cycle-based system simulator. In a characterization study, we observe that the interpolated performance curves are continuous, piece-wise smooth, and have low statistical error. We use the ergodic interpolation-based approach to solve a large multi-core design optimization problem with 31 design parameters. Our results indicate that continuous space optimization using ergodic interpolation-based embedding can be a viable approach for large multi-core design optimization problems.

Submitted 14 July, 2015; v1 submitted 9 November, 2014; originally announced November 2014.

Comments: A short version of this paper will be published in the proceedings of IEEE MASCOTS 2015 conference

arXiv:1401.1003 [pdf, other]

On the likelihood of multiple bit upsets in logic circuits

Authors: Nanditha P. Rao, Shahbaz Sarik, Madhav P. Desai

Abstract: Soft errors have a significant impact on the circuit reliability at nanoscale technologies. At the architectural level, soft errors are commonly modeled by a probabilistic bit-flip model. In developing such abstract fault models, an important issue to consider is the likelihood of multiple bit errors caused by particle strikes. This likelihood has been studied to a great extent in memories, but has not been understood to the same extent in logic circuits. In this paper, we attempt to quantify the likelihood that a single transient event can cause multiple bit errors in logic circuits consisting of combinational gates and flip-flops. In particular, we calculate the conditional probability of multiple bit-flips given that a single bit flips as a result of the transient. To calculate this conditional probability, we use a Monte Carlo technique in which samples are generated using detailed post-layout circuit simulations. Our experiments on the ISCAS'85 benchmarks and a few other circuits indicate that this conditional probability is quite significant and can be as high as 0.31. Thus we conclude that multiple bit-flips must necessarily be considered in order to obtain a realistic architectural fault model for soft errors.

Submitted 6 January, 2014; originally announced January 2014.

Comments: 6 pages
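
The conditional probability described in the abstract above can be illustrated with a minimal sketch. It assumes each Monte Carlo sample has already been reduced to a count of flipped registers, and it reads "given that a single bit flips" as "given at least one flip". The function name and the sample counts are hypothetical; in the paper the samples come from post-layout circuit simulations, not from code like this.

```python
def conditional_multi_flip_probability(flip_counts):
    """Estimate P(two or more flips | at least one flip) from per-sample flip counts."""
    # Keep only the samples in which the transient caused an error.
    erroneous = [n for n in flip_counts if n >= 1]
    if not erroneous:
        raise ValueError("no sample produced a bit-flip; the estimate is undefined")
    multiple = sum(1 for n in erroneous if n >= 2)
    return multiple / len(erroneous)


# Hypothetical flip counts, one per simulated transient (not data from the paper).
samples = [0, 0, 1, 2, 0, 1, 3, 1]
estimate = conditional_multi_flip_probability(samples)
```

Samples with no flip are excluded from the denominator, which is what makes the estimate conditional instead of an overall multiple-flip rate.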

arXiv:1009.6046 [pdf, ps, other]

On Cycles in Random Graphs

Authors: Madhav P. Desai

Abstract: We consider the geometric random (GR) graph on the $d$-dimensional torus with the $L_σ$ distance measure ($1 \leq σ \leq \infty$). Our main result is an exact characterization of the probability that a particular labeled cycle exists in this random graph. For $σ = 2$ and $σ = \infty$, we use this characterization to derive a series which evaluates to the cycle probability. We thus obtain an exact formula for the expected number of Hamilton cycles in the random graph (when $σ = \infty$ and $σ = 2$). We also consider the adjacency matrix of the random graph and derive a recurrence relation for the expected values of the elementary symmetric functions evaluated on the eigenvalues (and thus the determinant) of the adjacency matrix, and a recurrence relation for the expected value of the permanent of the adjacency matrix. The cycle probability features prominently in these recurrence relations. We calculate these quantities for geometric random graphs (in the $σ = 2$ and $σ = \infty$ case) with up to $20$ vertices, and compare them with the corresponding quantities for the Erdős-Rényi (ER) random graph with the same edge probabilities. The calculations indicate that the threshold for rapid growth in the number of Hamilton cycles (as well as that for rapid growth in the permanent of the adjacency matrix) in the GR graph is lower than in the ER graph. However, as the number of vertices $n$ increases, the difference between the GR and ER thresholds reduces, and in both cases, the threshold $\sim \log(n)/n$. Also, we observe that the expected determinant can take very large values. This throws some light on the question of the maximal determinant of symmetric $0/1$ matrices.

Submitted 30 September, 2010; originally announced September 2010.

Comments: 17 pages, 4 figures

MSC Class: 05C80 (primary) 60B20 (secondary)

Showing 1–8 of 8 results for author: Desai, M P