
Scheduling Strategies for Partially-Replicable Task Chains on Two Types of Resources

Authors: Diane Orhan, Yacine Idouar, Laércio Lima Pilla, Adrien Cassagne, Denis Barthou, Christophe Jego

Abstract: The arrival of heterogeneous (or hybrid) multicore architectures on parallel platforms has brought new performance opportunities for applications and efficiency opportunities to systems. They have also increased the challenges related to thread scheduling, as tasks' execution times vary depending on whether they are placed on big (performance) cores or little (efficient) ones. In this paper, we focus on the challenges heterogeneous multicore problems bring to partially-replicable task chains, such as the ones that implement digital communication standards in Software-Defined Radio (SDR). Our objective is to maximize the throughput of these task chains while also minimizing their power consumption. We model this problem as a pipelined workflow scheduling problem using pipelined and replicated parallelism on two types of resources whose objectives are to minimize the period and to use as many little cores as necessary. We propose two greedy heuristics (FERTAC and 2CATAC) and one optimal dynamic programming (HeRAD) solution to the problem. We evaluate our solutions and compare the quality of their schedules (in period and resource utilization) and their execution times using synthetic task chains and an implementation of the DVB-S2 communication standard running on StreamPU. Our results demonstrate the benefits and drawbacks of the different proposed solutions. On average, FERTAC and 2CATAC achieve near-optimal solutions, with periods that are less than 10% worse than the optimal (HeRAD) using fewer than 2 extra cores. These three scheduling strategies now enable programmers and users of StreamPU to transparently make use of heterogeneous multicore processors and achieve throughputs that differ from their theoretical maximums by less than 8% on average.

Submitted 14 February, 2025; originally announced February 2025.

arXiv:2411.05433 [pdf, other]

Computing the Low-Weight codewords of Punctured and Shortened Pre-Transformed polar Codes

Authors: Malek Ellouze, Romain Tajan, Camille Leroux, Christophe Jégo, Charly Poulliat

Abstract: In this paper, we present a deterministic algorithm to count the low-weight codewords of punctured and shortened pure and pre-transformed polar codes. The method first evaluates the weight properties of punctured/shortened polar cosets. Then, a method that discards the cosets that have no impact on the computation of the low-weight codewords is introduced. A key advantage of this method is its applicability, regardless of the frozen bit set, puncturing/shortening pattern, or pre-transformation. Results confirm the method's efficiency while showing reduced computational complexity compared to state-of-the-art algorithms.

Submitted 8 November, 2024; originally announced November 2024.

arXiv:2406.07934 [pdf, other]

Hardware Implementation of Soft Mapper/Demappers in Iterative EP-based Receivers

Authors: Ian Fischer Schilling, Serdar Sahin, Camille Leroux, Antonio Maria Cipriano, Christophe Jego

Abstract: This paper presents a comprehensive study and implementation on an FPGA device of an Expectation Propagation (EP)-based receiver for QPSK, 8-PSK, and 16-QAM. To the best of our knowledge, this is the first such implementation for this kind of receiver. The receiver implements a Frequency Domain (FD) Self-Iterated Linear Equalizer (SILE), where EP is used to approximate the true posterior distribution of the transmitted symbols with a simpler distribution. Analytical approximations for the EP feedback generation process and the three constellations are applied to lessen the complexity of the soft mapper/demapper architectures. The simulation results demonstrate that the fixed-point version performs comparably to the floating-point version. Moreover, implementation results show the efficiency of the proposed architecture in terms of FPGA resource usage.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2206.06147 [pdf, other]

A DSEL for High Throughput and Low Latency Software-Defined Radio on Multicore CPUs

Authors: Adrien Cassagne, Romain Tajan, Olivier Aumage, Camille Leroux, Denis Barthou, Christophe Jégo

Abstract: This article presents a new Domain Specific Embedded Language (DSEL) dedicated to Software-Defined Radio (SDR). From a set of carefully designed components, it enables building efficient software digital communication systems, able to take advantage of the parallelism of modern processor architectures, in a straightforward and safe manner for the programmer. In particular, the proposed DSEL enables the combination of pipelining and sequence duplication techniques to extract both temporal and spatial parallelism from digital communication systems. We leverage the DSEL capabilities on a real use case: a fully digital transceiver for the widely used DVB-S2 standard designed entirely in software. Through evaluation, we show how the proposed software DVB-S2 transceiver is able to get the most from modern, high-end multicore CPU targets.

Submitted 3 August, 2022; v1 submitted 13 June, 2022; originally announced June 2022.

arXiv:1710.08314 [pdf, other]

Fast and Flexible Software Polar List Decoders

Authors: Mathieu Léonardon, Adrien Cassagne, Camille Leroux, Christophe Jégo, Louis-Philippe Hamelin, Yvon Savaria

Abstract: Flexibility is one mandatory aspect of channel coding in modern wireless communication systems. Among other things, the channel decoder has to support several code lengths and code rates. This need for flexibility applies to polar codes that are considered for control channels in the future 5G standard. This paper presents a new generic and flexible implementation of a software Successive Cancellation List (SCL) decoder. A large set of parameters can be fine-tuned dynamically without re-compiling the software source code: the code length, the code rate, the frozen bits set, the puncturing patterns, the cyclic redundancy check, the list size, the type of decoding algorithm, the tree-pruning strategy and the data quantization. This generic and flexible SCL decoder enables exploring tradeoffs between throughput, latency and decoding performance. Several optimizations are proposed to achieve a competitive decoding speed despite the constraints induced by the genericity and the flexibility. The resulting polar list decoder is about 4 times faster than a generic software decoder and only 2 times slower than a non-flexible unrolled decoder. Thanks to the flexibility of the decoder, the fully adaptive SCL algorithm can be easily implemented and achieves higher throughput than any other similar decoder in the literature (up to 425 Mb/s on a single processor core for N = 2048 and K = 1723 at 4.5 dB).

Submitted 23 October, 2017; originally announced October 2017.

Comments: 11 pages, 7 figures, submitted to Springer Journal of Signal Processing Systems

arXiv:1310.1712 [pdf, other]

Partial Sums Computation In Polar Codes Decoding

Authors: Guillaume Berhault, Camille Leroux, Christophe Jego, Dominique Dallet

Abstract: Polar codes are the first error-correcting codes to provably achieve the channel capacity but with infinite codelengths. For finite codelengths, the existing decoder architectures are limited in working frequency by the partial sums computation unit. We explain in this paper how the partial sums computation can be seen as a matrix multiplication. Then, an efficient hardware implementation of this product is investigated. It has reduced logic resources and interconnections. Formalized architectures, to compute partial sums and to generate the bits of the generator matrix k^n, are presented. The proposed architecture allows removing the multiplexing resources used to assign the required partial sums to each processing element.

Submitted 9 January, 2015; v1 submitted 7 October, 2013; originally announced October 2013.

Comments: Accepted to ISCAS 2015

arXiv:1309.7818 [pdf, other]

doi 10.1109/SiPS.2013.6674541

Partial Sums Generation Architecture for Successive Cancellation Decoding of Polar Codes

Authors: Guillaume Berhault, Camille Leroux, Christophe Jego, Dominique Dallet

Abstract: Polar codes are a new family of error correction codes for which efficient hardware architectures have to be defined for the encoder and the decoder. Polar codes are decoded using the successive cancellation decoding algorithm that includes partial sums computations. We take advantage of the recursive structure of polar codes to introduce an efficient partial sums computation unit that can also implement the encoder. The proposed architecture is synthesized for several codelengths in 65nm ASIC technology. The area of the resulting design is reduced by up to 26% and the maximum working frequency is improved by ~25%.

Submitted 30 September, 2013; originally announced September 2013.

Comments: Submitted to IEEE Workshop on Signal Processing Systems (SiPS)(26 April 2012). Accepted (28 June 2013)

Showing 1–7 of 7 results for author: Jego, C