-
High performance and energy efficient inference for deep learning on ARM processors
Authors:
Adrián Castelló,
Sergio Barrachina,
Manuel F. Dolz,
Enrique S. Quintana-Ortí,
Pau San Juan
Abstract:
We evolve PyDTNN, a framework for distributed parallel training of Deep Neural Networks (DNNs), into an efficient inference tool for convolutional neural networks. Our optimization process on multicore ARM processors involves several high-level transformations of the original framework, such as the development and integration of Cython routines to exploit thread-level parallelism; the design and development of micro-kernels for matrix multiplication, vectorized with ARM's NEON intrinsics, that can accommodate layer fusion; and the appropriate selection of several cache configuration parameters tailored to the memory hierarchy of the target ARM processors. Our experiments evaluate inference throughput (measured in processed images/s), inference latency (i.e., time-to-response), and energy consumption per image when varying the level of thread parallelism and the processor power modes. The experiments with the new inference engine are reported for the ResNet50 v1.5 model on the ImageNet dataset from the MLPerf suite using the ARM v8.2 cores in the NVIDIA Jetson AGX Xavier board. These results show superior performance compared with the widely used TFLite from Google and slightly inferior results when compared with ArmNN, the native library from ARM for DNN inference.
Submitted 19 May, 2021;
originally announced May 2021.
-
HARE: Supporting efficient uplink multi-hop communications in self-organizing LPWANs
Authors:
Toni Adame,
Sergio Barrachina,
Boris Bellalta,
Albert Bel
Abstract:
The emergence of low-power wide area networks (LPWANs) as a new agent in the Internet of Things (IoT) will bring processes with a low degree of automation, from a wide variety of sectors, into the digital world. The single-hop conception of typical LPWAN deployments, though simple and robust, overlooks the self-organization capabilities of network devices, suffers from a lack of scalability in crowded scenarios, and pays little attention to energy consumption. Aiming to make the most of devices' capabilities, the HARE protocol stack is proposed in this paper as a new LPWAN technology flexible enough to adopt uplink multi-hop communications when they prove more energy efficient. Results from a real testbed show energy savings of up to 15% when using a multi-hop approach while keeping the same network reliability. The system's self-organizing capability and resilience have also been validated by performing numerous iterations of the association mechanism and deliberately switching off network devices.
Submitted 17 January, 2017;
originally announced January 2017.
-
Multi-hop Communication in the Uplink for LPWANs
Authors:
Sergio Barrachina,
Boris Bellalta,
Toni Adame,
Albert Bel
Abstract:
Low-Power Wide Area Networks (LPWANs) have arisen as a promising communication technology for supporting Internet of Things (IoT) services due to their low power operation, wide coverage range, low cost and scalability. However, most LPWAN solutions like SIGFOX or LoRaWAN rely on star topology networks, where stations (STAs) transmit directly to the gateway (GW), which often leads to rapid battery depletion in STAs located far from it. In this work, we analyze the impact of multi-hop communication in the uplink on LPWAN energy consumption. Multi-hop communication allows STAs to transmit data packets at lower power levels and higher data rates to closer parent STAs, thereby reducing their energy consumption. To that end, we introduce the Distance-Ring Exponential Stations Generator (DRESG) framework, designed to evaluate the performance of the so-called optimal-hop routing model, which establishes optimal routing connections in terms of energy efficiency, aiming to balance the consumption among all the STAs in the network. Results show that enabling such multi-hop connections yields longer network lifetimes, significantly reducing the bottleneck consumption in LPWANs with up to thousands of STAs. These results point to multi-hop communication in the uplink as a promising routing alternative for extending the lifetime of LPWAN deployments.
Submitted 4 September, 2017; v1 submitted 26 November, 2016;
originally announced November 2016.
-
Concurrent and Accurate RNA Sequencing on Multicore Platforms
Authors:
Héctor Martínez,
Joaquín Tárraga,
Ignacio Medina,
Sergio Barrachina,
Maribel Castillo,
Joaquín Dopazo,
Enrique S. Quintana-Ortí
Abstract:
In this paper we introduce a novel parallel pipeline for fast and accurate mapping of RNA sequences on servers equipped with multicore processors. Our software, named HPG-Aligner, leverages the speed of the Burrows-Wheeler Transform to map a large number of RNA fragments (reads) rapidly, as well as the accuracy of the Smith-Waterman algorithm, which is employed to deal with conflictive reads. The aligner is complemented with a careful strategy to detect splice junctions based on the division of RNA reads into short segments (or seeds), which are then mapped onto a number of candidate alignment locations, providing useful information for the successful alignment of the complete reads.
Experimental results on platforms with AMD and Intel multicore processors report the remarkable parallel performance of HPG-Aligner on short and long RNA reads. It outperforms a state-of-the-art aligner such as TopHat 2, built on top of Bowtie and Bowtie 2, in both execution time and sensitivity.
Submitted 2 April, 2013;
originally announced April 2013.