-
FuzzyLogic.jl: a Flexible Library for Efficient and Productive Fuzzy Inference
Authors:
Luca Ferranti,
Jani Boutellier
Abstract:
This paper introduces FuzzyLogic.jl, a Julia library to perform fuzzy inference. The library is fully open-source and released under a permissive license. The core design principles of the library are user-friendliness, flexibility, efficiency and interoperability. In particular, our library is easy to use, allows fuzzy systems to be specified in an expressive yet concise domain-specific language, has several visualization tools, supports popular inference systems like Mamdani, Sugeno and Type-2 systems, can be easily extended with custom user settings or algorithms and can perform fuzzy inference efficiently. It also allows reading fuzzy models from other formats such as Matlab .fis, FCL or FML. In this paper, we describe the library's main features and benchmark it with a few examples, showing that it achieves significant speedup compared to the Matlab fuzzy toolbox.
Submitted 17 June, 2023;
originally announced June 2023.
-
TBPos: Dataset for Large-Scale Precision Visual Localization
Authors:
Masud Fahim,
Ilona Söchting,
Luca Ferranti,
Juho Kannala,
Jani Boutellier
Abstract:
Image based localization is a classical computer vision challenge, with several well-known datasets. Generally, datasets consist of a visual 3D database that captures the modeled scenery, as well as query images whose 3D pose is to be discovered. Usually the query images have been acquired with a camera that differs from the imaging hardware used to collect the 3D database; consequently, it is hard to acquire accurate ground truth poses between query images and the 3D database. As the accuracy of visual localization algorithms constantly improves, precise ground truth becomes increasingly important. This paper proposes TBPos, a novel large-scale visual dataset for image based positioning, which provides query images with fully accurate ground truth poses: both the database images and the query images have been derived from the same laser scanner data. In the experimental part of the paper, the proposed dataset is evaluated by means of an image-based localization pipeline.
Submitted 20 February, 2023;
originally announced February 2023.
-
SADT: Combining Sharpness-Aware Minimization with Self-Distillation for Improved Model Generalization
Authors:
Masud An-Nur Islam Fahim,
Jani Boutellier
Abstract:
Methods for improving deep neural network training times and model generalizability consist of various data augmentation, regularization, and optimization approaches, which tend to be sensitive to hyperparameter settings and make reproducibility more challenging. This work jointly considers two recent training strategies that address model generalizability, sharpness-aware minimization and self-distillation, and proposes the novel training strategy of Sharpness-Aware Distilled Teachers (SADT). The experimental section of this work shows that SADT consistently outperforms previously published training strategies in model convergence time, test-time performance, and model generalizability over various neural architectures, datasets, and hyperparameter settings.
Submitted 1 November, 2022;
originally announced November 2022.
-
Fault-Tolerant Collaborative Inference through the Edge-PRUNE Framework
Authors:
Jani Boutellier,
Bo Tan,
Jari Nurmi
Abstract:
Collaborative inference has received significant research interest in machine learning as a vehicle for distributing computation load, reducing latency, as well as addressing privacy preservation in communications. Recent collaborative inference frameworks have adopted dynamic inference methodologies such as early-exit and run-time partitioning of neural networks. However, as machine learning frameworks scale in the number of inference inputs, e.g., in surveillance applications, fault tolerance related to device failure needs to be considered. This paper presents the Edge-PRUNE distributed computing framework, built on a formally defined model of computation, which provides a flexible infrastructure for fault tolerant collaborative inference. The experimental section of this work shows results on achievable inference time savings by collaborative inference, presents fault tolerant system topologies and analyzes their cost in terms of execution time overhead.
Submitted 16 June, 2022;
originally announced June 2022.
-
Multiple Offsets Multilateration: a new paradigm for sensor network calibration with unsynchronized reference nodes
Authors:
Luca Ferranti,
Kalle Åström,
Magnus Oskarsson,
Jani Boutellier,
Juho Kannala
Abstract:
Positioning using wave signal measurements is used in several applications, such as GPS systems, structure from sound and Wi-Fi-based positioning. Mathematically, such problems require the computation of the positions of receivers and/or transmitters as well as time offsets if the devices are unsynchronized. In this paper, we expand the previous state of the art on positioning formulations by introducing Multiple Offsets Multilateration (MOM), a new mathematical framework to compute the receivers' positions with pseudoranges from unsynchronized reference transmitters at known positions. This could be applied in several scenarios, for example structure from sound and positioning with LEO satellites. We describe MOM mathematically, determine how many receivers and transmitters are needed for the network to be solvable, present a study on the number of possible distinct solutions, and derive stable solvers based on homotopy continuation. The solvers are shown to be efficient and robust to noise both for synthetic and real audio data.
Submitted 23 May, 2022;
originally announced May 2022.
-
Edge-PRUNE: Flexible Distributed Deep Learning Inference
Authors:
Jani Boutellier,
Bo Tan,
Jari Nurmi
Abstract:
Collaborative deep learning inference between low-resource endpoint devices and edge servers has received significant research interest in the last few years. Such computation partitioning can help reduce endpoint device energy consumption and improve latency, but equally importantly also contributes to preserving the privacy of sensitive data. This paper describes Edge-PRUNE, a flexible but light-weight computation framework for distributing machine learning inference between edge servers and one or more client devices. Compared to previous approaches, Edge-PRUNE is based on a formal dataflow computing model, and is agnostic towards machine learning training frameworks, offering at the same time wide support for leveraging deep learning accelerators such as embedded GPUs. The experimental section of the paper demonstrates the use and performance of Edge-PRUNE by image classification and object tracking applications on two heterogeneous endpoint devices and an edge server, over wireless and physical connections. Endpoint device inference time for SSD-Mobilenet based object tracking, for example, is accelerated 5.8x by collaborative inference.
Submitted 27 April, 2022;
originally announced April 2022.
-
Convolutional Neural Network-based Efficient Dense Point Cloud Generation using Unsigned Distance Fields
Authors:
Abol Basher,
Jani Boutellier
Abstract:
Dense point cloud generation from a sparse or incomplete point cloud is a crucial and challenging problem in 3D computer vision and computer graphics. So far, the existing methods are either computationally too expensive, suffer from limited resolution, or both. In addition, some methods are strictly limited to watertight surfaces -- another major obstacle for a number of applications. To address these issues, we propose a lightweight Convolutional Neural Network that learns and predicts the unsigned distance field for arbitrary 3D shapes for dense point cloud generation using the recently emerged concept of implicit function learning. Experiments demonstrate that the proposed architecture outperforms the state of the art with 7.8x fewer model parameters, 2.4x faster inference and up to 24.8% improved generation quality.
Submitted 13 March, 2024; v1 submitted 22 March, 2022;
originally announced March 2022.
-
LightSAL: Lightweight Sign Agnostic Learning for Implicit Surface Representation
Authors:
Abol Basher,
Muhammad Sarmad,
Jani Boutellier
Abstract:
Recently, several works have addressed modeling of 3D shapes using deep neural networks to learn implicit surface representations. Up to now, the majority of works have concentrated on reconstruction quality, paying little or no attention to model size or training time. This work proposes LightSAL, a novel deep convolutional architecture for learning 3D shapes; the proposed work concentrates on efficiency both in network training time and resulting model size. We build on the recent concept of Sign Agnostic Learning for training the proposed network, relying on signed distance fields, with unsigned distance as ground truth. In the experimental section of the paper, we demonstrate that the proposed architecture outperforms previous work in model size and number of required training iterations, while achieving equivalent accuracy. Experiments are based on the D-Faust dataset that contains 41k 3D scans of human shapes. The proposed model has been implemented in PyTorch.
Submitted 9 September, 2021; v1 submitted 26 March, 2021;
originally announced March 2021.
-
Can You Trust Your Pose? Confidence Estimation in Visual Localization
Authors:
Luca Ferranti,
Xiaotian Li,
Jani Boutellier,
Juho Kannala
Abstract:
Camera pose estimation in large-scale environments is still an open question and, despite recent promising results, it may still fail in some situations. The research so far has focused on improving subcomponents of estimation pipelines, to achieve more accurate poses. However, there is no guarantee for the result to be correct, even though the correctness of pose estimation is critically important in several visual localization applications, such as in autonomous navigation. In this paper we bring to attention a novel research question, pose confidence estimation, where we aim at quantifying how reliable the visually estimated pose is. We develop a novel confidence measure to fulfil this task and show that it can be flexibly applied to different datasets, indoor or outdoor, and for various visual localization pipelines. We also show that the proposed techniques can be used to accomplish a secondary goal: improving the accuracy of existing pose estimation pipelines. Finally, the proposed approach is computationally light-weight and adds only a negligible increase to the computational effort of pose estimation.
Submitted 1 October, 2020;
originally announced October 2020.
-
Sensor Networks TDOA Self-Calibration: 2D Complexity Analysis and Solutions
Authors:
Luca Ferranti,
Kalle Åström,
Magnus Oskarsson,
Jani Boutellier,
Juho Kannala
Abstract:
Given a network of receivers and transmitters, the process of determining their positions from measured pseudoranges is known as network self-calibration. In this paper we consider 2D networks with synchronized receivers but unsynchronized transmitters and the corresponding calibration techniques, known as Time-Difference-Of-Arrival (TDOA) techniques. Despite previous work, TDOA self-calibration is computationally challenging. Iterative algorithms are very sensitive to the initialization, causing convergence issues. In this paper, we present a novel approach, which gives an algebraic solution to two previously unsolved scenarios. We also demonstrate that our solvers produce an excellent initial value for non-linear optimisation algorithms, leading to a full pipeline robust to noise.
Submitted 22 October, 2020; v1 submitted 20 May, 2020;
originally announced May 2020.
-
Binarized Convolutional Neural Networks for Efficient Inference on GPUs
Authors:
Mir Khan,
Heikki Huttunen,
Jani Boutellier
Abstract:
Convolutional neural networks have recently achieved significant breakthroughs in various image classification tasks. However, they are computationally expensive, which can make their feasible implementation on embedded and low-power devices difficult. In this paper convolutional neural network binarization is implemented on GPU-based platforms for real-time inference on resource-constrained devices. In binarized networks, all weights and intermediate computations between layers are quantized to +1 and -1, allowing multiplications and additions to be replaced with bit-wise operations between 32-bit words. This representation completely eliminates the need for floating point multiplications and additions and decreases both the computational load and the memory footprint compared to a full-precision network implemented in floating point, making it well-suited for resource-constrained environments. We compare the performance of our implementation with an equivalent floating point implementation on one desktop and two embedded GPU platforms. Our implementation achieves a maximum speedup of 7.4x with only 4.4% loss in accuracy compared to a reference implementation.
Submitted 1 August, 2018;
originally announced August 2018.
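The bit-wise replacement mentioned in the abstract above can be illustrated with a minimal CPU sketch. This is not the authors' GPU implementation: the function names `pack_bits` and `binary_dot` are hypothetical, and the encoding (+1 stored as bit 1, -1 stored as bit 0) is an assumption chosen for illustration. Under that encoding, the dot product of two +1/-1 vectors of length n equals 2 * popcount(XNOR(a, b)) - n.

```python
def pack_bits(values):
    """Pack up to 32 values from {+1, -1} into one integer word.

    Assumed encoding (illustrative): +1 -> bit 1, -1 -> bit 0.
    """
    if len(values) > 32:
        raise ValueError("this sketch packs at most 32 values per word")
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_word, b_word, n):
    """Dot product of two packed +1/-1 vectors of length n.

    XNOR marks the positions where both signs agree (product +1);
    every other position contributes -1, so the sum is
    matches - (n - matches) = 2 * matches - n.
    """
    mask = (1 << n) - 1                 # keep only the n valid bits
    xnor = ~(a_word ^ b_word) & mask    # 1 where the two bits are equal
    matches = bin(xnor).count("1")      # population count
    return 2 * matches - n


# Usage: the bit-wise result agrees with the ordinary dot product.
a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
assert binary_dot(pack_bits(a), pack_bits(b), len(a)) == sum(
    x * y for x, y in zip(a, b)
)
```

A single XOR, NOT and population count thus stand in for up to 32 multiply-accumulate steps per word, which is the kind of saving the abstract attributes to binarization.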
-
Embedded Implementation of a Deep Learning Smile Detector
Authors:
Pedram Ghazi,
Antti P. Happonen,
Jani Boutellier,
Heikki Huttunen
Abstract:
In this paper we study the real-time deployment of deep learning algorithms in low-resource computational environments. As the use case, we compare the accuracy and speed of neural networks for smile detection using different neural network architectures and their system-level implementation on the NVIDIA Jetson embedded platform. We also propose an asynchronous multithreading scheme for parallelizing the pipeline. Within this framework, we experimentally compare thirteen widely used network topologies. The experiments show that low-complexity architectures can achieve performance almost equal to that of larger ones with a fraction of the computation.
Submitted 10 July, 2018;
originally announced July 2018.
-
PRUNE: Dynamic and Decidable Dataflow for Signal Processing on Heterogeneous Platforms
Authors:
Jani Boutellier,
Jiahao Wu,
Heikki Huttunen,
Shuvra S. Bhattacharyya
Abstract:
The majority of contemporary mobile devices and personal computers are based on heterogeneous computing platforms that consist of a number of CPU cores and one or more Graphics Processing Units (GPUs). Despite the high volume of these devices, there are few existing programming frameworks that target full and simultaneous utilization of all CPU and GPU devices of the platform.
This article presents a dataflow-flavored Model of Computation (MoC) that has been developed for deploying signal processing applications to heterogeneous platforms. The presented MoC is dynamic and allows describing applications with data dependent run-time behavior. On top of the MoC, formal design rules are presented that enable application descriptions to be simultaneously dynamic and decidable. Decidability guarantees compile-time application analyzability for deadlock freedom and bounded memory.
The presented MoC and the design rules are realized in a novel Open Source programming environment "PRUNE" and demonstrated with representative application examples from the domains of image processing, computer vision and wireless communications. Experimental results show that the proposed approach outperforms the state-of-the-art in analyzability, flexibility and performance.
Submitted 19 February, 2018;
originally announced February 2018.
-
Parallel Digital Predistortion Design on Mobile GPU and Embedded Multicore CPU for Mobile Transmitters
Authors:
Kaipeng Li,
Amanullah Ghazi,
Chance Tarver,
Jani Boutellier,
Mahmoud Abdelaziz,
Lauri Anttila,
Markku Juntti,
Mikko Valkama,
Joseph R. Cavallaro
Abstract:
Digital predistortion (DPD) is a widely adopted baseband processing technique in current radio transmitters. While DPD can effectively suppress unwanted spurious spectrum emissions stemming from imperfections of analog RF and baseband electronics, it also introduces extra processing complexity and poses challenges for efficient and flexible implementations, especially for mobile cellular transmitters, considering their limited computing power compared to base stations. In this paper, we present high data rate implementations of broadband DPD on modern embedded processors, such as mobile GPU and multicore CPU, by taking advantage of emerging parallel computing techniques for exploiting their computing resources. We further verify the suppression effect of DPD experimentally on real radio hardware platforms. Performance evaluation results of our DPD design demonstrate the high efficacy of modern general purpose mobile processors in accelerating DPD processing for a mobile transmitter.
Submitted 28 December, 2016;
originally announced December 2016.
-
Executing Dynamic Data Rate Actor Networks on OpenCL Platforms
Authors:
Jani Boutellier,
Ilkka Hautala
Abstract:
Heterogeneous computing platforms consisting of general purpose processors (GPPs) and graphics processing units (GPUs) have become commonplace in personal mobile devices and embedded systems. For years, programming of these platforms was very tedious and simultaneous use of all available GPP and GPU resources required low-level programming to ensure efficient synchronization and data transfer between processors. However, in the last few years several high-level programming frameworks have emerged, which enable programmers to describe applications by means of abstractions such as dataflow or Kahn process networks and leave parallel execution, data transfer and synchronization to be handled by the framework.
Unfortunately, even the most advanced high-level programming frameworks have had shortcomings that limit their applicability to certain classes of applications. This paper presents a new, dataflow-flavored programming framework targeting heterogeneous platforms, which differs from previous approaches by allowing GPU-mapped actors to have data-dependent consumption of inputs and production of outputs. Such flexibility is essential for configurable and adaptive applications that are becoming increasingly common in signal processing. Our experiments show that this feature allows up to a 5x increase in application throughput.
The proposed framework is validated by application examples from the video processing and wireless communications domains. In the experiments the framework is compared to a well-known reference framework and it is shown that the proposed framework enables both a higher degree of flexibility and better throughput.
Submitted 10 November, 2016;
originally announced November 2016.