-
Physical and Software Based Fault Injection Attacks Against TEEs in Mobile Devices: A Systemisation of Knowledge
Authors:
Aaron Joy,
Ben Soh,
Zhi Zhang,
Sri Parameswaran,
Darshana Jayasinghe
Abstract:
Trusted Execution Environments (TEEs) are critical components of modern secure computing, providing isolated zones in processors to safeguard sensitive data and execute secure operations. Despite their importance, TEEs are increasingly vulnerable to fault injection (FI) attacks, including both physical methods, such as Electromagnetic Fault Injection (EMFI), and software-based techniques. This survey examines these FI methodologies, exploring their ability to disrupt TEE operations and expose vulnerabilities in devices ranging from smartphones and IoT systems to cloud platforms.
The study highlights the evolution and effectiveness of non-invasive techniques, such as EMFI, which induce faults through electromagnetic disturbances without physical modifications to hardware, making them harder to detect and mitigate. Real-world case studies illustrate the significant risks posed by these attacks, including unauthorised access, privilege escalation, and data corruption. In addition, the survey identifies gaps in existing TEE security architectures and emphasises the need for enhanced countermeasures, such as dynamic anomaly detection and updated threat models.
The findings underline the importance of interdisciplinary collaboration to address these vulnerabilities, involving researchers, manufacturers, and policymakers. This survey provides actionable insights and recommendations to guide the development of more robust TEE architectures in mobile devices, fortify FI resilience, and shape global security standards. By advancing TEE security, this research aims to protect critical digital infrastructure and maintain trust in secure computing systems worldwide.
Submitted 22 November, 2024;
originally announced November 2024.
-
Investigation of moving objects through atmospheric turbulence from a non-stationary platform
Authors:
Nicholas Ferrante,
Jerome Gilles,
Shibin Parameswaran
Abstract:
In this work, we extract the optical flow field corresponding to moving objects from an image sequence of a scene impacted by atmospheric turbulence and captured from a moving camera. Our procedure first computes the optical flow field and creates a motion model to compensate for the flow field induced by camera motion. After subtracting the motion model from the optical flow, we proceed with our previous work, Gilles et al. \cite{gilles2018detection}, where a spatial-temporal cartoon+texture inspired decomposition is performed on the motion-compensated flow field in order to separate flows corresponding to atmospheric turbulence and object motion. Finally, the geometric component is processed with the detection and tracking method and is compared against a ground truth. All of the sequences and code used in this work are open source and are available by contacting the authors.
Submitted 28 October, 2024;
originally announced October 2024.
-
Out-of-Distribution Learning with Human Feedback
Authors:
Haoyue Bai,
Xuefeng Du,
Katie Rainey,
Shibin Parameswaran,
Yixuan Li
Abstract:
Out-of-distribution (OOD) learning often relies heavily on statistical approaches or predefined assumptions about OOD data distributions, hindering their efficacy in addressing multifaceted challenges of OOD generalization and OOD detection in real-world deployment environments. This paper presents a novel framework for OOD learning with human feedback, which can provide invaluable insights into the nature of OOD shifts and guide effective model adaptation. Our framework capitalizes on the freely available unlabeled data in the wild that captures the environmental test-time OOD distributions under both covariate and semantic shifts. To harness such data, our key idea is to selectively provide human feedback and label a small number of informative samples from the wild data distribution, which are then used to train a multi-class classifier and an OOD detector. By exploiting human feedback, we enhance the robustness and reliability of machine learning models, equipping them with the capability to handle OOD scenarios with greater precision. We provide theoretical insights on the generalization error bounds to justify our algorithm. Extensive experiments show the superiority of our method, outperforming the current state-of-the-art by a significant margin.
Submitted 14 August, 2024;
originally announced August 2024.
-
An Automated Validation Framework for Power Management and Data Retention Logic Kits of Standard Cell Library
Authors:
Akshay Karkal Kamath,
Bharath Kumar,
Sunil Aggarwal,
Subramanian Parameswaran,
Parag Lonkar,
Debi Prasanna,
Somasunder Sreenath
Abstract:
The development of a standard cell library involves characterization of a number of gate-level circuits at various cell-level abstractions. Verifying the behavior of these cells largely depends on the manual skills of the circuit designers. Especially challenging are the power management and data retention cells, which must be checked thoroughly for voltage and power configurations in addition to their logic functionality. Also, when standard cells are extracted into various models, any inconsistencies in these models typically go unchecked during library development. Thus, validating these cells exhaustively prior to customer delivery is highly advantageous, not only to improve customer satisfaction but also to reduce design costs. We address this challenge by presenting a methodology to validate the power management and data retention cells that are used in the logical design flow of low-power chips. For quick adoption by standard cell library design teams, the framework is fully automated and runs out-of-the-box. The proposed framework has been implemented and deployed within the Samsung Foundry ecosystem to enhance the overall quality of library design kit deliverables.
Submitted 1 June, 2024;
originally announced June 2024.
-
Efficient Real-Time Selective Genome Sequencing on Resource-Constrained Devices
Authors:
Po Jui Shih,
Hassaan Saadat,
Sri Parameswaran,
Hasindu Gamaarachchi
Abstract:
Third-generation nanopore sequencers offer a feature called selective sequencing or 'Read Until' that allows genomic reads to be analyzed in real-time and abandoned halfway, if not belonging to a genomic region of 'interest'. This selective sequencing opens the door to important applications such as rapid and low-cost genetic tests. The latency in analyzing should be as low as possible for selective sequencing to be effective so that unnecessary reads can be rejected as early as possible. However, existing methods that employ the subsequence Dynamic Time Warping (sDTW) algorithm for this problem are so computationally intensive that a massive workstation with dozens of CPU cores still struggles to keep up with the data rate of a mobile phone-sized MinION sequencer. In this paper, we present Hardware Accelerated Read Until (HARU), a resource-efficient hardware-software co-design-based method that exploits a low-cost and portable heterogeneous MPSoC platform with on-chip FPGA to accelerate the sDTW-based Read Until algorithm. Experimental results show that HARU on a Xilinx FPGA embedded with a 4-core ARM processor is around 2.5X faster than a highly optimized multi-threaded software version (around 85X faster than the existing unoptimized multi-threaded software) running on a sophisticated server with a 36-core Intel Xeon processor for a SARS-CoV-2 dataset. The energy consumption of HARU is two orders of magnitude lower than the same application executing on the 36-core server. Source code for the HARU sDTW module is available as open-source at https://github.com/beebdev/HARU and an example application that utilises HARU is at https://github.com/beebdev/sigfish-haru.
Submitted 14 November, 2022;
originally announced November 2022.
-
ApproxTrain: Fast Simulation of Approximate Multipliers for DNN Training and Inference
Authors:
Jing Gong,
Hassaan Saadat,
Hasindu Gamaarachchi,
Haris Javaid,
Xiaobo Sharon Hu,
Sri Parameswaran
Abstract:
Edge training of Deep Neural Networks (DNNs) is a desirable goal for continuous learning; however, it is hindered by the enormous computational power required by training. Hardware approximate multipliers have shown their effectiveness for gaining resource-efficiency in DNN inference accelerators; however, training with approximate multipliers is largely unexplored. To build resource efficient accelerators with approximate multipliers supporting DNN training, a thorough evaluation of training convergence and accuracy for different DNN architectures and different approximate multipliers is needed. This paper presents ApproxTrain, an open-source framework that allows fast evaluation of DNN training and inference using simulated approximate multipliers. ApproxTrain is as user-friendly as TensorFlow (TF) and requires only a high-level description of a DNN architecture along with C/C++ functional models of the approximate multiplier. We improve the speed of the simulation at the multiplier level by using a novel LUT-based approximate floating-point (FP) multiplier simulator on GPU (AMSim). ApproxTrain leverages CUDA and efficiently integrates AMSim into the TensorFlow library, in order to overcome the absence of native hardware approximate multiplier in commercial GPUs. We use ApproxTrain to evaluate the convergence and accuracy of DNN training with approximate multipliers for small and large datasets (including ImageNet) using LeNets and ResNets architectures. The evaluations demonstrate similar convergence behavior and negligible change in test accuracy compared to FP32 and bfloat16 multipliers. Compared to CPU-based approximate multiplier simulations in training and inference, the GPU-accelerated ApproxTrain is more than 2500x faster. Based on highly optimized closed-source cuDNN/cuBLAS libraries with native hardware multipliers, the original TensorFlow is only 8x faster than ApproxTrain.
Submitted 23 September, 2022; v1 submitted 9 September, 2022;
originally announced September 2022.
-
Fast Selective Flushing to Mitigate Contention-based Cache Timing Attacks
Authors:
Tuo Li,
Sri Parameswaran
Abstract:
Caches are widely used to improve performance in modern processors. By carefully evicting cache lines and identifying cache hit/miss time, contention-based cache timing channel attacks can be orchestrated to leak information from the victim process. Existing hardware countermeasures, which explore cache partitioning and randomization, are either costly, not applicable to the L1 data cache, or vulnerable to sophisticated attacks. Countermeasures using cache flush exist but are slow since all cache lines have to be evacuated during a cache flush. In this paper, we propose for the first time a hardware/software flush-based countermeasure, called fast selective flushing (FaSe). By utilizing an ISA extension (one flush instruction) and cache modification (additional state bits and control logic), FaSe selectively flushes cache lines and provides a mitigation method with a similar effect to existing methods using naive flushing methods. FaSe is implemented on RISC-V Rocket Core/Chip and evaluated on Xilinx FPGA running user programs and the Linux operating system. Our experimental results show that FaSe reduces time overhead significantly by 36% for user programs and 42% for the operating system compared to the methods with naive flushing, with less than 1% hardware overhead. Our security test shows FaSe is capable of mitigating target cache timing attacks.
Submitted 22 April, 2022; v1 submitted 11 April, 2022;
originally announced April 2022.
-
Real-Time Event-Based Tracking and Detection for Maritime Environments
Authors:
Stephanie Aelmore,
Richard C. Ordonez,
Shibin Parameswaran,
Justin Mauger
Abstract:
Event cameras are ideal for object tracking applications due to their ability to capture fast-moving objects while mitigating latency and data redundancy. Existing event-based clustering and feature tracking approaches for surveillance and object detection work well in the majority of cases, but fall short in a maritime environment. Our application of maritime vessel detection and tracking requires a process that can identify features and output a confidence score representing the likelihood that the feature was produced by a vessel, which may trigger a subsequent alert or activate a classification system. However, the maritime environment presents unique challenges such as the tendency of waves to produce the majority of events, demanding the majority of computational processing and producing false positive detections. By filtering redundant events and analyzing the movement of each event cluster, we can identify and track vessels while ignoring shorter lived and erratic features such as those produced by waves.
Submitted 8 February, 2022;
originally announced February 2022.
-
Robust 3D Garment Digitization from Monocular 2D Images for 3D Virtual Try-On Systems
Authors:
Sahib Majithia,
Sandeep N. Parameswaran,
Sadbhavana Babar,
Vikram Garg,
Astitva Srivastava,
Avinash Sharma
Abstract:
In this paper, we develop a robust 3D garment digitization solution that can generalize well on real-world fashion catalog images with cloth texture occlusions and large body pose variations. We assume fixed topology parametric template mesh models for known types of garments (e.g., T-shirts, Trousers) and perform mapping of high-quality texture from an input catalog image to UV map panels corresponding to the parametric mesh model of the garment. We achieve this by first predicting a sparse set of 2D landmarks on the boundary of the garments. Subsequently, we use these landmarks to perform Thin-Plate-Spline-based (TPS) texture transfer on UV map panels. We then employ a deep texture inpainting network to fill the large holes (due to view variations & self-occlusions) in the TPS output to generate consistent UV maps. Furthermore, to train the supervised deep networks for landmark prediction & texture inpainting tasks, we generated a large set of synthetic data with varying texture and lighting imaged from various views with the human present in a wide variety of poses. Additionally, we manually annotated a small set of fashion catalog images crawled from online fashion e-commerce platforms to finetune. We conduct thorough empirical evaluations and show impressive qualitative results of our proposed 3D garment texture solution on fashion catalog images. Such 3D garment digitization helps us solve the challenging task of enabling 3D Virtual Try-on.
Submitted 30 November, 2021;
originally announced November 2021.
-
Online Lifelong Generalized Zero-Shot Learning
Authors:
Chandan Gautam,
Sethupathy Parameswaran,
Ashish Mishra,
Suresh Sundaram
Abstract:
Methods proposed in the literature for zero-shot learning (ZSL) are typically suitable for offline learning and cannot continually learn from sequential streaming data. The sequential data comes in the form of tasks during training. Recently, a few attempts have been made to handle this issue and develop continual ZSL (CZSL) methods. However, these CZSL methods require clear task-boundary information between the tasks during training, which is not practically possible. This paper proposes a task-free (i.e., task-agnostic) CZSL method, which does not require any task information during continual learning. The proposed task-free CZSL method employs a variational autoencoder (VAE) for performing ZSL. To develop the CZSL method, we combine the concept of experience replay with knowledge distillation and regularization. Here, knowledge distillation is performed using the training sample's dark knowledge, which essentially helps overcome the catastrophic forgetting issue. Further, it is enabled for task-free learning using short-term memory. Finally, a classifier is trained on the synthetic features generated at the latent space of the VAE. Moreover, the experiments are conducted in a challenging and practical ZSL setup, i.e., generalized ZSL (GZSL). These experiments are conducted for two kinds of single-head continual learning settings: (i) mild setting: the task boundary is known only during training but not during testing; (ii) strict setting: the task boundary is not known during training or testing. Experimental results on five benchmark datasets exhibit the validity of the approach for CZSL.
Submitted 21 March, 2021; v1 submitted 19 March, 2021;
originally announced March 2021.
-
Generative Replay-based Continual Zero-Shot Learning
Authors:
Chandan Gautam,
Sethupathy Parameswaran,
Ashish Mishra,
Suresh Sundaram
Abstract:
Zero-shot learning is a new paradigm to classify objects from classes that are not available at training time. Zero-shot learning (ZSL) methods have attracted considerable attention in recent years because of their ability to classify unseen/novel class examples. Most of the existing approaches to ZSL work when all the samples from seen classes are available to train the model, which does not suit real life. In this paper, we tackle this hindrance by developing a generative replay-based continual ZSL (GRCZSL). The proposed method enables traditional ZSL to learn from streaming data and acquire new knowledge without forgetting the previous tasks' gained experience. We handle catastrophic forgetting in GRCZSL by replaying the synthetic samples of seen classes, which have appeared in the earlier tasks. These synthetic samples are synthesized using the trained conditional variational autoencoder (VAE) over the immediate past task. Moreover, we only require the current and immediate previous VAE at any time for training and testing. The proposed GRCZSL method is developed for a single-head setting of continual learning, simulating a real-world problem setting. In this setting, task identity is given during training but unavailable during testing. GRCZSL performance is evaluated on five benchmark datasets for the generalized setup of ZSL with fixed and dynamic (incremental class) settings of continual learning. The existing class setting presented recently in the literature is not suitable for a class-incremental setting. Therefore, this paper proposes a new setting to address this issue. Experimental results show that the proposed method significantly outperforms the baseline and the state-of-the-art method and makes it more suitable for real-world applications.
Submitted 6 June, 2021; v1 submitted 21 January, 2021;
originally announced January 2021.
-
SIMF: Single-Instruction Multiple-Flush Mechanism for Processor Temporal Isolation
Authors:
Tuo Li,
Bradley Hopkins,
Sri Parameswaran
Abstract:
Microarchitectural timing attacks are a type of information leakage attack, which exploit the time-shared microarchitectural components, such as caches, translation look-aside buffers (TLBs), branch prediction unit (BPU), and speculative execution, in modern processors to leak critical information from a victim process or thread. To mitigate such attacks, the mechanism for flushing the on-core state is extensively used by operating-system-level solutions, since on-core state is too expensive to partition. In these systems, the flushing operations are implemented in software (using cache maintenance instructions), which severely limits the efficiency of timing attack protection.
To bridge this gap, we propose specialized hardware support, a single-instruction multiple-flush (SIMF) mechanism to flush the core-level state, which consists of L1 caches, BPU, TLBs, and register file. We demonstrate SIMF by implementing it as an ISA extension, i.e., a flushx instruction, in a scalar in-order RISC-V processor. The resultant processor is prototyped on a Xilinx ZCU102 FPGA and validated with the state-of-the-art seL4 microkernel, the Linux kernel in multi-core scenarios, and a cache timing attack. Our evaluation shows that SIMF significantly alleviates the overhead of flushing by more than a factor of two in execution time and reduces dynamic instruction count by orders of magnitude.
Submitted 13 April, 2022; v1 submitted 20 November, 2020;
originally announced November 2020.
-
Generalized Continual Zero-Shot Learning
Authors:
Chandan Gautam,
Sethupathy Parameswaran,
Ashish Mishra,
Suresh Sundaram
Abstract:
Recently, zero-shot learning (ZSL) emerged as an exciting topic and attracted a lot of attention. ZSL aims to classify unseen classes by transferring the knowledge from seen classes to unseen classes based on the class description. Despite showing promising performance, ZSL approaches assume that the training samples from all seen classes are available during the training, which is practically not feasible. To address this issue, we propose a more generalized and practical setup for ZSL, i.e., continual ZSL (CZSL), where classes arrive sequentially in the form of a task and it actively learns from the changing environment by leveraging the past experience. Further, to enhance the reliability, we develop CZSL for a single head continual learning setting where task identity is revealed during the training process but not during the testing. To avoid catastrophic forgetting and intransigence, we use knowledge distillation and store and replay a few samples from previous tasks using a small episodic memory. We develop baselines and evaluate generalized CZSL on five ZSL benchmark datasets for two different settings of continual learning: with and without class incremental. Moreover, CZSL is developed for two types of variational autoencoders, which generate two types of features for classification: (i) generated features at output space and (ii) generated discriminative features at the latent space. The experimental results clearly indicate the single head CZSL is more generalizable and suitable for practical applications.
Submitted 31 January, 2021; v1 submitted 17 November, 2020;
originally announced November 2020.
-
Image quality assessment for determining efficacy and limitations of Super-Resolution Convolutional Neural Network (SRCNN)
Authors:
Chris M. Ward,
Josh Harguess,
Brendan Crabb,
Shibin Parameswaran
Abstract:
Traditional metrics for evaluating the efficacy of image processing techniques do not lend themselves to understanding the capabilities and limitations of modern image processing methods - particularly those enabled by deep learning. When applying image processing in engineering solutions, a scientist or engineer has a need to justify their design decisions with clear metrics. By applying blind/referenceless image spatial quality (BRISQUE), Structural SIMilarity (SSIM) index scores, and Peak signal-to-noise ratio (PSNR) to images before and after image processing, we can quantify quality improvements in a meaningful way and determine the lowest recoverable image quality for a given method.
Submitted 13 May, 2019;
originally announced May 2019.
-
Image denoising with generalized Gaussian mixture model patch priors
Authors:
Charles-Alban Deledalle,
Shibin Parameswaran,
Truong Q. Nguyen
Abstract:
Patch priors have become an important component of image restoration. A powerful approach in this category of restoration algorithms is the popular Expected Patch Log-Likelihood (EPLL) algorithm. EPLL uses a Gaussian mixture model (GMM) prior learned on clean image patches as a way to regularize degraded patches. In this paper, we show that a generalized Gaussian mixture model (GGMM) captures the underlying distribution of patches better than a GMM. Even though GGMM is a powerful prior to combine with EPLL, the non-Gaussianity of its components presents major challenges to be applied to a computationally intensive process of image restoration. Specifically, each patch has to undergo a patch classification step and a shrinkage step. These two steps can be efficiently solved with a GMM prior but are computationally impractical when using a GGMM prior. In this paper, we provide approximations and computational recipes for fast evaluation of these two steps, so that EPLL can embed a GGMM prior on an image with more than tens of thousands of patches. Our main contribution is to analyze the accuracy of our approximations based on thorough theoretical analysis. Our evaluations indicate that the GGMM prior is consistently a better fit for modeling image patch distribution and performs better on average in the image denoising task.
Submitted 11 June, 2018; v1 submitted 5 February, 2018;
originally announced February 2018.
-
Accelerating GMM-based patch priors for image restoration: Three ingredients for a 100$\times$ speed-up
Authors:
Shibin Parameswaran,
Charles-Alban Deledalle,
Loïc Denis,
Truong Q. Nguyen
Abstract:
Image restoration methods aim to recover the underlying clean image from corrupted observations. The Expected Patch Log-likelihood (EPLL) algorithm is a powerful image restoration method that uses a Gaussian mixture model (GMM) prior on the patches of natural images. Although it is very effective for restoring images, its high runtime complexity makes EPLL ill-suited for most practical applications. In this paper, we propose three approximations to the original EPLL algorithm. The resulting algorithm, which we call the fast-EPLL (FEPLL), attains a dramatic speed-up of two orders of magnitude over EPLL while incurring a negligible drop in the restored image quality (less than 0.5 dB). We demonstrate the efficacy and versatility of our algorithm on a number of inverse problems such as denoising, deblurring, super-resolution, inpainting and devignetting. To the best of our knowledge, FEPLL is the first algorithm that can competitively restore a 512x512 pixel image in under 0.5s for all the degradations mentioned above without specialized code optimizations such as CPU parallelization or GPU implementation.
Submitted 23 October, 2017;
originally announced October 2017.
-
SecureD: A Secure Dual Core Embedded Processor
Authors:
Roshan G. Ragel,
Jude A. Ambrose,
Sri Parameswaran
Abstract:
Security of embedded computing systems is becoming of paramount concern as these devices become more ubiquitous, contain personal information and are increasingly used for financial transactions. Security attacks targeting embedded systems illegally gain access to the information in these devices or destroy information. The two most common types of attacks embedded systems encounter are code-injection and power analysis attacks. In the past, a number of countermeasures, both hardware- and software-based, were proposed individually against these two types of attacks. However, no single system exists to counter both of these prominent attacks in a processor-based embedded system. Therefore, this paper, for the first time, proposes a hardware/software-based countermeasure against both code-injection attacks and power-analysis-based side-channel attacks in a dual core embedded system. The proposed processor, named SecureD, has an area overhead of just 3.80% and an average runtime increase of 20.0% when compared to a standard dual processing system. The overheads were measured using a set of industry-standard application benchmarks, with two encryption and five other programs.
Submitted 5 November, 2015;
originally announced November 2015.
-
CIPARSim: Cache Intersection Property Assisted Rapid Single-pass FIFO Cache Simulation Technique
Authors:
Mohammad Shihabul Haque,
Jorgen Peddersen,
Sri Parameswaran
Abstract:
In this paper, for the first time, we introduce a cache property called the Intersection Property that helps to reduce single-pass simulation time in a manner similar to the inclusion property. An intersection property defines conditions that, if met, prove a particular element exists in larger caches, thus avoiding further search time. We discuss three such intersection properties for caches using the FIFO replacement policy in this paper. A rapid single-pass FIFO cache simulator, CIPARSim, is also proposed. CIPARSim is the first single-pass simulator that depends on the FIFO cache properties to reduce simulation time significantly. CIPARSim was up to 5 times faster than the state-of-the-art single-pass FIFO cache simulator for the cache configurations tested. CIPARSim produces the cache hit and miss rates of an application accurately on various cache configurations. During simulation, CIPARSim intersection properties alone predict up to 90% of the total hits, reducing simulation time immensely.
Submitted 31 August, 2015; v1 submitted 10 June, 2015;
originally announced June 2015.
-
DEW: A Fast Level 1 Cache Simulation Approach for Embedded Processors with FIFO Replacement Policy
Authors:
Mohammad Shihabul Haque,
Jorgen Peddersen,
Andhi Janapsatya,
Sri Parameswaran
Abstract:
Increasing the speed of cache simulation to obtain hit/miss rates enables performance estimation, cache exploration for embedded systems and energy estimation. Previously, such simulations, particularly exact approaches, have been exclusively for caches which utilize the least recently used (LRU) replacement policy. In this paper, we propose a new, fast and exact cache simulation method for the First In First Out (FIFO) replacement policy. This method, called DEW, is able to simulate multiple level 1 cache configurations (different set sizes, associativities, and block sizes) with FIFO replacement policy. DEW utilizes a binomial tree based representation of cache configurations and a novel searching method to speed up simulation over single cache simulators like Dinero IV. Depending on different cache block sizes and benchmark applications, DEW operates around 8 to 40 times faster than Dinero IV. Dinero IV compares 2.17 to 19.42 times more cache ways than DEW to determine accurate miss rates.
Submitted 31 August, 2015; v1 submitted 10 June, 2015;
originally announced June 2015.