-
Calipers: A Criticality-aware Framework for Modeling Processor Performance
Authors:
Hossein Golestani,
Rathijit Sen,
Vinson Young,
Gagan Gupta
Abstract:
The computer architecture design space is vast and complex. Tools are needed to explore new ideas and gain insights quickly, with low effort and at a desired accuracy. We propose Calipers, a criticality-based framework to model key abstractions of complex architectures and a program's execution using dynamic event-dependence graphs. By applying graph algorithms, Calipers can track instruction and event dependencies, compute critical paths, and analyze architecture bottlenecks. By manipulating the graph, Calipers enables architects to investigate a wide range of Instruction Set Architecture (ISA) and microarchitecture design choices and "what-if" scenarios during both early- and late-stage design space exploration without recompiling and rerunning the program. Calipers can model in-order and out-of-order microarchitectures, structural hazards, and different types of ISAs, and can evaluate multiple ideas in a single run. Modeling algorithms are described in detail.
We apply Calipers to explore and gain insights into complex microarchitectural and ISA ideas for RISC and EDGE processors, at lower effort than cycle-accurate simulators and with comparable accuracy. For example, among a variety of investigations presented in the paper, experiments show that targeting only a fraction of critical loads can help realize most benefits of value prediction.
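As a rough illustration of the critical-path idea mentioned in the abstract, the sketch below computes the longest path through a small event-dependence graph. It is not the Calipers implementation: the function name `critical_path`, the event labels (`I0.F`, `I0.E`, `I0.C` for a hypothetical fetch, execute, and commit of instruction 0), and all latencies are assumptions made up for this example.

```python
from collections import defaultdict, deque

def critical_path(edges):
    """Longest path in a DAG given as (src, dst, latency) tuples.

    Returns (length, path). Hypothetical helper, not part of Calipers.
    """
    if not edges:
        return 0, []
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v, w in edges:
        succ[u].append((v, w))
        indeg[v] += 1
        nodes.update((u, v))

    dist = {n: 0 for n in nodes}     # longest distance found so far
    pred = {n: None for n in nodes}  # predecessor on that longest path
    ready = deque(sorted(n for n in nodes if indeg[n] == 0))
    visited = 0
    while ready:                     # Kahn-style topological traversal
        u = ready.popleft()
        visited += 1
        for v, w in succ[u]:
            if dist[u] + w > dist[v]:
                dist[v] = dist[u] + w
                pred[v] = u
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if visited != len(nodes):
        raise ValueError("dependence graph contains a cycle")

    end = max(dist, key=dist.get)
    length = dist[end]
    path = []
    while end is not None:           # walk predecessors back to a source
        path.append(end)
        end = pred[end]
    return length, path[::-1]

# Two instructions; I1 consumes a value produced by I0 (assumed latencies).
baseline = [
    ("I0.F", "I0.E", 1), ("I0.E", "I0.C", 3),
    ("I0.F", "I1.F", 1), ("I1.F", "I1.E", 1),
    ("I0.E", "I1.E", 3),  # data dependence
    ("I1.E", "I1.C", 2),
    ("I0.C", "I1.C", 1),  # in-order commit
]
# "What-if": shorten the data-dependence edge without rerunning anything.
what_if = [(u, v, 1 if (u, v) == ("I0.E", "I1.E") else w) for u, v, w in baseline]

base_len, base_path = critical_path(baseline)
new_len, _ = critical_path(what_if)
```

Editing an edge weight and recomputing the path is a toy version of the "what-if" style of exploration the abstract describes; here the critical path shortens once the data-dependence latency is reduced.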
Submitted 15 January, 2022;
originally announced January 2022.
-
CNNdroid: GPU-Accelerated Execution of Trained Deep Convolutional Neural Networks on Android
Authors:
Seyyed Salar Latifi Oskouei,
Hossein Golestani,
Matin Hashemi,
Soheil Ghiasi
Abstract:
Many mobile applications running on smartphones and wearable devices would potentially benefit from the accuracy and scalability of deep CNN-based machine learning algorithms. However, performance and energy consumption limitations make the execution of such computationally intensive algorithms on mobile devices prohibitive. We present a GPU-accelerated library, dubbed CNNdroid, for execution of trained deep CNNs on Android-based mobile devices. Empirical evaluations show that CNNdroid achieves up to 60X speedup and 130X energy saving on current mobile devices. The CNNdroid open source library is available for download at https://github.com/ENCP/CNNdroid
Submitted 15 October, 2016; v1 submitted 23 November, 2015;
originally announced November 2015.
-
Enhance Robustness of Image-in-Image Watermarking through Data Partitioning
Authors:
Hossein Bakhshi Golestani,
Shahrokh Ghaemmaghami
Abstract:
The vulnerability of watermarking schemes to intense signal processing attacks is generally a major concern, particularly when there are techniques that reproduce an acceptable copy of the original signal with no chance of detecting the watermark. In this paper, we propose a two-layer, data partitioning (DP) based, image-in-image watermarking method in the DCT domain to improve watermark detection performance. Truncated singular value decomposition, binary wavelet decomposition, and the spatial scalability idea in H.264/SVC are analyzed and employed as partitioning methods. It is shown that the proposed scheme outperforms its two recent competitors in terms of both data payload and robustness to intense attacks.
Submitted 8 January, 2015;
originally announced January 2015.
-
Minimization of image watermarking side effects through subjective optimization
Authors:
Hossein Bakhshi Golestani,
Mohammed Ghanbari
Abstract:
This paper investigates the use of the Structural Similarity (SSIM) index to minimize the side effects of image watermarking. For fast implementation and better compatibility with standard DCT-based codecs, watermark insertion is carried out on the DCT coefficients, and hence an SSIM model for DCT-based watermarking is developed. For faster implementation, the SSIM index is maximized over independent 4x4 non-overlapped blocks, but the disparity between adjacent blocks reduces the overall image quality. This problem is resolved through optimization of overlapped blocks, but the higher image quality is achieved at the cost of high computational complexity. To reduce the computational complexity while preserving good quality, optimization of semi-overlapped blocks is introduced. We show that while SSIM-based optimization over overlapped blocks has as much as 64 times the complexity of the 4x4 non-overlapped method, semi-overlapped optimization preserves the high quality of the overlapped method at a cost of less than 8 times that of the non-overlapped method.
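For readers unfamiliar with block-wise SSIM, the following minimal sketch evaluates the standard SSIM formula on 4x4 pixel blocks. It works in the pixel domain with population statistics and the commonly used constants K1 = 0.01 and K2 = 0.03; it does not reproduce the paper's DCT-domain model or its optimization. The helper names `ssim` and `blocks`, the synthetic image, and the single-pixel perturbation standing in for a watermark are all assumptions for illustration.

```python
def ssim(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """SSIM between two equal-length pixel lists (one flattened block each)."""
    if len(x) != len(y) or not x:
        raise ValueError("blocks must be non-empty and of equal size")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) * (a - mx) for a in x) / n   # variance of x
    vy = sum((b - my) * (b - my) for b in y) / n   # variance of y
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n  # covariance
    c1 = (k1 * data_range) ** 2                    # stabilizing constants
    c2 = (k2 * data_range) ** 2
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2)
    )

def blocks(img, size=4, stride=4):
    """Yield flattened size x size blocks; stride == size means non-overlapped."""
    h, w = len(img), len(img[0])
    for r in range(0, h - size + 1, stride):
        for c in range(0, w - size + 1, stride):
            yield [img[r + i][c + j] for i in range(size) for j in range(size)]

# Synthetic 8x8 image and a copy with one perturbed pixel (assumed data).
original = [[(7 * r + 13 * c) % 256 for c in range(8)] for r in range(8)]
marked = [row[:] for row in original]
marked[1][2] += 6  # stand-in for a watermark-induced change

# Per-block SSIM over non-overlapped 4x4 blocks.
scores = [ssim(a, b) for a, b in zip(blocks(original), blocks(marked))]
```

Passing a smaller `stride` to `blocks` yields overlapping blocks, and the number of blocks to evaluate grows accordingly, which gives an intuition for why overlapped optimization is more expensive. Which stride corresponds to the paper's semi-overlapped scheme is not stated in the abstract.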
Submitted 8 January, 2015;
originally announced January 2015.
-
A Study on Clustering for Clustering Based Image De-Noising
Authors:
Hossein Bakhshi Golestani,
Mohsen Joneidi,
Mostafa Sadeghi
Abstract:
In this paper, the problem of de-noising an image contaminated with Additive White Gaussian Noise (AWGN) is studied. This has been an open problem in signal processing for more than 50 years. Local methods suggested in recent years have obtained better results than global methods. However, with more intelligent training, such that first, important data is more effective for training, and second, clustering is done so that training blocks lie in low-rank subspaces, we can design a dictionary applicable to image de-noising and obtain results near those of state-of-the-art local methods. In the present paper, we suggest a method based on global clustering of the blocks that constitute the image. As the type of clustering plays an important role in clustering-based de-noising methods, we address two questions about the clustering: first, which parts of the data should be considered for clustering, and second, which data clustering method is suitable for de-noising? Clustering is then exploited to learn an overcomplete dictionary. By obtaining a sparse decomposition of the noisy image blocks in terms of the dictionary atoms, the de-noised version is obtained. In addition to our framework, 7 popular dictionary learning methods are simulated and compared. The results are compared based on two major factors: (1) de-noising performance and (2) execution time. Experimental results show that our dictionary learning framework outperforms its competitors in terms of both factors.
Submitted 6 January, 2015;
originally announced January 2015.