
ForestColl: Throughput-Optimal Collective Communications on Heterogeneous Network Fabrics

Authors: Liangyu Zhao, Saeed Maleki, Ziyue Yang, Hossein Pourreza, Arvind Krishnamurthy

Abstract: As modern DNN models grow ever larger, collective communications between the accelerators (allreduce, etc.) emerge as a significant performance bottleneck. Designing efficient communication schedules is challenging, given today's heterogeneous and diverse network fabrics. We present ForestColl, a tool that generates throughput-optimal schedules for any network topology. ForestColl constructs broadcast/aggregation spanning trees as the communication schedule, achieving theoretical optimality. Its schedule generation runs in strongly polynomial time and is highly scalable. ForestColl supports any network fabric, including both switching fabrics and direct accelerator connections. We evaluated ForestColl on multi-box AMD MI250 and NVIDIA DGX A100 platforms. ForestColl showed significant improvements over the vendors' own optimized communication libraries, RCCL and NCCL, across various settings and in LLM training. ForestColl also outperformed other state-of-the-art schedule generation techniques with both more efficient generated schedules and substantially faster schedule generation speed.

Submitted 1 February, 2025; v1 submitted 9 February, 2024; originally announced February 2024.

Comments: arXiv admin note: text overlap with arXiv:2305.18461
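The abstract describes schedules built from broadcast/aggregation spanning trees. As a rough illustration only, and not ForestColl's actual schedule-generation algorithm, the sketch below evaluates a given set of trees under a simple bandwidth model: each tree carries a fraction of the data, and the schedule's completion time is bounded by the most heavily loaded link relative to its bandwidth. The function name `tree_schedule_time`, the toy three-node topology, and the assumption of perfect pipelining with no latency are all hypothetical.

```python
from collections import defaultdict


def tree_schedule_time(trees, bandwidth, data_size):
    """Bandwidth-bound time for broadcasting data_size split across spanning trees.

    trees: list of (share, edges), where share is the fraction of the data sent
           along that tree and edges is a list of directed (u, v) links.
    bandwidth: dict mapping (u, v) -> link bandwidth (bytes per second).
    Assumes perfect pipelining and ignores latency (a simplifying assumption).
    """
    load = defaultdict(float)  # bytes carried by each directed link
    for share, edges in trees:
        for edge in edges:
            load[edge] += share * data_size
    # The slowest link relative to its bandwidth determines the finish time.
    return max(load[e] / bandwidth[e] for e in load)
```

In this toy ring, splitting the data evenly over two trees that use different links halves the bound compared with a single tree, which is the basic intuition for packing several trees into one schedule.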

arXiv:2402.06194

SuperBench: Improving Cloud AI Infrastructure Reliability with Proactive Validation

Authors: Yifan Xiong, Yuting Jiang, Ziyue Yang, Lei Qu, Guoshuai Zhao, Shuguang Liu, Dong Zhong, Boris Pinzur, Jie Zhang, Yang Wang, Jithin Jose, Hossein Pourreza, Jeff Baxter, Kushal Datta, Prabhat Ram, Luke Melton, Joe Chau, Peng Cheng, Yongqiang Xiong, Lidong Zhou

Abstract: Reliability in cloud AI infrastructure is crucial for cloud service providers, prompting the widespread use of hardware redundancies. However, these redundancies can inadvertently lead to hidden degradation, so-called "gray failure", for AI workloads, significantly affecting end-to-end performance and concealing performance issues, which complicates root cause analysis for failures and regressions. We introduce SuperBench, a proactive validation system for AI infrastructure that mitigates hidden degradation caused by hardware redundancies and enhances overall reliability. SuperBench features a comprehensive benchmark suite, capable of evaluating individual hardware components and representing most real AI workloads. It comprises a Validator which learns benchmark criteria to clearly pinpoint defective components. Additionally, SuperBench incorporates a Selector to balance validation time and issue-related penalties, enabling optimal timing for validation execution with a tailored subset of benchmarks. Through testbed evaluation and simulation, we demonstrate that SuperBench can increase the mean time between incidents by up to 22.61x. SuperBench has been successfully deployed in Azure production, validating hundreds of thousands of GPUs over the last two years.

Submitted 7 June, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

Comments: USENIX ATC '24
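SuperBench's Validator learns benchmark criteria to pinpoint defective components; the abstract does not give those learned criteria. As a stand-in, the sketch below uses a fixed robust-statistics rule (median and median absolute deviation) to flag nodes whose benchmark result is unusually low compared with their peers. The function `flag_outliers`, the node names, the threshold `k`, and the sample numbers are hypothetical, and this rule is not SuperBench's method.

```python
from statistics import median


def flag_outliers(results, k=3.0):
    """Return node names whose metric falls more than k scaled MADs below the median.

    results: dict mapping node name -> benchmark metric where higher is better
    (e.g., measured bandwidth). The MAD is scaled by 1.4826 so that it
    estimates the standard deviation for normally distributed data.
    """
    values = list(results.values())
    med = median(values)
    mad = median(abs(v - med) for v in values) * 1.4826
    if mad == 0:
        # Degenerate spread (more than half the values identical):
        # flag anything strictly below the median.
        return sorted(n for n, v in results.items() if v < med)
    return sorted(n for n, v in results.items() if (med - v) / mad > k)
```

A fixed rule like this is easy to audit but needs a hand-picked `k`; the abstract's point is that SuperBench learns its criteria instead.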

arXiv:2308.00828

Deep Learning Approaches in Pavement Distress Identification: A Review

Authors: Sizhe Guan, Haolan Liu, Hamid R. Pourreza, Hamidreza Mahyar

Abstract: This paper presents a comprehensive review of recent advancements in image processing and deep learning techniques for pavement distress detection and classification, a critical aspect in modern pavement management systems. The conventional manual inspection process conducted by human experts is gradually being superseded by automated solutions, leveraging machine learning and deep learning algorithms to enhance efficiency and accuracy. The ability of these algorithms to discern patterns and make predictions based on extensive datasets has revolutionized the domain of pavement distress identification. The paper investigates the integration of unmanned aerial vehicles (UAVs) for data collection, offering unique advantages such as aerial perspectives and efficient coverage of large areas. By capturing high-resolution images, UAVs provide valuable data that can be processed using deep learning algorithms to detect and classify various pavement distresses effectively. While the primary focus is on 2D image processing, the paper also acknowledges the challenges associated with 3D images, such as sensor limitations and computational requirements. Understanding these challenges is crucial for further advancements in the field. The findings of this review significantly contribute to the evolution of pavement distress detection, fostering the development of efficient pavement management systems. As automated approaches continue to mature, the implementation of deep learning techniques holds great promise in ensuring safer and more durable road infrastructure for the benefit of society.

Submitted 1 August, 2023; originally announced August 2023.

arXiv:2303.08398

A Triplet-loss Dilated Residual Network for High-Resolution Representation Learning in Image Retrieval

Authors: Saeideh Yousefzadeh, Hamidreza Pourreza, Hamidreza Mahyar

Abstract: Content-based image retrieval is the process of retrieving a subset of images from an extensive image gallery based on visual contents, such as color, shape or spatial relations, and texture. In some applications, such as localization, image retrieval is employed as the initial step. In such cases, the accuracy of the top-retrieved images significantly affects the overall system accuracy. The current paper introduces a simple yet efficient image retrieval system with fewer trainable parameters, which offers acceptable accuracy in top-retrieved images. The proposed method benefits from a dilated residual convolutional neural network with triplet loss. Experimental evaluations show that this model can extract richer information (i.e., high-resolution representations) by enlarging the receptive field, thus improving image retrieval accuracy without increasing the depth or complexity of the model. To enhance the robustness of the extracted representations, the current research obtains candidate regions of interest from each feature map and applies Generalized-Mean pooling to the regions. As the choice of triplets in a triplet-based network affects model training, we employ a triplet online mining method. We test the performance of the proposed method under various configurations on two challenging image-retrieval datasets, namely Revisited Paris6k (RPar) and UKBench. The experimental results show an accuracy of 94.54 and 80.23 (mean precision at rank 10) in the RPar medium and hard modes, respectively, and 3.86 (recall at rank 4) in the UKBench dataset.

Submitted 15 March, 2023; originally announced March 2023.
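The abstract names three standard building blocks: Generalized-Mean (GeM) pooling, a triplet loss, and online triplet mining. The sketch below gives textbook versions of each on plain Python lists, as an illustration rather than the authors' implementation; the exponent `p`, the margin, and the choice of hardest-negative mining are assumptions.

```python
import math


def gem_pool(values, p=3.0, eps=1e-6):
    """Generalized-Mean pooling over the activations of one region.

    p = 1 gives average pooling; large p approaches max pooling.
    Values are clamped to eps so the fractional power is defined.
    """
    clamped = [max(v, eps) for v in values]
    return (sum(v ** p for v in clamped) / len(clamped)) ** (1.0 / p)


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss: the positive must be closer than the negative by margin."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def hardest_negative(anchor, negatives):
    """Online mining (hardest-negative variant): pick the negative closest to the anchor."""
    return min(negatives, key=lambda n: euclidean(anchor, n))
```

Mining the hardest negative makes each update focus on triplets that still violate the margin; in practice semi-hard mining is a common, more stable alternative.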

arXiv:2201.12944

doi 10.1145/3617592

Deep Learning Approaches on Image Captioning: A Review

Authors: Taraneh Ghandi, Hamidreza Pourreza, Hamidreza Mahyar

Abstract: Image captioning is a research area of immense importance, aiming to generate natural language descriptions for visual content in the form of still images. The advent of deep learning and more recently vision-language pre-training techniques has revolutionized the field, leading to more sophisticated methods and improved performance. In this survey paper, we provide a structured review of deep learning methods in image captioning by presenting a comprehensive taxonomy and discussing each method category in detail. Additionally, we examine the datasets commonly employed in image captioning research, as well as the evaluation metrics used to assess the performance of different captioning models. We address the challenges faced in this field by emphasizing issues such as object hallucination, missing context, illumination conditions, contextual understanding, and referring expressions. We rank different deep learning methods' performance according to widely used evaluation metrics, giving insight into the current state of the art. Furthermore, we identify several potential future directions for research in this area, which include tackling the information misalignment problem between image and text modalities, mitigating dataset bias, incorporating vision-language pre-training methods to enhance caption generation, and developing improved evaluation tools to accurately measure the quality of image captions.

Submitted 22 August, 2023; v1 submitted 30 January, 2022; originally announced January 2022.

Comments: 41 pages, 6 figures

ACM Class: I.2.7; I.4

arXiv:2201.01832

Multiple Sclerosis Lesions Segmentation using Attention-Based CNNs in FLAIR Images

Authors: Mehdi SadeghiBakhi, Hamidreza Pourreza, Hamidreza Mahyar

Abstract: Objective: Multiple Sclerosis (MS) is an autoimmune and demyelinating disease that leads to lesions in the central nervous system. This disease can be tracked and diagnosed using Magnetic Resonance Imaging (MRI). Up to now, a multitude of automatic biomedical approaches have relied on multiple imaging modalities to segment lesions, which is not beneficial for patients in terms of cost, time, and usability. The authors of the present paper propose a method employing just one modality (FLAIR image) to segment MS lesions accurately. Methods: A patch-based Convolutional Neural Network (CNN), inspired by 3D-ResNet and a spatial-channel attention module, is designed to segment MS lesions. The proposed method consists of three stages: (1) contrast-limited adaptive histogram equalization (CLAHE) is applied to the original images, which are then concatenated with the extracted edges to create 4D images; (2) patches of size 80 × 80 × 80 × 2 are randomly selected from the 4D images; and (3) the extracted patches are passed into an attention-based CNN which is used to segment the lesions. Finally, the proposed method was compared to previous studies on the same dataset. Results: The current study evaluates the model on a test set of ISBI challenge data. Experimental results illustrate that the proposed approach significantly surpasses existing methods in terms of Dice similarity and Absolute Volume Difference while using just one modality (FLAIR) to segment the lesions.
Conclusions: The authors have introduced an automated approach to segment the lesions which is based on, at most, two modalities as an input. The proposed architecture is composed of convolution, deconvolution, and an SCA-VoxRes module as an attention module. The results show that the proposed method outperforms the other methods it was compared with.

Submitted 5 January, 2022; originally announced January 2022.
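The abstract reports Dice similarity and Absolute Volume Difference (AVD). The sketch below computes both on flattened binary masks using one common definition of each (AVD as the volume difference relative to the reference volume; multiply by 100 for a percentage). The function names and toy masks are hypothetical, and the challenge's official evaluation code may differ in details.

```python
def dice(pred, truth):
    """Dice similarity 2|A & B| / (|A| + |B|) for flat binary masks."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * inter / total


def absolute_volume_difference(pred, truth):
    """|V_pred - V_truth| / V_truth; assumes the reference mask is non-empty."""
    v_pred = sum(1 for p in pred if p)
    v_truth = sum(1 for t in truth if t)
    return abs(v_pred - v_truth) / v_truth
```

The two metrics are complementary: a prediction can match the lesion volume exactly (AVD of 0) while overlapping the reference poorly, which Dice would expose.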

arXiv:2003.13440

Computer Aided Detection for Pulmonary Embolism Challenge (CAD-PE)

Authors: Germán González, Daniel Jimenez-Carretero, Sara Rodríguez-López, Carlos Cano-Espinosa, Miguel Cazorla, Tanya Agarwal, Vinit Agarwal, Nima Tajbakhsh, Michael B. Gotway, Jianming Liang, Mojtaba Masoudi, Noushin Eftekhari, Mahdi Saadatmand, Hamid-Reza Pourreza, Patricia Fraga-Rivas, Eduardo Fraile, Frank J. Rybicki, Ara Kassarjian, Raúl San José Estépar, Maria J. Ledesma-Carbayo

Abstract: Rationale: Computer aided detection (CAD) algorithms for Pulmonary Embolism (PE) have been shown to increase radiologists' sensitivity with a small increase in specificity. However, CAD for PE has not been adopted into clinical practice, likely because of the high number of false positives current CAD software produces. Objective: To generate a database of annotated computed tomography pulmonary angiographies, use it to compare the sensitivity and false positive rate of current algorithms, and develop new methods that improve such metrics. Methods: 91 computed tomography pulmonary angiography (CTPA) scans were annotated by at least one radiologist by segmenting all pulmonary emboli visible on the study. 20 annotated CTPAs were opened to the public in the form of a medical image analysis challenge. 20 more were kept for evaluation purposes. 51 were made available post-challenge. 8 submissions, 6 of them novel, were evaluated on the 20 evaluation CTPAs. Performance was measured as a curve of per-embolus sensitivity vs. false positives per scan. Results: The best algorithms achieved a per-embolus sensitivity of 75% at 2 false positives per scan (fps) or of 70% at 1 fps, outperforming the state of the art. Deep learning approaches outperformed traditional machine learning ones, and their performance improved with the number of training cases. Significance: Through this work and challenge we have improved the state of the art of computer aided detection algorithms for pulmonary embolism. An open database and an evaluation benchmark for such algorithms have been generated, easing the development of further improvements. Implications for clinical practice will need further research.

Submitted 30 March, 2020; originally announced March 2020.

Comments: 8 pages, 3 figures
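Performance in the challenge is reported as per-embolus sensitivity against false positives per scan. The sketch below computes one operating point of such a curve for a given score threshold; sweeping the threshold traces the curve. The input layout and the assumption that each detection is already matched to an embolus ID (or to `None` for a false positive) are hypothetical; the challenge's actual hit criterion is not described here.

```python
def froc_point(scans, threshold):
    """One operating point of a sensitivity vs. false-positives-per-scan curve.

    scans: list of (num_emboli, detections), where detections is a list of
    (score, embolus_id) pairs and embolus_id is None for a false positive.
    Embolus IDs only need to be unique within a scan.
    Returns (per-embolus sensitivity, mean false positives per scan).
    """
    hits = 0
    false_positives = 0
    total_emboli = 0
    for num_emboli, detections in scans:
        total_emboli += num_emboli
        kept = [(s, e) for s, e in detections if s >= threshold]
        # An embolus counts once, however many detections land on it.
        hits += len({e for _, e in kept if e is not None})
        false_positives += sum(1 for _, e in kept if e is None)
    return hits / total_emboli, false_positives / len(scans)
```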

arXiv:1710.05191

Microaneurysm Detection in Fundus Images Using a Two-step Convolutional Neural Networks

Authors: Noushin Eftekheri, Mojtaba Masoudi, Hamidreza Pourreza, Kamaledin Ghiasi Shirazi, Ehsan Saeedi

Abstract: Diabetic Retinopathy (DR) is a prominent cause of blindness in the world. Early treatment of DR can be guided by the detection of microaneurysms (MAs), which appear as reddish spots in retinal images. An automated microaneurysm detection system can be helpful for ophthalmologists. In this paper, deep learning, in particular a convolutional neural network (CNN), is used as a powerful tool to efficiently detect MAs from fundus images. Our method uses a two-stage training process, which results in accurate detection while decreasing computational complexity in comparison with previous works. To validate the proposed method, we implement the CNN with the Keras library and evaluate it on two standard publicly available datasets. Our results show a promising sensitivity of about 0.8 at an average of more than 6 false positives per image, which is competitive with state-of-the-art approaches.

Submitted 8 July, 2018; v1 submitted 14 October, 2017; originally announced October 2017.

arXiv:1707.01330

A dataset for Computer-Aided Detection of Pulmonary Embolism in CTA images

Authors: Mojtaba Masoudi, Hamidreza Pourreza, Mahdi Saadatmand Tarzjan, Fateme Shafiee Zargar, Masoud Pezeshki Rad, Noushin Eftekhari

Abstract: Today, researchers in the field of Pulmonary Embolism (PE) analysis need a publicly available dataset to assess and compare their methods. Different systems have been designed for the detection of PE, but none of them have used a public dataset; all papers have used their own private datasets. To fill this gap, we collected 5160 slices of computed tomography angiography (CTA) images acquired from 20 patients and, after the images were labeled by experts in this field, provided a reliable dataset which is now publicly available. In some situations PE detection can be difficult, for example when it occurs in the peripheral branches or when patients have pulmonary diseases (such as parenchymal disease). Therefore, the efficiency of CAD systems depends highly on the dataset. In the given dataset, 66% of PEs are located in peripheral branches, and different pulmonary diseases are also included.

Submitted 5 July, 2017; originally announced July 2017.

arXiv:1603.04046

Image and Depth from a Single Defocused Image Using Coded Aperture Photography

Authors: Mina Masoudifar, Hamid Reza Pourreza

Abstract: Depth from defocus and defocus deblurring from a single image are two challenging problems that arise from the finite depth of field of conventional cameras. Coded aperture imaging is one of the techniques used to improve the results of these two problems. Up to now, different methods have been proposed for improving the results of either defocus deblurring or depth estimation. In this paper, a multi-objective function is proposed for evaluating and designing aperture patterns with the aim of improving the results of both depth from defocus and defocus deblurring. Pattern evaluation is performed by considering the scene illumination condition and the camera system specification. Based on the proposed criteria, a single asymmetric pattern is designed that is used for restoring a sharp image and a depth map from a single input. Since the designed pattern is asymmetric, defocused objects on the two sides of the focal plane can be distinguished. Depth estimation is performed using a new algorithm, which is based on image quality assessment criteria and can distinguish between blurred objects lying in front of or behind the focal plane. Extensive simulations as well as experiments on a variety of real scenes are conducted to compare our aperture with previously proposed ones.

Submitted 13 March, 2016; originally announced March 2016.

Comments: 18 pages, 14 figures, submitted
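The abstract relies on pattern asymmetry to tell apart objects in front of and behind the focal plane. Under a simple geometric-optics model, the defocus kernel on one side of the focal plane is a scaled copy of the aperture pattern and on the other side a scaled copy rotated by 180 degrees, so a pattern equal to its own 180-degree rotation produces the same kernel shape on both sides. The sketch below checks that property for binary patterns; the function names and patterns are hypothetical, and this is not the paper's multi-objective evaluation.

```python
def rotate_180(pattern):
    """Rotate a 2D aperture pattern (list of rows) by 180 degrees."""
    return [row[::-1] for row in pattern[::-1]]


def distinguishes_focal_sides(pattern):
    """True if the kernel shape differs between the two sides of the focal plane.

    Under the geometric-optics model, one side sees the pattern and the other
    its 180-degree rotation, so a rotation-invariant pattern cannot tell them apart.
    """
    return rotate_180(pattern) != pattern
```

A conventional fully open aperture is rotation-invariant, which is why it alone cannot resolve the front/back ambiguity.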

arXiv:1601.02225

Parallel Stroked Multi Line: a model-based method for compressing large fingerprint databases

Authors: Hamid Mansouri, Hamid-Reza Pourreza

Abstract: With the increasing use of fingerprints as important biometric data, compressing large fingerprint databases has become essential. The most recommended compression algorithm, even by standards, is JPEG2K, but at high compression rates this algorithm is ineffective. In this paper, a model is proposed which is based on parallel lines with the same orientation, arbitrary widths, and the same gray level value, located on a rectangle with a constant gray level value as background. We refer to this model as Parallel Stroked Multi Line (PSML). By using Adaptive Geometrical Wavelet and employing PSML, a compression algorithm is developed. This compression algorithm can preserve fingerprint structure and minutiae. The exact algorithm for computing the PSML model takes exponential time; however, we propose an approximation algorithm that reduces the time complexity to $O(n^3)$. The proposed PSML algorithm has a significant advantage over the Wedgelets Transform in PSNR value and visual quality of compressed images. Although the proposed method yields lower PSNR values than JPEG2K in the common range of compression rates, at all compression rates it performs nearly equal to or better than JPEG2K when used by Automatic Fingerprint Identification Systems (AFIS). At high compression rates, according to PSNR values, mean EER, and visual quality, images encoded with JPEG2K can no longer be distinguished from each other after compression, whereas images encoded with PSML retain sufficient information to maintain fingerprint identification performance similar to that obtained with uncompressed raw images. On the U.are.U 400 database, the mean EER for uncompressed images is 4.54%, while at a 267:1 compression ratio this value becomes 49.41% and 6.22% for JPEG2K and PSML, respectively. This result shows a significant improvement over the standard JPEG2K algorithm.

Submitted 10 January, 2016; originally announced January 2016.

Comments: 26 pages, 10 figures, submitted to Computer Vision and Image Understanding
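The abstract compares methods by mean EER (equal error rate), the operating point at which the false accept rate (FAR) equals the false reject rate (FRR). The sketch below estimates EER by sweeping every observed score as a threshold. It assumes similarity scores where higher means more similar; the function name and sample scores are hypothetical, and the matcher used in the paper is not specified here.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER from similarity scores (higher = more similar).

    Tries each observed score as an acceptance threshold and returns the
    mean of FAR and FRR at the threshold where they are closest.
    """
    best = None  # (|FAR - FRR|, estimated EER)
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(1 for g in genuine if g < t) / len(genuine)      # genuine pairs rejected
        far = sum(1 for i in impostor if i >= t) / len(impostor)   # impostor pairs accepted
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]
```

An EER near 50%, as reported for JPEG2K at 267:1, means the matcher does no better than chance at separating genuine from impostor pairs.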

arXiv:1512.02357

Towards the Application of Linear Programming Methods For Multi-Camera Pose Estimation

Authors: Masoud Aghamohamadian-Sharbaf, Ahmadreza Heravi, Hamidreza Pourreza

Abstract: We present a separation-based optimization algorithm which, rather than optimizing all variables together, optimizes subsets of them separately. This allows us to employ 1) a class of nonlinear functions with three variables and 2) a convex quadratic multivariable polynomial for minimizing the reprojection error. Setting aside the inversion required to minimize the nonlinear functions, in this paper we demonstrate how separation allows the elimination of matrix inversion.

Submitted 8 December, 2015; originally announced December 2015.
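The abstract minimizes reprojection error. The sketch below only evaluates that objective for a single distortion-free pinhole camera (focal length `f`, principal point `cx`, `cy`, rotation `R` given as three rows, translation `t`); it does not implement the separation-based optimizer. All names are hypothetical.

```python
def project(point, R, t, f, cx, cy):
    """Pinhole projection of a 3D world point into pixel coordinates."""
    # Camera coordinates: X_c = R X + t.
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    # Perspective division, then scale by focal length and shift by principal point.
    return (f * xc[0] / xc[2] + cx, f * xc[1] / xc[2] + cy)


def reprojection_error(points, observations, R, t, f, cx, cy):
    """Sum of squared pixel distances between projected points and observations."""
    err = 0.0
    for X, (u, v) in zip(points, observations):
        pu, pv = project(X, R, t, f, cx, cy)
        err += (pu - u) ** 2 + (pv - v) ** 2
    return err
```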

arXiv:1406.0909

doi 10.14445/22315381/IJETT-V7P254

Improvement Tracking Dynamic Programming using Replication Function for Continuous Sign Language Recognition

Authors: S. Ildarabadi, M. Ebrahimi, H. R. Pourreza

Abstract: In this paper we use a Replication Function (R. F.) to improve tracking with dynamic programming. The R. F. transforms gray level values from [0, 255] to [0, 1]. In the resulting images, skin regions are more striking and visible. The R. F. improves Dynamic Programming (D. P.) when the hand and face overlap. Results show that the Tracking Error Rate is reduced by 11% and the Average Tracked Distance by 7%.

Submitted 3 June, 2014; originally announced June 2014.

Comments: 5 pages, 13 figures, Published with "International Journal of Engineering Trends and Technology (IJETT)"

arXiv:1211.4499

Rate-Distortion Analysis of Multiview Coding in a DIBR Framework

Authors: Boshra Rajaei, Thomas Maugey, Hamid-Reza Pourreza, Pascal Frossard

Abstract: Depth image based rendering techniques for multiview applications have been recently introduced for efficient view generation at arbitrary camera positions. Encoding rate control thus has to consider both texture and depth data. Due to the different structures of depth and texture images and their different roles in the rendered views, however, distributing the available bit budget between them requires a careful analysis. Information loss due to texture coding affects the value of pixels in synthesized views, while errors in depth information lead to shifts in objects or unexpected patterns at their boundaries. In this paper, we address the problem of efficient bit allocation between texture and depth data of multiview video sequences. We adopt a rate-distortion framework based on a simplified model of depth and texture images. Our model preserves the main features of depth and texture images. Unlike most recent solutions, our method avoids rendering at encoding time for distortion estimation, so the encoding complexity is not increased. In addition, our model is independent of the underlying inpainting method used at the decoder. Experiments confirm our theoretical results and the efficiency of our rate allocation strategy.

Submitted 19 November, 2012; originally announced November 2012.
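The abstract allocates a bit budget between texture and depth data under a rate-distortion model. The paper's model is not given in the abstract, so the sketch below uses a textbook high-rate model as a stand-in, D = a * 2^(-2 R_t) + b * 2^(-2 R_d) with R_t + R_d fixed, and grid-searches the split. The weights `a_texture` and `b_depth` and the function name are hypothetical.

```python
def best_split(total_bits, a_texture, b_depth, step=0.01):
    """Grid-search the texture/depth split minimizing a toy distortion model.

    Toy model (an assumption, not the paper's): D = a * 2^(-2 R_t) + b * 2^(-2 R_d),
    subject to R_t + R_d = total_bits (bits per pixel).
    Returns (R_t, R_d).
    """
    best_rt, best_d = 0.0, float("inf")
    steps = int(round(total_bits / step))
    for k in range(steps + 1):
        rt = k * step
        rd = total_bits - rt
        d = a_texture * 2 ** (-2 * rt) + b_depth * 2 ** (-2 * rd)
        if d < best_d:
            best_rt, best_d = rt, d
    return best_rt, total_bits - best_rt
```

For this toy model the optimum satisfies a * 2^(-2 R_t) = b * 2^(-2 R_d), so R_t - R_d = 0.5 * log2(a / b); with a = 16 and b = 1, texture receives 2 more bits per pixel than depth.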

arXiv:0906.4789

Efficient IRIS Recognition through Improvement of Feature Extraction and subset Selection

Authors: Amir Azizi, Hamid Reza Pourreza

Abstract: The selection of the optimal feature subset and classification have become important issues in the field of iris recognition. In this paper we propose several methods for iris feature subset selection and vector creation. A deterministic feature sequence is extracted from the iris image using the contourlet transform, which captures the intrinsic geometrical structures of the iris image. It decomposes the iris image into a set of directional sub-bands, with texture details captured in different orientations at various scales. To reduce the feature vector dimensions, we extract only the significant bits and information from normalized iris images, ignoring fragile bits. Finally, we use an SVM (Support Vector Machine) classifier for identification in our proposed system. Experimental results show that the proposed method reduces processing time, increases classification accuracy, and produces a much shorter iris feature vector than other methods.

Submitted 25 June, 2009; originally announced June 2009.

Comments: 10 pages, International Journal of Computer Science and Information Security (IJCSIS)

Journal ref: IJCSIS June 2009 Issue, Vol. 2, No. 1
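The abstract mentions ignoring fragile bits. The paper classifies contourlet features with an SVM; as a different, widely used illustration of the same idea, the sketch below compares two binary iris codes with a fractional Hamming distance that counts only bits marked reliable in both masks. The function name, codes, and masks are hypothetical.

```python
def masked_hamming(code_a, code_b, mask_a, mask_b):
    """Fractional Hamming distance over bits marked reliable (1) in both masks."""
    usable = [i for i in range(len(code_a)) if mask_a[i] and mask_b[i]]
    if not usable:
        raise ValueError("no reliable bits in common")
    differing = sum(1 for i in usable if code_a[i] != code_b[i])
    return differing / len(usable)
```

Dropping unreliable bits shortens the effective code and removes comparisons that would mostly add noise, which mirrors the abstract's motivation for ignoring fragile bits.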

Showing 1–15 of 15 results for author: Pourreza, H