-
LayerDropBack: A Universally Applicable Approach for Accelerating Training of Deep Networks
Authors:
Evgeny Hershkovitch Neiterman,
Gil Ben-Artzi
Abstract:
Training very deep convolutional networks is challenging, requiring significant computational resources and time. Existing acceleration methods often depend on specific architectures or require network modifications. We introduce LayerDropBack (LDB), a simple yet effective method to accelerate training across a wide range of deep networks. LDB introduces randomness only in the backward pass, maint…
▽ More
Training very deep convolutional networks is challenging, requiring significant computational resources and time. Existing acceleration methods often depend on specific architectures or require network modifications. We introduce LayerDropBack (LDB), a simple yet effective method to accelerate training across a wide range of deep networks. LDB introduces randomness only in the backward pass, maintaining the integrity of the forward pass, guaranteeing that the same network is used during both training and inference. LDB can be seamlessly integrated into the training process of any model without altering its architecture, making it suitable for various network topologies. Our extensive experiments across multiple architectures (ViT, Swin Transformer, EfficientNet, DLA) and datasets (CIFAR-100, ImageNet) show significant training time reductions of 16.93\% to 23.97\%, while preserving or even enhancing model accuracy. Code is available at \url{https://github.com/neiterman21/LDB}.
△ Less
Submitted 23 December, 2024;
originally announced December 2024.
-
SMM-Conv: Scalar Matrix Multiplication with Zero Packing for Accelerated Convolution
Authors:
Amir Ofir,
Gil Ben-Artzi
Abstract:
We present a novel approach for accelerating convolutions during inference for CPU-based architectures. The most common method of computation involves packing the image into the columns of a matrix (im2col) and performing general matrix multiplication (GEMM) with a matrix of weights. This results in two main drawbacks: (a) im2col requires a large memory buffer and can experience inefficient memory…
▽ More
We present a novel approach for accelerating convolutions during inference for CPU-based architectures. The most common method of computation involves packing the image into the columns of a matrix (im2col) and performing general matrix multiplication (GEMM) with a matrix of weights. This results in two main drawbacks: (a) im2col requires a large memory buffer and can experience inefficient memory access, and (b) while GEMM is highly optimized for scientific matrices multiplications, it is not well suited for convolutions. We propose an approach that takes advantage of scalar-matrix multiplication and reduces memory overhead. Our experiments with commonly used network architectures demonstrate a significant speedup compared to existing indirect methods.
△ Less
Submitted 23 November, 2024;
originally announced November 2024.
-
Deep BI-RADS Network for Improved Cancer Detection from Mammograms
Authors:
Gil Ben-Artzi,
Feras Daragma,
Shahar Mahpod
Abstract:
While state-of-the-art models for breast cancer detection leverage multi-view mammograms for enhanced diagnostic accuracy, they often focus solely on visual mammography data. However, radiologists document valuable lesion descriptors that contain additional information that can enhance mammography-based breast cancer screening. A key question is whether deep learning models can benefit from these…
▽ More
While state-of-the-art models for breast cancer detection leverage multi-view mammograms for enhanced diagnostic accuracy, they often focus solely on visual mammography data. However, radiologists document valuable lesion descriptors that contain additional information that can enhance mammography-based breast cancer screening. A key question is whether deep learning models can benefit from these expert-derived features. To address this question, we introduce a novel multi-modal approach that combines textual BI-RADS lesion descriptors with visual mammogram content. Our method employs iterative attention layers to effectively fuse these different modalities, significantly improving classification performance over image-only models. Experiments on the CBIS-DDSM dataset demonstrate substantial improvements across all metrics, demonstrating the contribution of handcrafted features to end-to-end.
△ Less
Submitted 16 November, 2024;
originally announced November 2024.
-
ChannelDropBack: Forward-Consistent Stochastic Regularization for Deep Networks
Authors:
Evgeny Hershkovitch Neiterman,
Gil Ben-Artzi
Abstract:
Incorporating stochasticity into the training process of deep convolutional networks is a widely used technique to reduce overfitting and improve regularization. Existing techniques often require modifying the architecture of the network by adding specialized layers, are effective only to specific network topologies or types of layers - linear or convolutional, and result in a trained model that i…
▽ More
Incorporating stochasticity into the training process of deep convolutional networks is a widely used technique to reduce overfitting and improve regularization. Existing techniques often require modifying the architecture of the network by adding specialized layers, are effective only to specific network topologies or types of layers - linear or convolutional, and result in a trained model that is different from the deployed one. We present ChannelDropBack, a simple stochastic regularization approach that introduces randomness only into the backward information flow, leaving the forward pass intact. ChannelDropBack randomly selects a subset of channels within the network during the backpropagation step and applies weight updates only to them. As a consequence, it allows for seamless integration into the training process of any model and layers without the need to change its architecture, making it applicable to various network topologies, and the exact same network is deployed during training and inference. Experimental evaluations validate the effectiveness of our approach, demonstrating improved accuracy on popular datasets and models, including ImageNet and ViT. Code is available at \url{https://github.com/neiterman21/ChannelDropBack.git}.
△ Less
Submitted 23 November, 2024; v1 submitted 16 November, 2024;
originally announced November 2024.
-
CTrGAN: Cycle Transformers GAN for Gait Transfer
Authors:
Shahar Mahpod,
Noam Gaash,
Hay Hoffman,
Gil Ben-Artzi
Abstract:
We introduce a novel approach for gait transfer from unconstrained videos in-the-wild. In contrast to motion transfer, the objective here is not to imitate the source's motions by the target, but rather to replace the walking source with the target, while transferring the target's typical gait. Our approach can be trained only once with multiple sources and is able to transfer the gait of the targ…
▽ More
We introduce a novel approach for gait transfer from unconstrained videos in-the-wild. In contrast to motion transfer, the objective here is not to imitate the source's motions by the target, but rather to replace the walking source with the target, while transferring the target's typical gait. Our approach can be trained only once with multiple sources and is able to transfer the gait of the target from unseen sources, eliminating the need for retraining for each new source independently. Furthermore, we propose a novel metrics for gait transfer based on gait recognition models that enable to quantify the quality of the transferred gait, and show that existing techniques yield a discrepancy that can be easily detected.
We introduce Cycle Transformers GAN (CTrGAN), that consist of a decoder and encoder, both Transformers, where the attention is on the temporal domain between complete images rather than the spatial domain between patches. Using a widely-used gait recognition dataset, we demonstrate that our approach is capable of producing over an order of magnitude more realistic personalized gaits than existing methods, even when used with sources that were not available during training. As part of our solution, we present a detector that determines whether a video is real or generated by our model.
△ Less
Submitted 7 January, 2023; v1 submitted 30 June, 2022;
originally announced June 2022.
-
Hypernetwork-Based Adaptive Image Restoration
Authors:
Shai Aharon,
Gil Ben-Artzi
Abstract:
Adaptive image restoration models can restore images with different degradation levels at inference time without the need to retrain the model. We present an approach that is highly accurate and allows a significant reduction in the number of parameters. In contrast to existing methods, our approach can restore images using a single fixed-size model, regardless of the number of degradation levels.…
▽ More
Adaptive image restoration models can restore images with different degradation levels at inference time without the need to retrain the model. We present an approach that is highly accurate and allows a significant reduction in the number of parameters. In contrast to existing methods, our approach can restore images using a single fixed-size model, regardless of the number of degradation levels. On popular datasets, our approach yields state-of-the-art results in terms of size and accuracy for a variety of image restoration tasks, including denoising, deJPEG, and super-resolution.
△ Less
Submitted 26 February, 2023; v1 submitted 13 June, 2022;
originally announced June 2022.
-
Adaptive Enhancement of Extreme Low-Light Images
Authors:
Evgeny Hershkovitch Neiterman,
Michael Klyuchka,
Gil Ben-Artzi
Abstract:
Existing methods for enhancing dark images captured in a very low-light environment assume that the intensity level of the optimal output image is known and already included in the training set. However, this assumption often does not hold, leading to output images that contain visual imperfections such as dark regions or low contrast. To facilitate the training and evaluation of adaptive models t…
▽ More
Existing methods for enhancing dark images captured in a very low-light environment assume that the intensity level of the optimal output image is known and already included in the training set. However, this assumption often does not hold, leading to output images that contain visual imperfections such as dark regions or low contrast. To facilitate the training and evaluation of adaptive models that can overcome this limitation, we have created a dataset of 1500 raw images taken in both indoor and outdoor low-light conditions. Based on our dataset, we introduce a deep learning model capable of enhancing input images with a wide range of intensity levels at runtime, including ones that are not seen during training. Our experimental results demonstrate that our proposed dataset combined with our model can consistently and effectively enhance images across a wide range of diverse and challenging scenarios.
△ Less
Submitted 4 April, 2023; v1 submitted 7 December, 2020;
originally announced December 2020.
-
Separable Four Points Fundamental Matrix
Authors:
Gil Ben-Artzi
Abstract:
We present a novel approach for RANSAC-based computation of the fundamental matrix based on epipolar homography decomposition. We analyze the geometrical meaning of the decomposition-based representation and show that it directly induces a consecutive sampling strategy of two independent sets of correspondences. We show that our method guarantees a minimal number of evaluated hypotheses with respe…
▽ More
We present a novel approach for RANSAC-based computation of the fundamental matrix based on epipolar homography decomposition. We analyze the geometrical meaning of the decomposition-based representation and show that it directly induces a consecutive sampling strategy of two independent sets of correspondences. We show that our method guarantees a minimal number of evaluated hypotheses with respect to current minimal approaches, on the condition that there are four correspondences on an image line. We validate our approach on real-world image pairs, providing fast and accurate results.
△ Less
Submitted 29 September, 2020; v1 submitted 10 June, 2020;
originally announced June 2020.
-
Camera Calibration by Global Constraints on the Motion of Silhouettes
Authors:
Gil Ben-Artzi
Abstract:
We address the problem of epipolar geometry using the motion of silhouettes. Such methods match epipolar lines or frontier points across views, which are then used as the set of putative correspondences. We introduce an approach that improves by two orders of magnitude the performance over state-of-the-art methods, by significantly reducing the number of outliers in the putative matching. We model…
▽ More
We address the problem of epipolar geometry using the motion of silhouettes. Such methods match epipolar lines or frontier points across views, which are then used as the set of putative correspondences. We introduce an approach that improves by two orders of magnitude the performance over state-of-the-art methods, by significantly reducing the number of outliers in the putative matching. We model the frontier points' correspondence problem as constrained flow optimization, requiring small differences between their coordinates over consecutive frames. Our approach is formulated as a Linear Integer Program and we show that due to the nature of our problem, it can be solved efficiently in an iterative manner. Our method was validated on four standard datasets providing accurate calibrations across very different viewpoints.
△ Less
Submitted 14 April, 2017;
originally announced April 2017.
-
Fundamental Matrices from Moving Objects Using Line Motion Barcodes
Authors:
Yoni Kasten,
Gil Ben-Artzi,
Shmuel Peleg,
Michael Werman
Abstract:
Computing the epipolar geometry between cameras with very different viewpoints is often very difficult. The appearance of objects can vary greatly, and it is difficult to find corresponding feature points. Prior methods searched for corresponding epipolar lines using points on the convex hull of the silhouette of a single moving object. These methods fail when the scene includes multiple moving ob…
▽ More
Computing the epipolar geometry between cameras with very different viewpoints is often very difficult. The appearance of objects can vary greatly, and it is difficult to find corresponding feature points. Prior methods searched for corresponding epipolar lines using points on the convex hull of the silhouette of a single moving object. These methods fail when the scene includes multiple moving objects. This paper extends previous work to scenes having multiple moving objects by using the "Motion Barcodes", a temporal signature of lines. Corresponding epipolar lines have similar motion barcodes, and candidate pairs of corresponding epipoar lines are found by the similarity of their motion barcodes. As in previous methods we assume that cameras are relatively stationary and that moving objects have already been extracted using background subtraction.
△ Less
Submitted 26 July, 2016;
originally announced July 2016.
-
Epipolar Geometry Based On Line Similarity
Authors:
Gil Ben-Artzi,
Tavi Halperin,
Michael Werman,
Shmuel Peleg
Abstract:
It is known that epipolar geometry can be computed from three epipolar line correspondences but this computation is rarely used in practice since there are no simple methods to find corresponding lines. Instead, methods for finding corresponding points are widely used. This paper proposes a similarity measure between lines that indicates whether two lines are corresponding epipolar lines and enabl…
▽ More
It is known that epipolar geometry can be computed from three epipolar line correspondences but this computation is rarely used in practice since there are no simple methods to find corresponding lines. Instead, methods for finding corresponding points are widely used. This paper proposes a similarity measure between lines that indicates whether two lines are corresponding epipolar lines and enables finding epipolar line correspondences as needed for the computation of epipolar geometry.
A similarity measure between two lines, suitable for video sequences of a dynamic scene, has been previously described. This paper suggests a stereo matching similarity measure suitable for images. It is based on the quality of stereo matching between the two lines, as corresponding epipolar lines yield a good stereo correspondence.
Instead of an exhaustive search over all possible pairs of lines, the search space is substantially reduced when two corresponding point pairs are given.
We validate the proposed method using real-world images and compare it to state-of-the-art methods. We found this method to be more accurate by a factor of five compared to the standard method using seven corresponding points and comparable to the 8-points algorithm.
△ Less
Submitted 7 January, 2017; v1 submitted 17 April, 2016;
originally announced April 2016.
-
Camera Calibration from Dynamic Silhouettes Using Motion Barcodes
Authors:
Gil Ben-Artzi,
Yoni Kasten,
Shmuel Peleg,
Michael Werman
Abstract:
Computing the epipolar geometry between cameras with very different viewpoints is often problematic as matching points are hard to find. In these cases, it has been proposed to use information from dynamic objects in the scene for suggesting point and line correspondences.
We propose a speed up of about two orders of magnitude, as well as an increase in robustness and accuracy, to methods comput…
▽ More
Computing the epipolar geometry between cameras with very different viewpoints is often problematic as matching points are hard to find. In these cases, it has been proposed to use information from dynamic objects in the scene for suggesting point and line correspondences.
We propose a speed up of about two orders of magnitude, as well as an increase in robustness and accuracy, to methods computing epipolar geometry from dynamic silhouettes. This improvement is based on a new temporal signature: motion barcode for lines. Motion barcode is a binary temporal sequence for lines, indicating for each frame the existence of at least one foreground pixel on that line. The motion barcodes of two corresponding epipolar lines are very similar, so the search for corresponding epipolar lines can be limited only to lines having similar barcodes. The use of motion barcodes leads to increased speed, accuracy, and robustness in computing the epipolar geometry.
△ Less
Submitted 7 January, 2017; v1 submitted 25 June, 2015;
originally announced June 2015.
-
Event Retrieval Using Motion Barcodes
Authors:
Gil Ben-Artzi,
Michael Werman,
Shmuel Peleg
Abstract:
We introduce a simple and effective method for retrieval of videos showing a specific event, even when the videos of that event were captured from significantly different viewpoints. Appearance-based methods fail in such cases, as appearances change with large changes of viewpoints.
Our method is based on a pixel-based feature, "motion barcode", which records the existence/non-existence of motio…
▽ More
We introduce a simple and effective method for retrieval of videos showing a specific event, even when the videos of that event were captured from significantly different viewpoints. Appearance-based methods fail in such cases, as appearances change with large changes of viewpoints.
Our method is based on a pixel-based feature, "motion barcode", which records the existence/non-existence of motion as a function of time. While appearance, motion magnitude, and motion direction can vary greatly between disparate viewpoints, the existence of motion is viewpoint invariant. Based on the motion barcode, a similarity measure is developed for videos of the same event taken from very different viewpoints. This measure is robust to occlusions common under different viewpoints, and can be computed efficiently.
Event retrieval is demonstrated using challenging videos from stationary and hand held cameras.
△ Less
Submitted 12 May, 2015; v1 submitted 3 December, 2014;
originally announced December 2014.