-
Deep BI-RADS Network for Improved Cancer Detection from Mammograms
Authors:
Gil Ben-Artzi,
Feras Daragma,
Shahar Mahpod
Abstract:
While state-of-the-art models for breast cancer detection leverage multi-view mammograms for enhanced diagnostic accuracy, they often focus solely on visual mammography data. However, radiologists document valuable lesion descriptors that contain additional information that can enhance mammography-based breast cancer screening. A key question is whether deep learning models can benefit from these expert-derived features. To address this question, we introduce a novel multi-modal approach that combines textual BI-RADS lesion descriptors with visual mammogram content. Our method employs iterative attention layers to effectively fuse these different modalities, significantly improving classification performance over image-only models. Experiments on the CBIS-DDSM dataset show substantial improvements across all metrics, demonstrating the contribution of handcrafted features to end-to-end models.
Submitted 16 November, 2024;
originally announced November 2024.
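The abstract above describes fusing textual descriptors with image content through iterative attention layers. The following is only a minimal sketch of that general idea, not the authors' architecture: it assumes single-head scaled dot-product cross-attention in which image-region features act as queries over descriptor embeddings, with a residual update repeated for a fixed number of layers. The names `cross_attend`, `fuse`, `num_layers`, and the toy inputs are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys, values):
    """Scaled dot-product attention: every query attends over all keys."""
    scale = math.sqrt(len(keys[0]))
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output dimension at a time.
        mixed = [sum(w * v[d] for w, v in zip(weights, values))
                 for d in range(len(values[0]))]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


def fuse(image_feats, text_feats, num_layers=2):
    """Repeatedly add attended text features to the image features (assumed residual update)."""
    fused = [list(f) for f in image_feats]
    for _ in range(num_layers):
        attended, _ = cross_attend(fused, text_feats, text_feats)
        fused = [[x + a for x, a in zip(f, att)]
                 for f, att in zip(fused, attended)]
    return fused


# Hypothetical toy inputs: 3 image-region vectors and 2 descriptor embeddings, 4-D each.
image_feats = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.2], [0.3, 0.3, 0.3, 0.3]]
text_feats = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
fused = fuse(image_feats, text_feats)
```

In this toy setup the fused output keeps one vector per image region, so a downstream classifier could consume it in place of the image-only features.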
-
CTrGAN: Cycle Transformers GAN for Gait Transfer
Authors:
Shahar Mahpod,
Noam Gaash,
Hay Hoffman,
Gil Ben-Artzi
Abstract:
We introduce a novel approach for gait transfer from unconstrained videos in the wild. In contrast to motion transfer, the objective here is not to have the target imitate the source's motions, but rather to replace the walking source with the target while transferring the target's typical gait. Our approach needs to be trained only once with multiple sources and can transfer the gait of the target from unseen sources, eliminating the need to retrain for each new source. Furthermore, we propose novel metrics for gait transfer, based on gait recognition models, that quantify the quality of the transferred gait, and show that existing techniques yield a discrepancy that can be easily detected.
We introduce Cycle Transformers GAN (CTrGAN), which consists of a decoder and an encoder, both Transformers, where the attention is on the temporal domain between complete images rather than the spatial domain between patches. Using a widely used gait recognition dataset, we demonstrate that our approach is capable of producing over an order of magnitude more realistic personalized gaits than existing methods, even when used with sources that were not available during training. As part of our solution, we present a detector that determines whether a video is real or generated by our model.
Submitted 7 January, 2023; v1 submitted 30 June, 2022;
originally announced June 2022.
-
Auto-ML Deep Learning for Rashi Scripts OCR
Authors:
Shahar Mahpod,
Yosi Keller
Abstract:
In this work we propose an OCR scheme for manuscripts printed in Rashi font, an ancient Hebrew font with a corresponding dialect that has been used in religious Jewish literature for more than 600 years. The proposed scheme utilizes a convolutional neural network (CNN) for visual inference and a Long Short-Term Memory (LSTM) network to learn the dialect of the Rashi scripts. In particular, we derive an AutoML scheme to optimize the CNN architecture, and a book-specific CNN training to improve the OCR accuracy. The proposed scheme achieved an accuracy of more than 99.8% using a dataset of more than 3M annotated letters from the Responsa Project dataset.
Submitted 22 February, 2020; v1 submitted 3 November, 2018;
originally announced November 2018.
-
Facial Landmarks Localization using Cascaded Neural Networks
Authors:
Shahar Mahpod,
Rig Das,
Emanuele Maiorana,
Yosi Keller,
Patrizio Campisi
Abstract:
The accurate localization of facial landmarks is at the core of face analysis tasks such as face recognition and facial expression analysis. In this work, we propose a novel localization approach based on a deep learning architecture that utilizes cascaded subnetworks with convolutional neural network units. The cascaded units of the first subnetwork estimate heatmap-based encodings of the landmark locations, while the cascaded units of the second subnetwork receive as input the output of the corresponding heatmap estimation units and refine it through regression. The proposed scheme is experimentally shown to compare favorably with contemporary state-of-the-art schemes, especially when applied to images depicting challenging localization conditions.
Submitted 19 July, 2021; v1 submitted 3 May, 2018;
originally announced May 2018.
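The abstract above pairs a heatmap estimate of each landmark with a regression step that refines it. As a rough illustration only, and not the paper's networks, the sketch below assumes the coarse location is read off as the heatmap's highest-scoring cell and that the regression stage contributes a sub-pixel offset. The helpers `decode_heatmap` and `refine`, the toy heatmap, and the offset value are all hypothetical.

```python
def decode_heatmap(heatmap):
    """Return the (row, col) of the highest-scoring cell in a 2-D heatmap."""
    best_value = heatmap[0][0]
    best_cell = (0, 0)
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if value > best_value:
                best_value = value
                best_cell = (r, c)
    return best_cell


def refine(coarse, offset):
    """Add a regressed (row, col) offset to a coarse landmark location."""
    return (coarse[0] + offset[0], coarse[1] + offset[1])


# Hypothetical 4x4 heatmap for one landmark, with its peak at row 2, column 1.
heatmap = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.3, 0.1, 0.0],
    [0.2, 0.9, 0.2, 0.0],
    [0.0, 0.2, 0.1, 0.0],
]
coarse = decode_heatmap(heatmap)

# Stand-in for the output of a regression subnetwork (assumed value).
offset = (0.25, -0.4)
refined = refine(coarse, offset)
```

Splitting the estimate this way keeps the coarse step robust to clutter while leaving fine positioning to the regression stage, which is the division of labor the abstract describes.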