Search | arXiv e-print repository

Representing local protein environments with atomistic foundation models

Authors: Meital Bojan, Sanketh Vedula, Advaith Maddipatla, Nadav Bojan Sellam, Federico Napoli, Paul Schanda, Alex M. Bronstein

Abstract: The local structure of a protein strongly impacts its function and interactions with other molecules. Therefore, a concise, informative representation of a local protein environment is essential for modeling and designing proteins and biomolecular interactions. However, these environments' extensive structural and chemical variability makes them challenging to model, and such representations remai… ▽ More The local structure of a protein strongly impacts its function and interactions with other molecules. Therefore, a concise, informative representation of a local protein environment is essential for modeling and designing proteins and biomolecular interactions. However, these environments' extensive structural and chemical variability makes them challenging to model, and such representations remain under-explored. In this work, we propose a novel representation for a local protein environment derived from the intermediate features of atomistic foundation models (AFMs). We demonstrate that this embedding effectively captures both local structure (e.g., secondary motifs), and chemical features (e.g., amino-acid identity and protonation state). We further show that the AFM-derived representation space exhibits meaningful structure, enabling the construction of data-driven priors over the distribution of biomolecular environments. Finally, in the context of biomolecular NMR spectroscopy, we demonstrate that the proposed representations enable a first-of-its-kind physics-informed chemical shift predictor that achieves state-of-the-art accuracy. Our results demonstrate the surprising effectiveness of atomistic foundation models and their emergent representations for protein modeling beyond traditional molecular simulations. We believe this will open new lines of work in constructing effective functional representations for protein environments. △ Less

Submitted 29 May, 2025; originally announced May 2025.

arXiv:2505.00101 [pdf, other]

From Lab to Wrist: Bridging Metabolic Monitoring and Consumer Wearables for Heart Rate and Oxygen Consumption Modeling

Authors: Barak Gahtan, Sanketh Vedula, Gil Samuelly Leichtag, Einat Kodesh, Alex M. Bronstein

Abstract: Understanding physiological responses during running is critical for performance optimization, tailored training prescriptions, and athlete health management. We introduce a comprehensive framework -- what we believe to be the first capable of predicting instantaneous oxygen consumption (VO$_{2}$) trajectories exclusively from consumer-grade wearable data. Our approach employs two complementary ph… ▽ More Understanding physiological responses during running is critical for performance optimization, tailored training prescriptions, and athlete health management. We introduce a comprehensive framework -- what we believe to be the first capable of predicting instantaneous oxygen consumption (VO$_{2}$) trajectories exclusively from consumer-grade wearable data. Our approach employs two complementary physiological models: (1) accurate modeling of heart rate (HR) dynamics via a physiologically constrained ordinary differential equation (ODE) and neural Kalman filter, trained on over 3 million HR observations, achieving 1-second interval predictions with mean absolute errors as low as 2.81\,bpm (correlation 0.87); and (2) leveraging the principles of precise HR modeling, a novel VO$_{2}$ prediction architecture requiring only the initial second of VO$_{2}$ data for calibration, enabling robust, sequence-to-sequence metabolic demand estimation. Despite relying solely on smartwatch and chest-strap data, our method achieves mean absolute percentage errors of approximately 13\%, effectively capturing rapid physiological transitions and steady-state conditions across diverse running intensities. Our synchronized dataset, complemented by blood lactate measurements, further lays the foundation for future noninvasive metabolic zone identification. By embedding physiological constraints within modern machine learning, this framework democratizes advanced metabolic monitoring, bridging laboratory-grade accuracy and everyday accessibility, thus empowering both elite athletes and recreational fitness enthusiasts. △ Less

Submitted 30 April, 2025; originally announced May 2025.

arXiv:2503.17846 [pdf, other]

Smart Ankleband for Plug-and-Play Hand-Prosthetic Control

Authors: Dean Zadok, Oren Salzman, Alon Wolf, Alex M. Bronstein

Abstract: Building robotic prostheses requires the creation of a sensor-based interface designed to provide the robotic hand with the control required to perform hand gestures. Traditional Electromyography (EMG) based prosthetics and emerging alternatives often face limitations such as muscle-activation limitations, high cost, and complex-calibration procedures. In this paper, we present a low-cost robotic… ▽ More Building robotic prostheses requires the creation of a sensor-based interface designed to provide the robotic hand with the control required to perform hand gestures. Traditional Electromyography (EMG) based prosthetics and emerging alternatives often face limitations such as muscle-activation limitations, high cost, and complex-calibration procedures. In this paper, we present a low-cost robotic system composed of a smart ankleband for intuitive, calibration-free control of a robotic hand, and a robotic prosthetic hand that executes actions corresponding to leg gestures. The ankleband integrates an Inertial Measurement Unit (IMU) sensor with a lightweight temporal neural network to infer user-intended leg gestures from motion data. Our system represents a significant step towards higher adoption rates of robotic prostheses among arm amputees, as it enables one to operate a prosthetic hand using a low-cost, low-power, and calibration-free solution. To evaluate our work, we collected data from 10 subjects and tested our prototype ankleband with a robotic hand on an individual with upper-limb amputations. Our results demonstrate that this system empowers users to perform daily tasks more efficiently, requiring few compensatory movements. △ Less

Submitted 22 March, 2025; originally announced March 2025.

arXiv:2502.20333 [pdf, other]

T1-PILOT: Optimized Trajectories for T1 Mapping Acceleration

Authors: Tamir Shor, Moti Freiman, Chaim Baskin, Alex Bronstein

Abstract: Cardiac T1 mapping provides critical quantitative insights into myocardial tissue composition, enabling the assessment of pathologies such as fibrosis, inflammation, and edema. However, the inherently dynamic nature of the heart imposes strict limits on acquisition times, making high-resolution T1 mapping a persistent challenge. Compressed sensing (CS) approaches have reduced scan durations by und… ▽ More Cardiac T1 mapping provides critical quantitative insights into myocardial tissue composition, enabling the assessment of pathologies such as fibrosis, inflammation, and edema. However, the inherently dynamic nature of the heart imposes strict limits on acquisition times, making high-resolution T1 mapping a persistent challenge. Compressed sensing (CS) approaches have reduced scan durations by undersampling k-space and reconstructing images from partial data, and recent studies show that jointly optimizing the undersampling patterns with the reconstruction network can substantially improve performance. Still, most current T1 mapping pipelines rely on static, hand-crafted masks that do not exploit the full acceleration and accuracy potential. In this work, we introduce T1-PILOT: an end-to-end method that explicitly incorporates the T1 signal relaxation model into the sampling-reconstruction framework to guide the learning of non-Cartesian trajectories, crossframe alignment, and T1 decay estimation. Through extensive experiments on the CMRxRecon dataset, T1-PILOT significantly outperforms several baseline strategies (including learned single-mask and fixed radial or golden-angle sampling schemes), achieving higher T1 map fidelity at greater acceleration factors. In particular, we observe consistent gains in PSNR and VIF relative to existing methods, along with marked improvements in delineating finer myocardial structures. Our results highlight that optimizing sampling trajectories in tandem with the physical relaxation model leads to both enhanced quantitative accuracy and reduced acquisition times. Code for reproducing all results will be made publicly available upon publication. △ Less

Submitted 27 February, 2025; originally announced February 2025.

arXiv:2502.20325 [pdf, other]

On Adversarial Attacks In Acoustic Drone Localization

Authors: Tamir Shor, Chaim Baskin, Alex Bronstein

Abstract: Multi-rotor aerial autonomous vehicles (MAVs, more widely known as "drones") have been generating increased interest in recent years due to their growing applicability in a vast and diverse range of fields (e.g., agriculture, commercial delivery, search and rescue). The sensitivity of visual-based methods to lighting conditions and occlusions had prompted growing study of navigation reliant on oth… ▽ More Multi-rotor aerial autonomous vehicles (MAVs, more widely known as "drones") have been generating increased interest in recent years due to their growing applicability in a vast and diverse range of fields (e.g., agriculture, commercial delivery, search and rescue). The sensitivity of visual-based methods to lighting conditions and occlusions had prompted growing study of navigation reliant on other modalities, such as acoustic sensing. A major concern in using drones in scale for tasks in non-controlled environments is the potential threat of adversarial attacks over their navigational systems, exposing users to mission-critical failures, security breaches, and compromised safety outcomes that can endanger operators and bystanders. While previous work shows impressive progress in acoustic-based drone localization, prior research in adversarial attacks over drone navigation only addresses visual sensing-based systems. In this work, we aim to compensate for this gap by supplying a comprehensive analysis of the effect of PGD adversarial attacks over acoustic drone localization. We furthermore develop an algorithm for adversarial perturbation recovery, capable of markedly diminishing the affect of such attacks in our setting. The code for reproducing all experiments will be released upon publication. △ Less

Submitted 27 February, 2025; originally announced February 2025.

arXiv:2502.20314 [pdf, other]

Adversarial Robustness in Parameter-Space Classifiers

Authors: Tamir Shor, Ethan Fetaya, Chaim Baskin, Alex Bronstein

Abstract: Implicit Neural Representations (INRs) have been recently garnering increasing interest in various research fields, mainly due to their ability to represent large, complex data in a compact and continuous manner. Past work further showed that numerous popular downstream tasks can be performed directly in the INR parameter-space. Doing so can substantially reduce the computational resources require… ▽ More Implicit Neural Representations (INRs) have been recently garnering increasing interest in various research fields, mainly due to their ability to represent large, complex data in a compact and continuous manner. Past work further showed that numerous popular downstream tasks can be performed directly in the INR parameter-space. Doing so can substantially reduce the computational resources required to process the represented data in their native domain. A major difficulty in using modern machine-learning approaches, is their high susceptibility to adversarial attacks, which have been shown to greatly limit the reliability and applicability of such methods in a wide range of settings. In this work, we show that parameter-space models trained for classification are inherently robust to adversarial attacks -- without the need of any robust training. To support our claims, we develop a novel suite of adversarial attacks targeting parameter-space classifiers, and furthermore analyze practical considerations of attacking parameter-space classifiers. △ Less

Submitted 18 March, 2025; v1 submitted 27 February, 2025; originally announced February 2025.

arXiv:2502.09372 [pdf, other]

Inverse problems with experiment-guided AlphaFold

Authors: Advaith Maddipatla, Nadav Bojan Sellam, Meital Bojan, Sanketh Vedula, Paul Schanda, Ailie Marx, Alex M. Bronstein

Abstract: Proteins exist as a dynamic ensemble of multiple conformations, and these motions are often crucial for their functions. However, current structure prediction methods predominantly yield a single conformation, overlooking the conformational heterogeneity revealed by diverse experimental modalities. Here, we present a framework for building experiment-grounded protein structure generative models th… ▽ More Proteins exist as a dynamic ensemble of multiple conformations, and these motions are often crucial for their functions. However, current structure prediction methods predominantly yield a single conformation, overlooking the conformational heterogeneity revealed by diverse experimental modalities. Here, we present a framework for building experiment-grounded protein structure generative models that infer conformational ensembles consistent with measured experimental data. The key idea is to treat state-of-the-art protein structure predictors (e.g., AlphaFold3) as sequence-conditioned structural priors, and cast ensemble modeling as posterior inference of protein structures given experimental measurements. Through extensive real-data experiments, we demonstrate the generality of our method to incorporate a variety of experimental measurements. In particular, our framework uncovers previously unmodeled conformational heterogeneity from crystallographic densities, and generates high-accuracy NMR ensembles orders of magnitude faster than the status quo. Notably, we demonstrate that our ensembles outperform AlphaFold3 and sometimes better fit experimental data than publicly deposited structures to the Protein Data Bank (PDB). We believe that this approach will unlock building predictive models that fully embrace experimentally observed conformational diversity. △ Less

Submitted 13 February, 2025; originally announced February 2025.

arXiv:2412.13223 [pdf, other]

Generative modeling of protein ensembles guided by crystallographic electron densities

Authors: Sai Advaith Maddipatla, Nadav Bojan Sellam, Sanketh Vedula, Ailie Marx, Alex Bronstein

Abstract: Proteins are dynamic, adopting ensembles of conformations. The nature of this conformational heterogenity is imprinted in the raw electron density measurements obtained from X-ray crystallography experiments. Fitting an ensemble of protein structures to these measurements is a challenging, ill-posed inverse problem. We propose a non-i.i.d. ensemble guidance approach to solve this problem using exi… ▽ More Proteins are dynamic, adopting ensembles of conformations. The nature of this conformational heterogenity is imprinted in the raw electron density measurements obtained from X-ray crystallography experiments. Fitting an ensemble of protein structures to these measurements is a challenging, ill-posed inverse problem. We propose a non-i.i.d. ensemble guidance approach to solve this problem using existing protein structure generative models and demonstrate that it accurately recovers complicated multi-modal alternate protein backbone conformations observed in certain single crystal measurements. △ Less

Submitted 16 December, 2024; originally announced December 2024.

arXiv:2411.15648 [pdf, other]

Sample- and Parameter-Efficient Auto-Regressive Image Models

Authors: Elad Amrani, Leonid Karlinsky, Alex Bronstein

Abstract: We introduce XTRA, a vision model pre-trained with a novel auto-regressive objective that significantly enhances both sample and parameter efficiency compared to previous auto-regressive image models. Unlike contrastive or masked image modeling methods, which have not been demonstrated as having consistent scaling behavior on unbalanced internet data, auto-regressive vision models exhibit scalable… ▽ More We introduce XTRA, a vision model pre-trained with a novel auto-regressive objective that significantly enhances both sample and parameter efficiency compared to previous auto-regressive image models. Unlike contrastive or masked image modeling methods, which have not been demonstrated as having consistent scaling behavior on unbalanced internet data, auto-regressive vision models exhibit scalable and promising performance as model and dataset size increase. In contrast to standard auto-regressive models, XTRA employs a Block Causal Mask, where each Block represents k $\times$ k tokens rather than relying on a standard causal mask. By reconstructing pixel values block by block, XTRA captures higher-level structural patterns over larger image regions. Predicting on blocks allows the model to learn relationships across broader areas of pixels, enabling more abstract and semantically meaningful representations than traditional next-token prediction. This simple modification yields two key results. First, XTRA is sample-efficient. Despite being trained on 152$\times$ fewer samples (13.1M vs. 2B), XTRA ViT-H/14 surpasses the top-1 average accuracy of the previous state-of-the-art auto-regressive model across 15 diverse image recognition benchmarks. Second, XTRA is parameter-efficient. Compared to auto-regressive models trained on ImageNet-1k, XTRA ViT-B/16 outperforms in linear and attentive probing tasks, using 7-16$\times$ fewer parameters (85M vs. 1.36B/0.63B). △ Less

Submitted 21 March, 2025; v1 submitted 23 November, 2024; originally announced November 2024.

Comments: CVPR 2025 camera-ready with supplementary. For code see https://github.com/elad-amrani/xtra

arXiv:2410.19791 [pdf, other]

Data-Driven Cellular Network Selector for Vehicle Teleoperations

Authors: Barak Gahtan, Reuven Cohen, Alex M. Bronstein, Eli Shapira

Abstract: Remote control of robotic systems, also known as teleoperation, is crucial for the development of autonomous vehicle (AV) technology. It allows a remote operator to view live video from AVs and, in some cases, to make real-time decisions. The effectiveness of video-based teleoperation systems is heavily influenced by the quality of the cellular network and, in particular, its packet loss rate and… ▽ More Remote control of robotic systems, also known as teleoperation, is crucial for the development of autonomous vehicle (AV) technology. It allows a remote operator to view live video from AVs and, in some cases, to make real-time decisions. The effectiveness of video-based teleoperation systems is heavily influenced by the quality of the cellular network and, in particular, its packet loss rate and latency. To optimize these parameters, an AV can be connected to multiple cellular networks and determine in real time over which cellular network each video packet will be transmitted. We present an algorithm, called Active Network Selector (ANS), which uses a time series machine learning approach for solving this problem. We compare ANS to a baseline non-learning algorithm, which is used today in commercial systems, and show that ANS performs much better, with respect to both packet loss and packet latency. △ Less

Submitted 15 October, 2024; originally announced October 2024.

Comments: IEEE Network of Future 2024

arXiv:2410.17222 [pdf, other]

Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods

Authors: Tsachi Blau, Moshe Kimhi, Yonatan Belinkov, Alexander Bronstein, Chaim Baskin

Abstract: Fine-tuning Large Language Models (LLMs) typically involves updating at least a few billions of parameters. A more parameter-efficient approach is Prompt Tuning (PT), which updates only a few learnable tokens, and differently, In-Context Learning (ICL) adapts the model to a new task by simply including examples in the input without any training. When applying optimization-based methods, such as fi… ▽ More Fine-tuning Large Language Models (LLMs) typically involves updating at least a few billions of parameters. A more parameter-efficient approach is Prompt Tuning (PT), which updates only a few learnable tokens, and differently, In-Context Learning (ICL) adapts the model to a new task by simply including examples in the input without any training. When applying optimization-based methods, such as fine-tuning and PT for few-shot learning, the model is specifically adapted to the small set of training examples, whereas ICL leaves the model unchanged. This distinction makes traditional learning methods more prone to overfitting; in contrast, ICL is less sensitive to the few-shot scenario. While ICL is not prone to overfitting, it does not fully extract the information that exists in the training examples. This work introduces Context-aware Prompt Tuning (CPT), a method inspired by ICL, PT, and adversarial attacks. We build on the ICL strategy of concatenating examples before the input, but we extend this by PT-like learning, refining the context embedding through iterative optimization to extract deeper insights from the training examples. We carefully modify specific context tokens, considering the unique structure of input and output formats. Inspired by adversarial attacks, we adjust the input based on the labels present in the context, focusing on minimizing, rather than maximizing, the loss. Moreover, we apply a projected gradient descent algorithm to keep token embeddings close to their original values, under the assumption that the user-provided data is inherently valuable. Our method has been shown to achieve superior accuracy across multiple classification tasks using various LLM models. △ Less

Submitted 22 October, 2024; originally announced October 2024.

arXiv:2410.06188 [pdf, other]

Beyond the Alphabet: Deep Signal Embedding for Enhanced DNA Clustering

Authors: Hadas Abraham, Barak Gahtan, Adir Kobovich, Orian Leitersdorf, Alex M. Bronstein, Eitan Yaakobi

Abstract: The emerging field of DNA storage employs strands of DNA bases (A/T/C/G) as a storage medium for digital information to enable massive density and durability. The DNA storage pipeline includes: (1) encoding the raw data into sequences of DNA bases; (2) synthesizing the sequences as DNA \textit{strands} that are stored over time as an unordered set; (3) sequencing the DNA strands to generate DNA \t… ▽ More The emerging field of DNA storage employs strands of DNA bases (A/T/C/G) as a storage medium for digital information to enable massive density and durability. The DNA storage pipeline includes: (1) encoding the raw data into sequences of DNA bases; (2) synthesizing the sequences as DNA \textit{strands} that are stored over time as an unordered set; (3) sequencing the DNA strands to generate DNA \textit{reads}; and (4) deducing the original data. The DNA synthesis and sequencing stages each generate several independent error-prone duplicates of each strand which are then utilized in the final stage to reconstruct the best estimate for the original strand. Specifically, the reads are first \textit{clustered} into groups likely originating from the same strand (based on their similarity to each other), and then each group approximates the strand that led to the reads of that group. This work improves the DNA clustering stage by embedding it as part of the DNA sequencing. Traditional DNA storage solutions begin after the DNA sequencing process generates discrete DNA reads (A/T/C/G), yet we identify that there is untapped potential in using the raw signals generated by the Nanopore DNA sequencing machine before they are discretized into bases, a process known as \textit{basecalling}, which is done using a deep neural network. We propose a deep neural network that clusters these signals directly, demonstrating superior accuracy, and reduced computation times compared to current approaches that cluster after basecalling. △ Less

Submitted 27 January, 2025; v1 submitted 8 October, 2024; originally announced October 2024.

arXiv:2410.06140 [pdf, other]

Estimating the Number of HTTP/3 Responses in QUIC Using Deep Learning

Authors: Barak Gahtan, Robert J. Shahla, Reuven Cohen, Alex M. Bronstein

Abstract: QUIC, a new and increasingly used transport protocol, enhances TCP by offering improved security, performance, and stream multiplexing. These features, however, also impose challenges for network middle-boxes that need to monitor and analyze web traffic. This paper proposes a novel method to estimate the number of HTTP/3 responses in a given QUIC connection by an observer. This estimation reveals… ▽ More QUIC, a new and increasingly used transport protocol, enhances TCP by offering improved security, performance, and stream multiplexing. These features, however, also impose challenges for network middle-boxes that need to monitor and analyze web traffic. This paper proposes a novel method to estimate the number of HTTP/3 responses in a given QUIC connection by an observer. This estimation reveals server behavior, client-server interactions, and data transmission efficiency, which is crucial for various applications such as designing a load balancing solution and detecting HTTP/3 flood attacks. The proposed scheme transforms QUIC connection traces into image sequences and uses machine learning (ML) models, guided by a tailored loss function, to predict response counts. Evaluations on more than seven million images-derived from 100,000 traces collected across 44,000 websites over four months-achieve up to 97% accuracy in both known and unknown server settings and 92% accuracy on previously unseen complete QUIC traces. △ Less

Submitted 28 April, 2025; v1 submitted 8 October, 2024; originally announced October 2024.

arXiv:2410.05452 [pdf, other]

WearableMil: An End-to-End Framework for Military Activity Recognition and Performance Monitoring

Authors: Barak Gahtan, Shany Funk, Einat Kodesh, Itay Ketko, Tsvi Kuflik, Alex M. Bronstein

Abstract: Musculoskeletal injuries during military training significantly impact readiness, making prevention through activity monitoring crucial. While Human Activity Recognition (HAR) using wearable devices offers promising solutions, it faces challenges in processing continuous data streams and recognizing diverse activities without predefined sessions. This paper introduces an end-to-end framework for p… ▽ More Musculoskeletal injuries during military training significantly impact readiness, making prevention through activity monitoring crucial. While Human Activity Recognition (HAR) using wearable devices offers promising solutions, it faces challenges in processing continuous data streams and recognizing diverse activities without predefined sessions. This paper introduces an end-to-end framework for preprocessing, analyzing, and recognizing activities from wearable data in military training contexts. Using data from 135 soldiers wearing \textit{Garmin--55} smartwatches over six months with over 15 million minutes. We develop a hierarchical deep learning approach that achieves 93.8% accuracy in temporal splits and 83.8% in cross-user evaluation. Our framework addresses missing data through physiologically-informed methods, reducing unknown sleep states from 40.38% to 3.66%. We demonstrate that while longer time windows (45-60 minutes) improve basic state classification, they present trade-offs in detecting fine-grained activities. Additionally, we introduce an intuitive visualization system that enables real-time comparison of individual performance against group metrics across multiple physiological indicators. This approach to activity recognition and performance monitoring provides military trainers with actionable insights for optimizing training programs and preventing injuries. △ Less

Submitted 28 April, 2025; v1 submitted 7 October, 2024; originally announced October 2024.

arXiv:2410.03728 [pdf, other]

Exploring QUIC Dynamics: A Large-Scale Dataset for Encrypted Traffic Analysis

Authors: Barak Gahtan, Robert J. Shahla, Alex M. Bronstein, Reuven Cohen

Abstract: The increasing adoption of the QUIC transport protocol has transformed encrypted web traffic, necessitating new methodologies for network analysis. However, existing datasets lack the scope, metadata, and decryption capabilities required for robust benchmarking in encrypted traffic research. We introduce VisQUIC, a large-scale dataset of 100,000 labeled QUIC traces from over 44,000 websites, colle… ▽ More The increasing adoption of the QUIC transport protocol has transformed encrypted web traffic, necessitating new methodologies for network analysis. However, existing datasets lack the scope, metadata, and decryption capabilities required for robust benchmarking in encrypted traffic research. We introduce VisQUIC, a large-scale dataset of 100,000 labeled QUIC traces from over 44,000 websites, collected over four months. Unlike prior datasets, VisQUIC provides SSL keys for controlled decryption, supports multiple QUIC implementations (Chromium QUIC, Facebooks mvfst, Cloudflares quiche), and introduces a novel image-based representation that enables machine learning-driven encrypted traffic analysis. The dataset includes standardized benchmarking tools, ensuring reproducibility. To demonstrate VisQUICs utility, we present a benchmarking task for estimating HTTP/3 responses in encrypted QUIC traffic, achieving 97% accuracy using only observable packet features. By publicly releasing VisQUIC, we provide an open foundation for advancing encrypted traffic analysis, QUIC security research, and network monitoring. △ Less

Submitted 24 May, 2025; v1 submitted 30 September, 2024; originally announced October 2024.

Comments: The dataset and the supplementary material can be provided upon request

arXiv:2409.12777 [pdf, other]

TEAM PILOT -- Learned Feasible Extendable Set of Dynamic MRI Acquisition Trajectories

Authors: Tamir Shor, Chaim Baskin, Alex Bronstein

Abstract: Dynamic Magnetic Resonance Imaging (MRI) is a crucial non-invasive method used to capture the movement of internal organs and tissues, making it a key tool for medical diagnosis. However, dynamic MRI faces a major challenge: long acquisition times needed to achieve high spatial and temporal resolution. This leads to higher costs, patient discomfort, motion artifacts, and lower image quality. Compr… ▽ More Dynamic Magnetic Resonance Imaging (MRI) is a crucial non-invasive method used to capture the movement of internal organs and tissues, making it a key tool for medical diagnosis. However, dynamic MRI faces a major challenge: long acquisition times needed to achieve high spatial and temporal resolution. This leads to higher costs, patient discomfort, motion artifacts, and lower image quality. Compressed Sensing (CS) addresses this problem by acquiring a reduced amount of MR data in the Fourier domain, based on a chosen sampling pattern, and reconstructing the full image from this partial data. While various deep learning methods have been developed to optimize these sampling patterns and improve reconstruction, they often struggle with slow optimization and inference times or are limited to specific temporal dimensions used during training. In this work, we introduce a novel deep-compressed sensing approach that uses 3D window attention and flexible, temporally extendable acquisition trajectories. Our method significantly reduces both training and inference times compared to existing approaches, while also adapting to different temporal dimensions during inference without requiring additional training. Tests with real data show that our approach outperforms current state-of-theart techniques. The code for reproducing all experiments will be made available upon acceptance of the paper. △ Less

Submitted 19 September, 2024; originally announced September 2024.

arXiv:2406.13507 [pdf, other]

Scalable unsupervised alignment of general metric and non-metric structures

Authors: Sanketh Vedula, Valentino Maiorca, Lorenzo Basile, Francesco Locatello, Alex Bronstein

Abstract: Aligning data from different domains is a fundamental problem in machine learning with broad applications across very different areas, most notably aligning experimental readouts in single-cell multiomics. Mathematically, this problem can be formulated as the minimization of disagreement of pair-wise quantities such as distances and is related to the Gromov-Hausdorff and Gromov-Wasserstein distanc… ▽ More Aligning data from different domains is a fundamental problem in machine learning with broad applications across very different areas, most notably aligning experimental readouts in single-cell multiomics. Mathematically, this problem can be formulated as the minimization of disagreement of pair-wise quantities such as distances and is related to the Gromov-Hausdorff and Gromov-Wasserstein distances. Computationally, it is a quadratic assignment problem (QAP) that is known to be NP-hard. Prior works attempted to solve the QAP directly with entropic or low-rank regularization on the permutation, which is computationally tractable only for modestly-sized inputs, and encode only limited inductive bias related to the domains being aligned. We consider the alignment of metric structures formulated as a discrete Gromov-Wasserstein problem and instead of solving the QAP directly, we propose to learn a related well-scalable linear assignment problem (LAP) whose solution is also a minimizer of the QAP. We also show a flexible extension of the proposed framework to general non-metric dissimilarities through differentiable ranks. We extensively evaluate our approach on synthetic and real datasets from single-cell multiomics and neural latent spaces, achieving state-of-the-art performance while being conceptually and computationally simple. △ Less

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.00047 [pdf, ps, other]

A Theoretical Framework for an Efficient Normalizing Flow-Based Solution to the Electronic Schrodinger Equation

Authors: Daniel Freedman, Eyal Rozenberg, Alex Bronstein

Abstract: A central problem in quantum mechanics involves solving the Electronic Schrodinger Equation for a molecule or material. The Variational Monte Carlo approach to this problem approximates a particular variational objective via sampling, and then optimizes this approximated objective over a chosen parameterized family of wavefunctions, known as the ansatz. Recently neural networks have been used as t… ▽ More A central problem in quantum mechanics involves solving the Electronic Schrodinger Equation for a molecule or material. The Variational Monte Carlo approach to this problem approximates a particular variational objective via sampling, and then optimizes this approximated objective over a chosen parameterized family of wavefunctions, known as the ansatz. Recently neural networks have been used as the ansatz, with accompanying success. However, sampling from such wavefunctions has required the use of a Markov Chain Monte Carlo approach, which is inherently inefficient. In this work, we propose a solution to this problem via an ansatz which is cheap to sample from, yet satisfies the requisite quantum mechanical properties. We prove that a normalizing flow using the following two essential ingredients satisfies our requirements: (a) a base distribution which is constructed from Determinantal Point Processes; (b) flow layers which are equivariant to a particular subgroup of the permutation group. We then show how to construct both continuous and discrete normalizing flows which satisfy the requisite equivariance. We further demonstrate the manner in which the non-smooth nature ("cusps") of the wavefunction may be captured, and how the framework may be generalized to provide induction across multiple molecules. The resulting theoretical framework entails an efficient approach to solving the Electronic Schrodinger Equation. △ Less

Submitted 19 January, 2025; v1 submitted 28 May, 2024; originally announced June 2024.

Comments: AAAI 2025 Camera Ready Version

arXiv:2404.06589 [pdf, other]

Leveraging Latents for Efficient Thermography Classification and Segmentation

Authors: Tamir Shor, Chaim Baskin, Alex Bronstein

Abstract: Breast cancer is a prominent health concern worldwide, currently being the secondmost common and second-deadliest type of cancer in women. While current breast cancer diagnosis mainly relies on mammography imaging, in recent years the use of thermography for breast cancer imaging has been garnering growing popularity. Thermographic imaging relies on infrared cameras to capture body-emitted heat di… ▽ More Breast cancer is a prominent health concern worldwide, currently being the secondmost common and second-deadliest type of cancer in women. While current breast cancer diagnosis mainly relies on mammography imaging, in recent years the use of thermography for breast cancer imaging has been garnering growing popularity. Thermographic imaging relies on infrared cameras to capture body-emitted heat distributions. While these heat signatures have proven useful for computer-vision systems for accurate breast cancer segmentation and classification, prior work often relies on handcrafted feature engineering or complex architectures, potentially limiting the comparability and applicability of these methods. In this work, we present a novel algorithm for both breast cancer classification and segmentation. Rather than focusing efforts on manual feature and architecture engineering, our algorithm focuses on leveraging an informative, learned feature space, thus making our solution simpler to use and extend to other frameworks and downstream tasks, as well as more applicable to data-scarce settings. Our classification produces SOTA results, while we are the first work to produce segmentation regions studied in this paper. △ Less

Submitted 23 June, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

arXiv:2402.17289 [pdf, other]

Active propulsion noise shaping for multi-rotor aircraft localization

Authors: Gabriele Serussi, Tamir Shor, Tom Hirshberg, Chaim Baskin, Alex Bronstein

Abstract: Multi-rotor aerial autonomous vehicles (MAVs) primarily rely on vision for navigation purposes. However, visual localization and odometry techniques suffer from poor performance in low or direct sunlight, a limited field of view, and vulnerability to occlusions. Acoustic sensing can serve as a complementary or even alternative modality for vision in many situations, and it also has the added benef… ▽ More Multi-rotor aerial autonomous vehicles (MAVs) primarily rely on vision for navigation purposes. However, visual localization and odometry techniques suffer from poor performance in low or direct sunlight, a limited field of view, and vulnerability to occlusions. Acoustic sensing can serve as a complementary or even alternative modality for vision in many situations, and it also has the added benefits of lower system cost and energy footprint, which is especially important for micro aircraft. This paper proposes actively controlling and shaping the aircraft propulsion noise generated by the rotors to benefit localization tasks, rather than considering it a harmful nuisance. We present a neural network architecture for selfnoise-based localization in a known environment. We show that training it simultaneously with learning time-varying rotor phase modulation achieves accurate and robust localization. The proposed methods are evaluated using a computationally affordable simulation of MAV rotor noise in 2D acoustic environments that is fitted to real recordings of rotor pressure fields. △ Less

Submitted 29 February, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

arXiv:2311.12314 [pdf, other]

Demystifying Graph Sparsification Algorithms in Graph Properties Preservation

Authors: Yuhan Chen, Haojie Ye, Sanketh Vedula, Alex Bronstein, Ronald Dreslinski, Trevor Mudge, Nishil Talati

Abstract: Graph sparsification is a technique that approximates a given graph by a sparse graph with a subset of vertices and/or edges. The goal of an effective sparsification algorithm is to maintain specific graph properties relevant to the downstream task while minimizing the graph's size. Graph algorithms often suffer from long execution time due to the irregularity and the large real-world graph size.… ▽ More Graph sparsification is a technique that approximates a given graph by a sparse graph with a subset of vertices and/or edges. The goal of an effective sparsification algorithm is to maintain specific graph properties relevant to the downstream task while minimizing the graph's size. Graph algorithms often suffer from long execution time due to the irregularity and the large real-world graph size. Graph sparsification can be applied to greatly reduce the run time of graph algorithms by substituting the full graph with a much smaller sparsified graph, without significantly degrading the output quality. However, the interaction between numerous sparsifiers and graph properties is not widely explored, and the potential of graph sparsification is not fully understood. In this work, we cover 16 widely-used graph metrics, 12 representative graph sparsification algorithms, and 14 real-world input graphs spanning various categories, exhibiting diverse characteristics, sizes, and densities. We developed a framework to extensively assess the performance of these sparsification algorithms against graph metrics, and provide insights to the results. Our study shows that there is no one sparsifier that performs the best in preserving all graph properties, e.g. sparsifiers that preserve distance-related graph properties (eccentricity) struggle to perform well on Graph Neural Networks (GNN). This paper presents a comprehensive experimental study evaluating the performance of sparsification algorithms in preserving essential graph metrics. The insights inform future research in incorporating matching graph sparsification to graph algorithms to maximize benefits while minimizing quality degradation. Furthermore, we provide a framework to facilitate the future evaluation of evolving sparsification algorithms, graph metrics, and ever-growing graph data. △ Less

Submitted 20 November, 2023; originally announced November 2023.

Comments: Accepted at VLDB 2024

arXiv:2307.01037 [pdf, other]

Vector Quantile Regression on Manifolds

Authors: Marco Pegoraro, Sanketh Vedula, Aviv A. Rosenberg, Irene Tallini, Emanuele Rodolà, Alex M. Bronstein

Abstract: Quantile regression (QR) is a statistical tool for distribution-free estimation of conditional quantiles of a target variable given explanatory features. QR is limited by the assumption that the target distribution is univariate and defined on an Euclidean domain. Although the notion of quantiles was recently extended to multi-variate distributions, QR for multi-variate distributions on manifolds… ▽ More Quantile regression (QR) is a statistical tool for distribution-free estimation of conditional quantiles of a target variable given explanatory features. QR is limited by the assumption that the target distribution is univariate and defined on an Euclidean domain. Although the notion of quantiles was recently extended to multi-variate distributions, QR for multi-variate distributions on manifolds remains underexplored, even though many important applications inherently involve data distributed on, e.g., spheres (climate and geological phenomena), and tori (dihedral angles in proteins). By leveraging optimal transport theory and c-concave functions, we meaningfully define conditional vector quantile functions of high-dimensional variables on manifolds (M-CVQFs). Our approach allows for quantile estimation, regression, and computation of conditional confidence sets and likelihoods. We demonstrate the approach's efficacy and provide insights regarding the meaning of non-Euclidean quantiles through synthetic and real data experiments. △ Less

Submitted 7 February, 2024; v1 submitted 3 July, 2023; originally announced July 2023.

arXiv:2304.06810 [pdf, other]

Designing Nonlinear Photonic Crystals for High-Dimensional Quantum State Engineering

Authors: Eyal Rozenberg, Aviv Karnieli, Ofir Yesharim, Joshua Foley-Comer, Sivan Trajtenberg-Mills, Sarika Mishra, Shashi Prabhakar, Ravindra Pratap, Daniel Freedman, Alex M. Bronstein, Ady Arie

Abstract: We propose a novel, physically-constrained and differentiable approach for the generation of D-dimensional qudit states via spontaneous parametric down-conversion (SPDC) in quantum optics. We circumvent any limitations imposed by the inherently stochastic nature of the physical process and incorporate a set of stochastic dynamical equations governing its evolution under the SPDC Hamiltonian. We de… ▽ More We propose a novel, physically-constrained and differentiable approach for the generation of D-dimensional qudit states via spontaneous parametric down-conversion (SPDC) in quantum optics. We circumvent any limitations imposed by the inherently stochastic nature of the physical process and incorporate a set of stochastic dynamical equations governing its evolution under the SPDC Hamiltonian. We demonstrate the effectiveness of our model through the design of structured nonlinear photonic crystals (NLPCs) and shaped pump beams; and show, theoretically and experimentally, how to generate maximally entangled states in the spatial degree of freedom. The learning of NLPC structures offers a promising new avenue for shaping and controlling arbitrary quantum states and enables all-optical coherent control of the generated states. We believe that this approach can readily be extended from bulky crystals to thin Metasurfaces and potentially applied to other quantum systems sharing a similar Hamiltonian structures, such as superfluids and superconductors. △ Less

Submitted 13 April, 2023; originally announced April 2023.

Comments: ICLR Machine Learning for Materials (ML4Materials) Workshop 2023. arXiv admin note: text overlap with arXiv:2112.05934

arXiv:2303.15409 [pdf, other]

Class-Conditioned Transformation for Enhanced Robust Image Classification

Authors: Tsachi Blau, Roy Ganz, Chaim Baskin, Michael Elad, Alex M. Bronstein

Abstract: Robust classification methods predominantly concentrate on algorithms that address a specific threat model, resulting in ineffective defenses against other threat models. Real-world applications are exposed to this vulnerability, as malicious attackers might exploit alternative threat models. In this work, we propose a novel test-time threat model agnostic algorithm that enhances Adversarial-Train… ▽ More Robust classification methods predominantly concentrate on algorithms that address a specific threat model, resulting in ineffective defenses against other threat models. Real-world applications are exposed to this vulnerability, as malicious attackers might exploit alternative threat models. In this work, we propose a novel test-time threat model agnostic algorithm that enhances Adversarial-Trained (AT) models. Our method operates through COnditional image transformation and DIstance-based Prediction (CODIP) and includes two main steps: First, we transform the input image into each dataset class, where the input image might be either clean or attacked. Next, we make a prediction based on the shortest transformed distance. The conditional transformation utilizes the perceptually aligned gradients property possessed by AT models and, as a result, eliminates the need for additional models or additional training. Moreover, it allows users to choose the desired balance between clean and robust accuracy without training. The proposed method achieves state-of-the-art results demonstrated through extensive experiments on various models, AT methods, datasets, and attack types. Notably, applying CODIP leads to substantial robust accuracy improvement of up to $+23\%$, $+20\%$, $+26\%$, and $+22\%$ on CIFAR10, CIFAR100, ImageNet and Flowers datasets, respectively. △ Less

Submitted 4 November, 2024; v1 submitted 27 March, 2023; originally announced March 2023.

arXiv:2303.07150 [pdf, other]

Multi PILOT: Learned Feasible Multiple Acquisition Trajectories for Dynamic MRI

Authors: Tamir Shor, Tomer Weiss, Dor Noti, Alex Bronstein

Abstract: Dynamic Magnetic Resonance Imaging (MRI) is known to be a powerful and reliable technique for the dynamic imaging of internal organs and tissues, making it a leading diagnostic tool. A major difficulty in using MRI in this setting is the relatively long acquisition time (and, hence, increased cost) required for imaging in high spatio-temporal resolution, leading to the appearance of related motion… ▽ More Dynamic Magnetic Resonance Imaging (MRI) is known to be a powerful and reliable technique for the dynamic imaging of internal organs and tissues, making it a leading diagnostic tool. A major difficulty in using MRI in this setting is the relatively long acquisition time (and, hence, increased cost) required for imaging in high spatio-temporal resolution, leading to the appearance of related motion artifacts and decrease in resolution. Compressed Sensing (CS) techniques have become a common tool to reduce MRI acquisition time by subsampling images in the k-space according to some acquisition trajectory. Several studies have particularly focused on applying deep learning techniques to learn these acquisition trajectories in order to attain better image reconstruction, rather than using some predefined set of trajectories. To the best of our knowledge, learning acquisition trajectories has been only explored in the context of static MRI. In this study, we consider acquisition trajectory learning in the dynamic imaging setting. We design an end-to-end pipeline for the joint optimization of multiple per-frame acquisition trajectories along with a reconstruction neural network, and demonstrate improved image reconstruction quality in shorter acquisition times. The code for reproducing all experiments is accessible at https://github.com/tamirshor7/MultiPILOT. △ Less

Submitted 23 March, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

Comments: Accepted For MIDL 2023

arXiv:2211.01434 [pdf, other]

doi 10.1145/3308560.3316589

Spectral Graph Complexity

Authors: Anton Tsitsulin, Davide Mottin, Panagiotis Karras, Alex Bronstein, Emmanuel Müller

Abstract: We introduce a spectral notion of graph complexity derived from the Weyl's law. We experimentally demonstrate its correlation to how well the graph can be embedded in a low-dimensional Euclidean space. We introduce a spectral notion of graph complexity derived from the Weyl's law. We experimentally demonstrate its correlation to how well the graph can be embedded in a low-dimensional Euclidean space. △ Less

Submitted 2 November, 2022; originally announced November 2022.

Comments: BigNet workshop at the Web conferece'2019

arXiv:2210.01423 [pdf, other]

Using Deep Reinforcement Learning for mmWave Real-Time Scheduling

Authors: Barak Gahtan, Reuven Cohen, Alex M. Bronstein, Gil Kedar

Abstract: We study the problem of real-time scheduling in a multi-hop millimeter-wave (mmWave) mesh. We develop a model-free deep reinforcement learning algorithm called Adaptive Activator RL (AARL), which determines the subset of mmWave links that should be activated during each time slot and the power level for each link. The most important property of AARL is its ability to make scheduling decisions with… ▽ More We study the problem of real-time scheduling in a multi-hop millimeter-wave (mmWave) mesh. We develop a model-free deep reinforcement learning algorithm called Adaptive Activator RL (AARL), which determines the subset of mmWave links that should be activated during each time slot and the power level for each link. The most important property of AARL is its ability to make scheduling decisions within the strict time slot constraints of typical 5G mmWave networks. AARL can handle a variety of network topologies, network loads, and interference models, it can also adapt to different workloads. We demonstrate the operation of AARL on several topologies: a small topology with 10 links, a moderately-sized mesh with 48 links, and a large topology with 96 links. We show that for each topology, we compare the throughput obtained by AARL to that of a benchmark algorithm called RPMA (Residual Profit Maximizer Algorithm). The most important advantage of AARL compared to RPMA is that it is much faster and can make the necessary scheduling decisions very rapidly during every time slot, while RPMA cannot. In addition, the quality of the scheduling decisions made by AARL outperforms those made by RPMA. △ Less

Submitted 18 February, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

arXiv:2208.08191 [pdf, other]

Transformer Vs. MLP-Mixer: Exponential Expressive Gap For NLP Problems

Authors: Dan Navon, Alex M. Bronstein

Abstract: Vision-Transformers are widely used in various vision tasks. Meanwhile, there is another line of works starting with the MLP-mixer trying to achieve similar performance using mlp-based architectures. Interestingly, until now those mlp-based architectures have not been adapted for NLP tasks. Additionally, until now, mlp-based architectures have failed to achieve state-of-the-art performance in visi… ▽ More Vision-Transformers are widely used in various vision tasks. Meanwhile, there is another line of works starting with the MLP-mixer trying to achieve similar performance using mlp-based architectures. Interestingly, until now those mlp-based architectures have not been adapted for NLP tasks. Additionally, until now, mlp-based architectures have failed to achieve state-of-the-art performance in vision tasks. In this paper, we analyze the expressive power of mlp-based architectures in modeling dependencies between multiple different inputs simultaneously, and show an exponential gap between the attention and the mlp-based mechanisms. Our results suggest a theoretical explanation for the mlp inability to compete with attention-based mechanisms in NLP problems, they also suggest that the performance gap in vision tasks may be due to the mlp relative weakness in modeling dependencies between multiple different locations, and that combining smart input permutations with mlp architectures may not be enough to close the performance gap alone. △ Less

Submitted 17 November, 2022; v1 submitted 17 August, 2022; originally announced August 2022.

arXiv:2208.08170 [pdf, ps, other]

Random Search Hyper-Parameter Tuning: Expected Improvement Estimation and the Corresponding Lower Bound

Authors: Dan Navon, Alex M. Bronstein

Abstract: Hyperparameter tuning is a common technique for improving the performance of neural networks. Most techniques for hyperparameter search involve an iterated process where the model is retrained at every iteration. However, the expected accuracy improvement from every additional search iteration, is still unknown. Calculating the expected improvement can help create stopping rules for hyperparameter… ▽ More Hyperparameter tuning is a common technique for improving the performance of neural networks. Most techniques for hyperparameter search involve an iterated process where the model is retrained at every iteration. However, the expected accuracy improvement from every additional search iteration, is still unknown. Calculating the expected improvement can help create stopping rules for hyperparameter tuning and allow for a wiser allocation of a project's computational budget. In this paper, we establish an empirical estimate for the expected accuracy improvement from an additional iteration of hyperparameter search. Our results hold for any hyperparameter tuning method which is based on random search \cite{bergstra2012random} and samples hyperparameters from a fixed distribution. We bound our estimate with an error of $O\left(\sqrt{\frac{\log k}{k}}\right)$ w.h.p. where $k$ is the current number of iterations. To the best of our knowledge this is the first bound on the expected gain from an additional iteration of hyperparameter search. Finally, we demonstrate that the optimal estimate for the expected accuracy will still have an error of $\frac{1}{k}$. △ Less

Submitted 17 August, 2022; originally announced August 2022.

arXiv:2207.08089 [pdf, other]

Threat Model-Agnostic Adversarial Defense using Diffusion Models

Authors: Tsachi Blau, Roy Ganz, Bahjat Kawar, Alex Bronstein, Michael Elad

Abstract: Deep Neural Networks (DNNs) are highly sensitive to imperceptible malicious perturbations, known as adversarial attacks. Following the discovery of this vulnerability in real-world imaging and vision applications, the associated safety concerns have attracted vast research attention, and many defense techniques have been developed. Most of these defense methods rely on adversarial training (AT) --… ▽ More Deep Neural Networks (DNNs) are highly sensitive to imperceptible malicious perturbations, known as adversarial attacks. Following the discovery of this vulnerability in real-world imaging and vision applications, the associated safety concerns have attracted vast research attention, and many defense techniques have been developed. Most of these defense methods rely on adversarial training (AT) -- training the classification network on images perturbed according to a specific threat model, which defines the magnitude of the allowed modification. Although AT leads to promising results, training on a specific threat model fails to generalize to other types of perturbations. A different approach utilizes a preprocessing step to remove the adversarial perturbation from the attacked image. In this work, we follow the latter path and aim to develop a technique that leads to robust classifiers across various realizations of threat models. To this end, we harness the recent advances in stochastic generative modeling, and means to leverage these for sampling from conditional distributions. Our defense relies on an addition of Gaussian i.i.d noise to the attacked image, followed by a pretrained diffusion process -- an architecture that performs a stochastic iterative process over a denoising network, yielding a high perceptual quality denoised outcome. The obtained robustness with this stochastic preprocessing step is validated through extensive experiments on the CIFAR-10 dataset, showing that our method outperforms the leading defense methods under various threat models. △ Less

Submitted 17 July, 2022; originally announced July 2022.

arXiv:2207.05729 [pdf, other]

Physical Passive Patch Adversarial Attacks on Visual Odometry Systems

Authors: Yaniv Nemcovsky, Matan Jacoby, Alex M. Bronstein, Chaim Baskin

Abstract: Deep neural networks are known to be susceptible to adversarial perturbations -- small perturbations that alter the output of the network and exist under strict norm limitations. While such perturbations are usually discussed as tailored to a specific input, a universal perturbation can be constructed to alter the model's output on a set of inputs. Universal perturbations present a more realistic… ▽ More Deep neural networks are known to be susceptible to adversarial perturbations -- small perturbations that alter the output of the network and exist under strict norm limitations. While such perturbations are usually discussed as tailored to a specific input, a universal perturbation can be constructed to alter the model's output on a set of inputs. Universal perturbations present a more realistic case of adversarial attacks, as awareness of the model's exact input is not required. In addition, the universal attack setting raises the subject of generalization to unseen data, where given a set of inputs, the universal perturbations aim to alter the model's output on out-of-sample data. In this work, we study physical passive patch adversarial attacks on visual odometry-based autonomous navigation systems. A visual odometry system aims to infer the relative camera motion between two corresponding viewpoints, and is frequently used by vision-based autonomous navigation systems to estimate their state. For such navigation systems, a patch adversarial perturbation poses a severe security issue, as it can be used to mislead a system onto some collision course. To the best of our knowledge, we show for the first time that the error margin of a visual odometry model can be significantly increased by deploying patch adversarial attacks in the scene. We provide evaluation on synthetic closed-loop drone navigation data and demonstrate that a comparable vulnerability exists in real data. A reference implementation of the proposed method and the reported experiments is provided at https://github.com/patchadversarialattacks/patchadversarialattacks. △ Less

Submitted 4 October, 2022; v1 submitted 11 July, 2022; originally announced July 2022.

Comments: Accepted to ACCV 2022

arXiv:2205.14977 [pdf, other]

Fast Nonlinear Vector Quantile Regression

Authors: Aviv A. Rosenberg, Sanketh Vedula, Yaniv Romano, Alex M. Bronstein

Abstract: Quantile regression (QR) is a powerful tool for estimating one or more conditional quantiles of a target variable $\mathrm{Y}$ given explanatory features $\boldsymbol{\mathrm{X}}$. A limitation of QR is that it is only defined for scalar target variables, due to the formulation of its objective function, and since the notion of quantiles has no standard definition for multivariate distributions. R… ▽ More Quantile regression (QR) is a powerful tool for estimating one or more conditional quantiles of a target variable $\mathrm{Y}$ given explanatory features $\boldsymbol{\mathrm{X}}$. A limitation of QR is that it is only defined for scalar target variables, due to the formulation of its objective function, and since the notion of quantiles has no standard definition for multivariate distributions. Recently, vector quantile regression (VQR) was proposed as an extension of QR for vector-valued target variables, thanks to a meaningful generalization of the notion of quantiles to multivariate distributions via optimal transport. Despite its elegance, VQR is arguably not applicable in practice due to several limitations: (i) it assumes a linear model for the quantiles of the target $\boldsymbol{\mathrm{Y}}$ given the features $\boldsymbol{\mathrm{X}}$; (ii) its exact formulation is intractable even for modestly-sized problems in terms of target dimensions, number of regressed quantile levels, or number of features, and its relaxed dual formulation may violate the monotonicity of the estimated quantiles; (iii) no fast or scalable solvers for VQR currently exist. In this work we fully address these limitations, namely: (i) We extend VQR to the non-linear case, showing substantial improvement over linear VQR; (ii) We propose {vector monotone rearrangement}, a method which ensures the quantile functions estimated by VQR are monotone functions; (iii) We provide fast, GPU-accelerated solvers for linear and nonlinear VQR which maintain a fixed memory footprint, and demonstrate that they scale to millions of samples and thousands of quantile levels; (iv) We release an optimized python package of our solvers as to widespread the use of VQR in real-world applications. △ Less

Submitted 2 June, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

Comments: 35 pages, 15 figures, code: https://github.com/vistalab-technion/vqr

Journal ref: The Eleventh International Conference on Learning Representations (ICLR 2023)

arXiv:2202.05204 [pdf, other]

Towards Predicting Fine Finger Motions from Ultrasound Images via Kinematic Representation

Authors: Dean Zadok, Oren Salzman, Alon Wolf, Alex M. Bronstein

Abstract: A central challenge in building robotic prostheses is the creation of a sensor-based system able to read physiological signals from the lower limb and instruct a robotic hand to perform various tasks. Existing systems typically perform discrete gestures such as pointing or grasping, by employing electromyography (EMG) or ultrasound (US) technologies to analyze muscle states. While estimating finge… ▽ More A central challenge in building robotic prostheses is the creation of a sensor-based system able to read physiological signals from the lower limb and instruct a robotic hand to perform various tasks. Existing systems typically perform discrete gestures such as pointing or grasping, by employing electromyography (EMG) or ultrasound (US) technologies to analyze muscle states. While estimating finger gestures has been done in the past by detecting prominent gestures, we are interested in detection, or inference, done in the context of fine motions that evolve over time. Examples include motions occurring when performing fine and dexterous tasks such as keyboard typing or piano playing. We consider this task as an important step towards higher adoption rates of robotic prostheses among arm amputees, as it has the potential to dramatically increase functionality in performing daily tasks. To this end, we present an end-to-end robotic system, which can successfully infer fine finger motions. This is achieved by modeling the hand as a robotic manipulator and using it as an intermediate representation to encode muscles' dynamics from a sequence of US images. We evaluated our method by collecting data from a group of subjects and demonstrating how it can be used to replay music played or text typed. To the best of our knowledge, this is the first study demonstrating these downstream tasks within an end-to-end system. △ Less

Submitted 28 September, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

arXiv:2112.05934 [pdf, other]

SPDCinv: Inverse Quantum-Optical Design of High-Dimensional Qudits

Authors: Eyal Rozenberg, Aviv Karnieli, Ofir Yesharim, Joshua Foley-Comer, Sivan Trajtenberg-Mills, Daniel Freedman, Alex M. Bronstein, Ady Arie

Abstract: Spontaneous parametric down-conversion in quantum optics is an invaluable resource for the realization of high-dimensional qudits with spatial modes of light. One of the main open challenges is how to directly generate a desirable qudit state in the SPDC process. This problem can be addressed through advanced computational learning methods; however, due to difficulties in modeling the SPDC process… ▽ More Spontaneous parametric down-conversion in quantum optics is an invaluable resource for the realization of high-dimensional qudits with spatial modes of light. One of the main open challenges is how to directly generate a desirable qudit state in the SPDC process. This problem can be addressed through advanced computational learning methods; however, due to difficulties in modeling the SPDC process by a fully differentiable algorithm that takes into account all interaction effects, progress has been limited. Here, we overcome these limitations and introduce a physically-constrained and differentiable model, validated against experimental results for shaped pump beams and structured crystals, capable of learning every interaction parameter in the process. We avoid any restrictions induced by the stochastic nature of our physical model and integrate the dynamic equations governing the evolution under the SPDC Hamiltonian. We solve the inverse problem of designing a nonlinear quantum optical system that achieves the desired quantum state of down-converted photon pairs. The desired states are defined using either the second-order correlations between different spatial modes or by specifying the required density matrix. By learning nonlinear volume holograms as well as different pump shapes, we successfully show how to generate maximally entangled states. Furthermore, we simulate all-optical coherent control over the generated quantum state by actively changing the profile of the pump beam. Our work can be useful for applications such as novel designs of high-dimensional quantum key distribution and quantum information processing protocols. In addition, our method can be readily applied for controlling other degrees of freedom of light in the SPDC process, such as the spectral and temporal properties, and may even be used in condensed-matter systems having a similar interaction Hamiltonian. △ Less

Submitted 11 December, 2021; originally announced December 2021.

Comments: A code supporting the paper has been published (https://github.com/EyalRozenberg1/SPDCinv)

arXiv:2111.08419 [pdf, other]

Delta-GAN-Encoder: Encoding Semantic Changes for Explicit Image Editing, using Few Synthetic Samples

Authors: Nir Diamant, Nitsan Sandor, Alex M Bronstein

Abstract: Understating and controlling generative models' latent space is a complex task. In this paper, we propose a novel method for learning to control any desired attribute in a pre-trained GAN's latent space, for the purpose of editing synthesized and real-world data samples accordingly. We perform Sim2Real learning, relying on minimal samples to achieve an unlimited amount of continuous precise ed… ▽ More Understating and controlling generative models' latent space is a complex task. In this paper, we propose a novel method for learning to control any desired attribute in a pre-trained GAN's latent space, for the purpose of editing synthesized and real-world data samples accordingly. We perform Sim2Real learning, relying on minimal samples to achieve an unlimited amount of continuous precise edits. We present an Autoencoder-based model that learns to encode the semantics of changes between images as a basis for editing new samples later on, achieving precise desired results - example shown in Fig. 1. While previous editing methods rely on a known structure of latent spaces (e.g., linearity of some semantics in StyleGAN), our method inherently does not require any structural constraints. We demonstrate our method in the domain of facial imagery: editing different expressions, poses, and lighting attributes, achieving state-of-the-art results. △ Less

Submitted 17 November, 2021; v1 submitted 16 November, 2021; originally announced November 2021.

Comments: 8 pages, 13 figures

arXiv:2110.03218 [pdf, other]

Joint optimization of system design and reconstruction in MIMO radar imaging

Authors: Tomer Weiss, Nissim Peretz, Sanketh Vedula, Arie Feuer, Alex Bronstein

Abstract: Multiple-input multiple-output (MIMO) radar is one of the leading depth sensing modalities. However, the usage of multiple receive channels lead to relative high costs and prevent the penetration of MIMOs in many areas such as the automotive industry. Over the last years, few studies concentrated on designing reduced measurement schemes and image reconstruction schemes for MIMO radars, however the… ▽ More Multiple-input multiple-output (MIMO) radar is one of the leading depth sensing modalities. However, the usage of multiple receive channels lead to relative high costs and prevent the penetration of MIMOs in many areas such as the automotive industry. Over the last years, few studies concentrated on designing reduced measurement schemes and image reconstruction schemes for MIMO radars, however these problems have been so far addressed separately. On the other hand, recent works in optical computational imaging have demonstrated growing success of simultaneous learning-based design of the acquisition and reconstruction schemes, manifesting significant improvement in the reconstruction quality. Inspired by these successes, in this work, we propose to learn MIMO acquisition parameters in the form of receive (Rx) antenna elements locations jointly with an image neural-network based reconstruction. To this end, we propose an algorithm for training the combined acquisition-reconstruction pipeline end-to-end in a differentiable way. We demonstrate the significance of using our learned acquisition parameters with and without the neural-network reconstruction. △ Less

Submitted 7 October, 2021; originally announced October 2021.

arXiv:2106.05729 [pdf, ps, other]

GRASP: Graph Alignment through Spectral Signatures

Authors: Judith Hermanns, Anton Tsitsulin, Marina Munkhoeva, Alex Bronstein, Davide Mottin, Panagiotis Karras

Abstract: What is the best way to match the nodes of two graphs? This graph alignment problem generalizes graph isomorphism and arises in applications from social network analysis to bioinformatics. Some solutions assume that auxiliary information on known matches or node or edge attributes is available, or utilize arbitrary graph features. Such methods fare poorly in the pure form of the problem, in which… ▽ More What is the best way to match the nodes of two graphs? This graph alignment problem generalizes graph isomorphism and arises in applications from social network analysis to bioinformatics. Some solutions assume that auxiliary information on known matches or node or edge attributes is available, or utilize arbitrary graph features. Such methods fare poorly in the pure form of the problem, in which only graph structures are given. Other proposals translate the problem to one of aligning node embeddings, yet, by doing so, provide only a single-scale view of the graph. In this paper, we transfer the shape-analysis concept of functional maps from the continuous to the discrete case, and treat the graph alignment problem as a special case of the problem of finding a mapping between functions on graphs. We present GRASP, a method that first establishes a correspondence between functions derived from Laplacian matrix eigenvectors, which capture multiscale structural characteristics, and then exploits this correspondence to align nodes. Our experimental study, featuring noise levels higher than anything used in previous studies, shows that GRASP outperforms state-of-the-art methods for graph alignment across noise levels and graph types. △ Less

Submitted 11 June, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

Comments: Accepted to APWeb-WAIM

arXiv:2104.09829 [pdf, other]

Detector-Free Weakly Supervised Grounding by Separation

Authors: Assaf Arbelle, Sivan Doveh, Amit Alfassy, Joseph Shtok, Guy Lev, Eli Schwartz, Hilde Kuehne, Hila Barak Levi, Prasanna Sattigeri, Rameswar Panda, Chun-Fu Chen, Alex Bronstein, Kate Saenko, Shimon Ullman, Raja Giryes, Rogerio Feris, Leonid Karlinsky

Abstract: Nowadays, there is an abundance of data involving images and surrounding free-form text weakly corresponding to those images. Weakly Supervised phrase-Grounding (WSG) deals with the task of using this data to learn to localize (or to ground) arbitrary text phrases in images without any additional annotations. However, most recent SotA methods for WSG assume the existence of a pre-trained object de… ▽ More Nowadays, there is an abundance of data involving images and surrounding free-form text weakly corresponding to those images. Weakly Supervised phrase-Grounding (WSG) deals with the task of using this data to learn to localize (or to ground) arbitrary text phrases in images without any additional annotations. However, most recent SotA methods for WSG assume the existence of a pre-trained object detector, relying on it to produce the ROIs for localization. In this work, we focus on the task of Detector-Free WSG (DF-WSG) to solve WSG without relying on a pre-trained detector. We directly learn everything from the images and associated free-form text pairs, thus potentially gaining an advantage on the categories unsupported by the detector. The key idea behind our proposed Grounding by Separation (GbS) method is synthesizing `text to image-regions' associations by random alpha-blending of arbitrary image pairs and using the corresponding texts of the pair as conditions to recover the alpha map from the blended image via a segmentation network. At test time, this allows using the query phrase as a condition for a non-blended query image, thus interpreting the test image as a composition of a region corresponding to the phrase and the complement region. Using this approach we demonstrate a significant accuracy improvement, of up to $8.5\%$ over previous DF-WSG SotA, for a range of benchmarks including Flickr30K, Visual Genome, and ReferIt, as well as a significant complementary improvement (above $7\%$) over the detector-based approaches for WSG. △ Less

Submitted 20 April, 2021; originally announced April 2021.

arXiv:2103.13646 [pdf, other]

doi 10.1109/WACV51458.2022.00046

Contrast to Divide: Self-Supervised Pre-Training for Learning with Noisy Labels

Authors: Evgenii Zheltonozhskii, Chaim Baskin, Avi Mendelson, Alex M. Bronstein, Or Litany

Abstract: The success of learning with noisy labels (LNL) methods relies heavily on the success of a warm-up stage where standard supervised training is performed using the full (noisy) training set. In this paper, we identify a "warm-up obstacle": the inability of standard warm-up stages to train high quality feature extractors and avert memorization of noisy labels. We propose "Contrast to Divide" (C2D),… ▽ More The success of learning with noisy labels (LNL) methods relies heavily on the success of a warm-up stage where standard supervised training is performed using the full (noisy) training set. In this paper, we identify a "warm-up obstacle": the inability of standard warm-up stages to train high quality feature extractors and avert memorization of noisy labels. We propose "Contrast to Divide" (C2D), a simple framework that solves this problem by pre-training the feature extractor in a self-supervised fashion. Using self-supervised pre-training boosts the performance of existing LNL approaches by drastically reducing the warm-up stage's susceptibility to noise level, shortening its duration, and improving extracted feature quality. C2D works out of the box with existing methods and demonstrates markedly improved performance, especially in the high noise regime, where we get a boost of more than 27% for CIFAR-100 with 90% noise over the previous state of the art. In real-life noise settings, C2D trained on mini-WebVision outperforms previous works both in WebVision and ImageNet validation sets by 3% top-1 accuracy. We perform an in-depth analysis of the framework, including investigating the performance of different pre-training approaches and estimating the effective upper bound of the LNL performance with semi-supervised learning. Code for reproducing our experiments is available at https://github.com/ContrastToDivide/C2D △ Less

Submitted 20 October, 2021; v1 submitted 25 March, 2021; originally announced March 2021.

arXiv:2103.10994 [pdf, other]

Self-Supervised Classification Network

Authors: Elad Amrani, Leonid Karlinsky, Alex Bronstein

Abstract: We present Self-Classifier -- a novel self-supervised end-to-end classification learning approach. Self-Classifier learns labels and representations simultaneously in a single-stage end-to-end manner by optimizing for same-class prediction of two augmented views of the same sample. To guarantee non-degenerate solutions (i.e., solutions where all labels are assigned to the same class) we propose a… ▽ More We present Self-Classifier -- a novel self-supervised end-to-end classification learning approach. Self-Classifier learns labels and representations simultaneously in a single-stage end-to-end manner by optimizing for same-class prediction of two augmented views of the same sample. To guarantee non-degenerate solutions (i.e., solutions where all labels are assigned to the same class) we propose a mathematically motivated variant of the cross-entropy loss that has a uniform prior asserted on the predicted labels. In our theoretical analysis, we prove that degenerate solutions are not in the set of optimal solutions of our approach. Self-Classifier is simple to implement and scalable. Unlike other popular unsupervised classification and contrastive representation learning approaches, it does not require any form of pre-training, expectation-maximization, pseudo-labeling, external clustering, a second network, stop-gradient operation, or negative pairs. Despite its simplicity, our approach sets a new state of the art for unsupervised classification of ImageNet; and even achieves comparable to state-of-the-art results for unsupervised representation learning. Code is available at https://github.com/elad-amrani/self-classifier. △ Less

Submitted 12 July, 2022; v1 submitted 19 March, 2021; originally announced March 2021.

Comments: ECCV 2022 camera-ready with supplementary

arXiv:2102.10344 [pdf, other]

doi 10.1364/CLEO_QELS.2021.FM1N.7

Inverse Design of Quantum Holograms in Three-Dimensional Nonlinear Photonic Crystals

Authors: Eyal Rozenberg, Aviv Karnieli, Ofir Yesharim, Sivan Trajtenberg-Mills, Daniel Freedman, Alex M. Bronstein, Ady Arie

Abstract: We introduce a systematic approach for designing 3D nonlinear photonic crystals and pump beams for generating desired quantum correlations between structured photon-pairs. Our model is fully differentiable, allowing accurate and efficient learning and discovery of novel designs. We introduce a systematic approach for designing 3D nonlinear photonic crystals and pump beams for generating desired quantum correlations between structured photon-pairs. Our model is fully differentiable, allowing accurate and efficient learning and discovery of novel designs. △ Less

Submitted 20 February, 2021; originally announced February 2021.

Comments: A supporting code will be published shortly

Journal ref: Conference on Lasers and Electro-Optics, Optical Society of America (2021), paper FM1N.7

arXiv:2101.10203 [pdf, other]

doi 10.1109/OJSP.2023.3239819

ISP Distillation

Authors: Eli Schwartz, Alex Bronstein, Raja Giryes

Abstract: Nowadays, many of the images captured are `observed' by machines only and not by humans, e.g., in autonomous systems. High-level machine vision models, such as object recognition or semantic segmentation, assume images are transformed into some canonical image space by the camera \ans{Image Signal Processor (ISP)}. However, the camera ISP is optimized for producing visually pleasing images for hum… ▽ More Nowadays, many of the images captured are `observed' by machines only and not by humans, e.g., in autonomous systems. High-level machine vision models, such as object recognition or semantic segmentation, assume images are transformed into some canonical image space by the camera \ans{Image Signal Processor (ISP)}. However, the camera ISP is optimized for producing visually pleasing images for human observers and not for machines. Therefore, one may spare the ISP compute time and apply vision models directly to RAW images. Yet, it has been shown that training such models directly on RAW images results in a performance drop. To mitigate this drop, we use a RAW and RGB image pairs dataset, which can be easily acquired with no human labeling. We then train a model that is applied directly to the RAW data by using knowledge distillation such that the model predictions for RAW images will be aligned with the predictions of an off-the-shelf pre-trained model for processed RGB images. Our experiments show that our performance on RAW images for object classification and semantic segmentation is significantly better than models trained on labeled RAW images. It also reasonably matches the predictions of a pre-trained model on processed RGB images, while saving the ISP compute overhead. △ Less

Submitted 4 May, 2023; v1 submitted 25 January, 2021; originally announced January 2021.

Journal ref: IEEE Open Journal of Signal Processing 2023

arXiv:2012.08951 [pdf, other]

SimuGAN: Unsupervised forward modeling and optimal design of a LIDAR Camera

Authors: Nir Diamant, Tal Mund, Ohad Menashe, Aviad Zabatani, Alex M. Bronstein

Abstract: Energy-saving LIDAR camera for short distances estimates an object's distance using temporally intensity-coded laser light pulses and calculates the maximum correlation with the back-scattered pulse. Though on low power, the backs-scattered pulse is noisy and unstable, which leads to inaccurate and unreliable depth estimation. To address this problem, we use GANs (Generative Adversarial Networ… ▽ More Energy-saving LIDAR camera for short distances estimates an object's distance using temporally intensity-coded laser light pulses and calculates the maximum correlation with the back-scattered pulse. Though on low power, the backs-scattered pulse is noisy and unstable, which leads to inaccurate and unreliable depth estimation. To address this problem, we use GANs (Generative Adversarial Networks), which are two neural networks that can learn complicated class distributions through an adversarial process. We learn the LIDAR camera's hidden properties and behavior, creating a novel, fully unsupervised forward model that simulates the camera. Then, we use the model's differentiability to explore the camera parameter space and optimize those parameters in terms of depth, accuracy, and stability. To achieve this goal, we also propose a new custom loss function designated to the back-scattered code distribution's weaknesses and its circular behavior. The results are demonstrated on both synthetic and real data. △ Less

Submitted 16 December, 2020; originally announced December 2020.

arXiv:2012.04515 [pdf, other]

Digital Gimbal: End-to-end Deep Image Stabilization with Learnable Exposure Times

Authors: Omer Dahary, Matan Jacoby, Alex M. Bronstein

Abstract: Mechanical image stabilization using actuated gimbals enables capturing long-exposure shots without suffering from blur due to camera motion. These devices, however, are often physically cumbersome and expensive, limiting their widespread use. In this work, we propose to digitally emulate a mechanically stabilized system from the input of a fast unstabilized camera. To exploit the trade-off betwee… ▽ More Mechanical image stabilization using actuated gimbals enables capturing long-exposure shots without suffering from blur due to camera motion. These devices, however, are often physically cumbersome and expensive, limiting their widespread use. In this work, we propose to digitally emulate a mechanically stabilized system from the input of a fast unstabilized camera. To exploit the trade-off between motion blur at long exposures and low SNR at short exposures, we train a CNN that estimates a sharp high-SNR image by aggregating a burst of noisy short-exposure frames, related by unknown motion. We further suggest learning the burst's exposure times in an end-to-end manner, thus balancing the noise and blur across the frames. We demonstrate this method's advantage over the traditional approach of deblurring a single image or denoising a fixed-exposure burst on both synthetic and real data. △ Less

Submitted 23 November, 2021; v1 submitted 8 December, 2020; originally announced December 2020.

Comments: CVPR 2021

arXiv:2011.13076 [pdf, other]

doi 10.1111/cgf.12970

Non-Rigid Puzzles

Authors: Or Litany, Emanuele Rodolà, Alex Bronstein, Michael Bronstein, Daniel Cremers

Abstract: Shape correspondence is a fundamental problem in computer graphics and vision, with applications in various problems including animation, texture mapping, robotic vision, medical imaging, archaeology and many more. In settings where the shapes are allowed to undergo non-rigid deformations and only partial views are available, the problem becomes very challenging. To this end, we present a non-rigi… ▽ More Shape correspondence is a fundamental problem in computer graphics and vision, with applications in various problems including animation, texture mapping, robotic vision, medical imaging, archaeology and many more. In settings where the shapes are allowed to undergo non-rigid deformations and only partial views are available, the problem becomes very challenging. To this end, we present a non-rigid multi-part shape matching algorithm. We assume to be given a reference shape and its multiple parts undergoing a non-rigid deformation. Each of these query parts can be additionally contaminated by clutter, may overlap with other parts, and there might be missing parts or redundant ones. Our method simultaneously solves for the segmentation of the reference model, and for a dense correspondence to (subsets of) the parts. Experimental results on synthetic as well as real scans demonstrate the effectiveness of our method in dealing with this challenging matching scenario. △ Less

Submitted 25 November, 2020; originally announced November 2020.

Journal ref: Computer Graphics Forum, Volume 35, Issue 5, August 2016

arXiv:2008.10312 [pdf, other]

Self-Supervised Learning for Large-Scale Unsupervised Image Clustering

Authors: Evgenii Zheltonozhskii, Chaim Baskin, Alex M. Bronstein, Avi Mendelson

Abstract: Unsupervised learning has always been appealing to machine learning researchers and practitioners, allowing them to avoid an expensive and complicated process of labeling the data. However, unsupervised learning of complex data is challenging, and even the best approaches show much weaker performance than their supervised counterparts. Self-supervised deep learning has become a strong instrument f… ▽ More Unsupervised learning has always been appealing to machine learning researchers and practitioners, allowing them to avoid an expensive and complicated process of labeling the data. However, unsupervised learning of complex data is challenging, and even the best approaches show much weaker performance than their supervised counterparts. Self-supervised deep learning has become a strong instrument for representation learning in computer vision. However, those methods have not been evaluated in a fully unsupervised setting. In this paper, we propose a simple scheme for unsupervised classification based on self-supervised representations. We evaluate the proposed approach with several recent self-supervised methods showing that it achieves competitive results for ImageNet classification (39% accuracy on ImageNet with 1000 clusters and 46% with overclustering). We suggest adding the unsupervised evaluation to a set of standard benchmarks for self-supervised learning. The code is available at https://github.com/Randl/kmeans_selfsuper △ Less

Submitted 9 November, 2020; v1 submitted 24 August, 2020; originally announced August 2020.

Comments: accepted to NeurIPS 2020 Workshop: Self-Supervised Learning - Theory and Practice

arXiv:2008.04808 [pdf, other]

3D FLAT: Feasible Learned Acquisition Trajectories for Accelerated MRI

Authors: Jonathan Alush-Aben, Linor Ackerman-Schraier, Tomer Weiss, Sanketh Vedula, Ortal Senouf, Alex Bronstein

Abstract: Magnetic Resonance Imaging (MRI) has long been considered to be among the gold standards of today's diagnostic imaging. The most significant drawback of MRI is long acquisition times, prohibiting its use in standard practice for some applications. Compressed sensing (CS) proposes to subsample the k-space (the Fourier domain dual to the physical space of spatial coordinates) leading to significantl… ▽ More Magnetic Resonance Imaging (MRI) has long been considered to be among the gold standards of today's diagnostic imaging. The most significant drawback of MRI is long acquisition times, prohibiting its use in standard practice for some applications. Compressed sensing (CS) proposes to subsample the k-space (the Fourier domain dual to the physical space of spatial coordinates) leading to significantly accelerated acquisition. However, the benefit of compressed sensing has not been fully exploited; most of the sampling densities obtained through CS do not produce a trajectory that obeys the stringent constraints of the MRI machine imposed in practice. Inspired by recent success of deep learning based approaches for image reconstruction and ideas from computational imaging on learning-based design of imaging systems, we introduce 3D FLAT, a novel protocol for data-driven design of 3D non-Cartesian accelerated trajectories in MRI. Our proposal leverages the entire 3D k-space to simultaneously learn a physically feasible acquisition trajectory with a reconstruction method. Experimental results, performed as a proof-of-concept, suggest that 3D FLAT achieves higher image quality for a given readout time compared to standard trajectories such as radial, stack-of-stars, or 2D learned trajectories (trajectories that evolve only in the 2D plane while fully sampling along the third dimension). Furthermore, we demonstrate evidence supporting the significant benefit of performing MRI acquisitions using non-Cartesian 3D trajectories over 2D non-Cartesian trajectories acquired slice-wise. △ Less

Submitted 11 August, 2020; originally announced August 2020.

arXiv:2006.01035 [pdf, other]

Data-Driven Prediction of Embryo Implantation Probability Using IVF Time-lapse Imaging

Authors: David H. Silver, Martin Feder, Yael Gold-Zamir, Avital L. Polsky, Shahar Rosentraub, Efrat Shachor, Adi Weinberger, Pavlo Mazur, Valery D. Zukin, Alex M. Bronstein

Abstract: The process of fertilizing a human egg outside the body in order to help those suffering from infertility to conceive is known as in vitro fertilization (IVF). Despite being the most effective method of assisted reproductive technology (ART), the average success rate of IVF is a mere 20-40%. One step that is critical to the success of the procedure is selecting which embryo to transfer to the pati… ▽ More The process of fertilizing a human egg outside the body in order to help those suffering from infertility to conceive is known as in vitro fertilization (IVF). Despite being the most effective method of assisted reproductive technology (ART), the average success rate of IVF is a mere 20-40%. One step that is critical to the success of the procedure is selecting which embryo to transfer to the patient, a process typically conducted manually and without any universally accepted and standardized criteria. In this paper we describe a novel data-driven system trained to directly predict embryo implantation probability from embryogenesis time-lapse imaging videos. Using retrospectively collected videos from 272 embryos, we demonstrate that, when compared to an external panel of embryologists, our algorithm results in a 12% increase of positive predictive value and a 29% increase of negative predictive value. △ Less

Submitted 2 June, 2020; v1 submitted 1 June, 2020; originally announced June 2020.

Report number: MIDL/2020/ExtendedAbstract/TujK1uTkTP

arXiv:2004.08906 [pdf, other]

doi 10.3390/su13020717

HCM: Hardware-Aware Complexity Metric for Neural Network Architectures

Authors: Alex Karbachevsky, Chaim Baskin, Evgenii Zheltonozhskii, Yevgeny Yermolin, Freddy Gabbay, Alex M. Bronstein, Avi Mendelson

Abstract: Convolutional Neural Networks (CNNs) have become common in many fields including computer vision, speech recognition, and natural language processing. Although CNN hardware accelerators are already included as part of many SoC architectures, the task of achieving high accuracy on resource-restricted devices is still considered challenging, mainly due to the vast number of design parameters that ne… ▽ More Convolutional Neural Networks (CNNs) have become common in many fields including computer vision, speech recognition, and natural language processing. Although CNN hardware accelerators are already included as part of many SoC architectures, the task of achieving high accuracy on resource-restricted devices is still considered challenging, mainly due to the vast number of design parameters that need to be balanced to achieve an efficient solution. Quantization techniques, when applied to the network parameters, lead to a reduction of power and area and may also change the ratio between communication and computation. As a result, some algorithmic solutions may suffer from lack of memory bandwidth or computational resources and fail to achieve the expected performance due to hardware constraints. Thus, the system designer and the micro-architect need to understand at early development stages the impact of their high-level decisions (e.g., the architecture of the CNN and the amount of bits used to represent its parameters) on the final product (e.g., the expected power saving, area, and accuracy). Unfortunately, existing tools fall short of supporting such decisions. This paper introduces a hardware-aware complexity metric that aims to assist the system designer of the neural network architectures, through the entire project lifetime (especially at its early stages) by predicting the impact of architectural and micro-architectural decisions on the final product. We demonstrate how the proposed metric can help evaluate different design alternatives of neural network models on resource-restricted devices such as real-time embedded systems, and to avoid making design mistakes at early stages. △ Less

Submitted 26 April, 2020; v1 submitted 19 April, 2020; originally announced April 2020.

arXiv:2003.10895 [pdf, other]

Do We Need Depth in State-Of-The-Art Face Authentication?

Authors: Amir Livne, Alex Bronstein, Ron Kimmel, Ziv Aviv, Shahaf Grofit

Abstract: Some face recognition methods are designed to utilize geometric information extracted from depth sensors to overcome the weaknesses of single-image based recognition technologies. However, the accurate acquisition of the depth profile is an expensive and challenging process. Here, we introduce a novel method that learns to recognize faces from stereo camera systems without the need to explicitly c… ▽ More Some face recognition methods are designed to utilize geometric information extracted from depth sensors to overcome the weaknesses of single-image based recognition technologies. However, the accurate acquisition of the depth profile is an expensive and challenging process. Here, we introduce a novel method that learns to recognize faces from stereo camera systems without the need to explicitly compute the facial surface or depth map. The raw face stereo images along with the location in the image from which the face is extracted allow the proposed CNN to improve the recognition task while avoiding the need to explicitly handle the geometric structure of the face. This way, we keep the simplicity and cost efficiency of identity authentication from a single image, while enjoying the benefits of geometric data without explicitly reconstructing it. We demonstrate that the suggested method outperforms both existing single-image and explicit depth based methods on large-scale benchmarks, and even capable of recognize spoofing attacks. We also provide an ablation study that shows that the suggested method uses the face locations in the left and right images to encode informative features that improve the overall performance. △ Less

Submitted 10 November, 2020; v1 submitted 24 March, 2020; originally announced March 2020.

Showing 1–50 of 128 results for author: Bronstein, A