-
Do Neural Networks Need Gradient Descent to Generalize? A Theoretical Study
Authors:
Yotam Alexander,
Yonatan Slutzky,
Yuval Ran-Milo,
Nadav Cohen
Abstract:
Conventional wisdom attributes the mysterious generalization abilities of overparameterized neural networks to gradient descent (and its variants). The recent volume hypothesis challenges this view: it posits that these generalization abilities persist even when gradient descent is replaced by Guess & Check (G&C), i.e., by drawing weight settings until one that fits the training data is found. The validity of the volume hypothesis for wide and deep neural networks remains an open question. In this paper, we theoretically investigate this question for matrix factorization (with linear and non-linear activation)--a common testbed in neural network theory. We first prove that generalization under G&C deteriorates with increasing width, establishing what is, to our knowledge, the first case where G&C is provably inferior to gradient descent. Conversely, we prove that generalization under G&C improves with increasing depth, revealing a stark contrast between wide and deep networks, which we further validate empirically. These findings suggest that even in simple settings, there may not be a simple answer to the question of whether neural networks need gradient descent to generalize well.
Submitted 4 June, 2025;
originally announced June 2025.
-
Reinforcement Learning for Hanabi
Authors:
Nina Cohen,
Kordel K. France
Abstract:
Hanabi has become a popular game for research when it comes to reinforcement learning (RL) as it is one of the few cooperative card games where you have incomplete knowledge of the entire environment, thus presenting a challenge for an RL agent. We explored different tabular and deep reinforcement learning algorithms to see which had the best performance both against an agent of the same type and also against other types of agents. We establish that certain agents played their highest scoring games against specific agents while others exhibited higher scores on average by adapting to the opposing agent's behavior. We attempted to quantify the conditions under which each algorithm provides the best advantage and identified the most interesting interactions between agents of different types. In the end, we found that temporal difference (TD) algorithms had better overall performance and balancing of play types compared to tabular agents. Specifically, tabular Expected SARSA and deep Q-Learning agents showed the best performance.
Submitted 31 May, 2025;
originally announced June 2025.
-
When Are Concepts Erased From Diffusion Models?
Authors:
Kevin Lu,
Nicky Kriplani,
Rohit Gandikota,
Minh Pham,
David Bau,
Chinmay Hegde,
Niv Cohen
Abstract:
Concept erasure, the ability to selectively prevent a model from generating specific concepts, has attracted growing interest, with various approaches emerging to address the challenge. However, it remains unclear how thoroughly these methods erase the target concept. We begin by proposing two conceptual models for the erasure mechanism in diffusion models: (i) reducing the likelihood of generating the target concept, and (ii) interfering with the model's internal guidance mechanisms. To thoroughly assess whether a concept has been truly erased from the model, we introduce a suite of independent evaluations. Our evaluation framework includes adversarial attacks, novel probing techniques, and analysis of the model's alternative generations in place of the erased concept. Our results shed light on the tension between minimizing side effects and maintaining robustness to adversarial prompts. Broadly, our work underlines the importance of comprehensive evaluation for erasure in diffusion models.
Submitted 30 May, 2025; v1 submitted 22 May, 2025;
originally announced May 2025.
-
Multi-parameter constraints on empirical infrasound period-yield relations for bolides and implications for planetary defense
Authors:
Elizabeth A. Silber,
Josep M. Trigo-Rodríguez,
Iyare Oseghae,
Eloy Peña Asensio,
Mark Boslough,
Rodney Whitaker,
Christoph Pilger,
Philip Lubin,
Vedant Sawal,
Claus Hetzer,
Randy Longenbaugh,
Peter Jenniskens,
Brin Bailey,
Esther Mas Sanz,
Patrick Hupe,
Alexander N. Cohen,
Thom R. Edwards,
Sasha Egan,
Reynold E. Silber,
Summer Czarnowski,
Miro Ronac Giannone
Abstract:
How effective are methods for estimating bolide energies from infrasound signal period-yield relationships? A single global period-energy relation can obscure significant variability introduced by parameters such as the atmospheric Doppler wind profile and the bolide's energy deposition profile as a function of altitude. Bolide speed, entry angle, burst altitude, and multi-episode fragmentation all may play a role in defining the detected period of the shockwave. By leveraging bolide light curve data from the Center for Near Earth Object Studies (CNEOS), we re-examined the period-energy relation as a function of these parameters. Through a bootstrap approach, we show that various event subsets can deviate from widely cited period-energy models and we identify which specific conditions most strongly reshape the period-energy scaling. The results define both the fidelity and reliability of period-energy relations when no additional data beyond the infrasound record is available and improve the outcome when supporting data from bolide trajectories and light curves are included. Ultimately, these findings expand the scope of earlier models, providing a nuanced and robust framework for infrasound-only yield estimation under a range of bolide scenarios.
Submitted 5 May, 2025;
originally announced May 2025.
-
Forging and Removing Latent-Noise Diffusion Watermarks Using a Single Image
Authors:
Anubhav Jain,
Yuya Kobayashi,
Naoki Murata,
Yuhta Takida,
Takashi Shibuya,
Yuki Mitsufuji,
Niv Cohen,
Nasir Memon,
Julian Togelius
Abstract:
Watermarking techniques are vital for protecting intellectual property and preventing fraudulent use of media. Most previous watermarking schemes designed for diffusion models embed a secret key in the initial noise. The resulting pattern is often considered hard to remove and forge into unrelated images. In this paper, we propose a black-box adversarial attack without presuming access to the diffusion model weights. Our attack uses only a single watermarked example and is based on a simple observation: there is a many-to-one mapping between images and initial noises. There are regions in the clean image latent space pertaining to each watermark that get mapped to the same initial noise when inverted. Based on this intuition, we propose an adversarial attack to forge the watermark by introducing perturbations to the images such that we can enter the region of watermarked images. We show that we can also apply a similar approach for watermark removal by learning perturbations to exit this region. We report results on multiple watermarking schemes (Tree-Ring, RingID, WIND, and Gaussian Shading) across two diffusion models (SDv1.4 and SDv2.0). Our results demonstrate the effectiveness of the attack and expose vulnerabilities in the watermarking methods, motivating future research on improving them.
Submitted 27 April, 2025;
originally announced April 2025.
-
Diffusion-Driven Inertial Generated Data for Smartphone Location Classification
Authors:
Noa Cohen,
Rotem Dror,
Itzik Klein
Abstract:
Despite the crucial role of inertial measurements in motion tracking and navigation systems, the time-consuming and resource-intensive nature of collecting extensive inertial data has hindered the development of robust machine learning models in this field. In recent years, diffusion models have emerged as a revolutionary class of generative models, reshaping the landscape of artificial data generation. These models surpass generative adversarial networks and other state-of-the-art approaches to complex tasks. In this work, we propose diffusion-driven specific force-generated data for smartphone location recognition. We provide a comprehensive evaluation methodology by comparing synthetic and real recorded specific force data across multiple metrics. Our results demonstrate that our diffusion-based generative model successfully captures the distinctive characteristics of specific force signals across different smartphone placement conditions. Thus, by creating diverse, realistic synthetic data, we can reduce the burden of extensive data collection while providing high-quality training data for machine learning models.
Submitted 20 April, 2025;
originally announced April 2025.
-
Transformer-Based Robust Underwater Inertial Navigation in Prolonged Doppler Velocity Log Outages
Authors:
Zeev Yampolsky,
Nadav Cohen,
Itzik Klein
Abstract:
Autonomous underwater vehicles (AUVs) have a wide variety of applications in the marine domain, including exploration, surveying, and mapping. Their navigation systems rely heavily on fusing data from inertial sensors and a Doppler velocity log (DVL), typically via nonlinear filtering. The DVL estimates the AUV's velocity vector by transmitting acoustic beams to the seabed and analyzing the Doppler shift from the reflected signals. However, due to environmental challenges, DVL beams can deflect or fail in real-world settings, causing signal outages. In such cases, the AUV relies solely on inertial data, leading to accumulated navigation errors and mission terminations. To cope with these outages, we adopted ST-BeamsNet, a deep learning approach that uses inertial readings and prior DVL data to estimate AUV velocity during isolated outages. In this work, we extend ST-BeamsNet to address prolonged DVL outages and evaluate its impact within an extended Kalman filter framework. Experiments demonstrate that the proposed framework improves velocity RMSE by up to 63% and reduces final position error by up to 95% compared to pure inertial navigation, in scenarios involving up to 50 seconds of complete DVL outage.
Submitted 10 April, 2025;
originally announced April 2025.
-
Enhancing Underwater Navigation through Cross-Correlation-Aware Deep INS/DVL Fusion
Authors:
Nadav Cohen,
Itzik Klein
Abstract:
The accurate navigation of autonomous underwater vehicles critically depends on the precision of Doppler velocity log (DVL) velocity measurements. Recent advancements in deep learning have demonstrated significant potential in improving DVL outputs by leveraging spatiotemporal dependencies across multiple sensor modalities. However, integrating these estimates into model-based filters, such as the extended Kalman filter, introduces statistical inconsistencies, most notably, cross-correlations between process and measurement noise. This paper addresses this challenge by proposing a cross-correlation-aware deep INS/DVL fusion framework. Building upon BeamsNet, a convolutional neural network designed to estimate AUV velocity using DVL and inertial data, we integrate its output into a navigation filter that explicitly accounts for the cross-correlation induced between the noise sources. This approach improves filter consistency and better reflects the underlying sensor error structure. Evaluated on two real-world underwater trajectories, the proposed method outperforms both least squares and cross-correlation-neglecting approaches in terms of state uncertainty. Notably, improvements exceed 10% in velocity and misalignment angle confidence metrics. Beyond demonstrating empirical performance, this framework provides a theoretically principled mechanism for embedding deep learning outputs within stochastic filters.
Submitted 27 March, 2025;
originally announced March 2025.
-
SEAL: Semantic Aware Image Watermarking
Authors:
Kasra Arabi,
R. Teal Witter,
Chinmay Hegde,
Niv Cohen
Abstract:
Generative models have rapidly evolved to generate realistic outputs. However, their synthetic outputs increasingly challenge the clear distinction between natural and AI-generated content, necessitating robust watermarking techniques. Watermarks are typically expected to preserve the integrity of the target image, withstand removal attempts, and prevent unauthorized replication onto unrelated images. To address this need, recent methods embed persistent watermarks into images produced by diffusion models using the initial noise. Yet, to do so, they either distort the distribution of generated images or rely on searching through a long dictionary of used keys for detection.
In this paper, we propose a novel watermarking method that embeds semantic information about the generated image directly into the watermark, enabling a distortion-free watermark that can be verified without requiring a database of key patterns. Instead, the key pattern can be inferred from the semantic embedding of the image using locality-sensitive hashing. Furthermore, conditioning the watermark detection on the original image content improves robustness against forgery attacks. To demonstrate that, we consider two largely overlooked attack strategies: (i) an attacker extracting the initial noise and generating a novel image with the same pattern; (ii) an attacker inserting an unrelated (potentially harmful) object into a watermarked image, possibly while preserving the watermark. We empirically validate our method's increased robustness to these attacks. Taken together, our results suggest that content-aware watermarks can mitigate risks arising from image-generative models.
Submitted 9 April, 2025; v1 submitted 15 March, 2025;
originally announced March 2025.
-
Performance Analysis of Spatial and Temporal Learning Networks in the Presence of DVL Noise
Authors:
Rajini Makam,
Nadav Cohen,
Sumukh Shadakshari,
Srinivasa Puranika Bhatta,
Itzik Klein,
Suresh Sundaram
Abstract:
Navigation is a critical aspect of autonomous underwater vehicles (AUVs) operating in complex underwater environments. Since global navigation satellite system (GNSS) signals are unavailable underwater, navigation relies on inertial sensing, which tends to accumulate errors over time. To mitigate this, the Doppler velocity log (DVL) plays a crucial role in determining navigation accuracy. In this paper, we compare two neural network models: an adapted version of BeamsNet, based on a one-dimensional convolutional neural network, and a Spectrally Normalized Memory Neural Network (SNMNN). The former focuses on extracting spatial features, while the latter leverages memory and temporal features to provide more accurate velocity estimates while handling biased and noisy DVL data. The proposed approaches were trained and tested on real AUV data collected in the Mediterranean Sea. Both models are evaluated in terms of accuracy and estimation certainty and are benchmarked against the least squares (LS) method, the current model-based approach. The results show that the neural network models achieve over a 50% improvement in RMSE for the estimation of the AUV velocity, with a smaller standard deviation.
Submitted 7 March, 2025;
originally announced March 2025.
-
SolidMark: Evaluating Image Memorization in Generative Models
Authors:
Nicky Kriplani,
Minh Pham,
Gowthami Somepalli,
Chinmay Hegde,
Niv Cohen
Abstract:
Recent works have shown that diffusion models are able to memorize training images and emit them at generation time. However, the metrics used to evaluate memorization and its mitigation techniques suffer from dataset-dependent biases and struggle to detect whether a given specific image has been memorized or not.
This paper begins with a comprehensive exploration of issues surrounding memorization metrics in diffusion models. Then, to mitigate these issues, we introduce SolidMark, a novel evaluation method that provides a per-image memorization score. We then re-evaluate existing memorization mitigation techniques. We also show that SolidMark is capable of evaluating fine-grained pixel-level memorization. Finally, we release a variety of models based on SolidMark to facilitate further research for understanding memorization phenomena in generative models. All of our code is available at https://github.com/NickyDCFP/SolidMark.
Submitted 1 March, 2025;
originally announced March 2025.
-
Gaussian Process Regression for Improved Underwater Navigation
Authors:
Nadav Cohen,
Itzik Klein
Abstract:
Accurate underwater navigation is a challenging task due to the absence of global navigation satellite system signals and the reliance on inertial navigation systems that suffer from drift over time. Doppler velocity logs (DVLs) are typically used to mitigate this drift through velocity measurements, which are commonly estimated using a parameter estimation approach such as least squares (LS). However, LS works under the assumption of ideal conditions and does not account for sensor biases, leading to suboptimal performance. This paper proposes a data-driven alternative based on multi-output Gaussian process regression (MOGPR) to improve DVL velocity estimation. MOGPR provides velocity estimates and associated measurement covariances, enabling an adaptive integration within an error-state Extended Kalman Filter (EKF). We evaluate our proposed approach using real-world AUV data and compare it against LS and a state-of-the-art deep learning model, BeamsNet. Results demonstrate that MOGPR reduces velocity estimation errors by approximately 20% while simultaneously enhancing overall navigation accuracy, particularly in the orientation states. Additionally, the incorporation of uncertainty estimates from MOGPR enables an adaptive EKF framework, improving navigation robustness in dynamic underwater environments.
Submitted 23 February, 2025;
originally announced February 2025.
-
TexLiDAR: Automated Text Understanding for Panoramic LiDAR Data
Authors:
Naor Cohen,
Roy Orfaig,
Ben-Zion Bobrovsky
Abstract:
Efforts to connect LiDAR data with text, such as LidarCLIP, have primarily focused on embedding 3D point clouds into CLIP text-image space. However, these approaches rely on 3D point clouds, which present challenges in encoding efficiency and neural network processing. With the advent of advanced LiDAR sensors like Ouster OS1, which, in addition to 3D point clouds, produce fixed resolution depth, signal, and ambient panoramic 2D images, new opportunities emerge for LiDAR based tasks. In this work, we propose an alternative approach to connect LiDAR data with text by leveraging 2D imagery generated by the OS1 sensor instead of 3D point clouds. Using the Florence 2 large model in a zero-shot setting, we perform image captioning and object detection. Our experiments demonstrate that Florence 2 generates more informative captions and achieves superior performance in object detection tasks compared to existing methods like CLIP. By combining advanced LiDAR sensor data with a large pre-trained model, our approach provides a robust and accurate solution for challenging detection scenarios, including real-time applications requiring high accuracy and robustness.
Submitted 21 February, 2025; v1 submitted 5 February, 2025;
originally announced February 2025.
-
Humanity's Last Exam
Authors:
Long Phan,
Alice Gatti,
Ziwen Han,
Nathaniel Li,
Josephina Hu,
Hugh Zhang,
Chen Bo Calvin Zhang,
Mohamed Shaaban,
John Ling,
Sean Shi,
Michael Choi,
Anish Agrawal,
Arnav Chopra,
Adam Khoja,
Ryan Kim,
Richard Ren,
Jason Hausenloy,
Oliver Zhang,
Mantas Mazeika,
Dmitry Dodonov,
Tung Nguyen,
Jaeho Lee,
Daron Anderson,
Mikhail Doroshenko,
Alun Cennyth Stokes
, et al. (1084 additional authors not shown)
Abstract:
Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 2,500 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.
Submitted 19 April, 2025; v1 submitted 24 January, 2025;
originally announced January 2025.
-
Conditional Balance: Improving Multi-Conditioning Trade-Offs in Image Generation
Authors:
Nadav Z. Cohen,
Oron Nir,
Ariel Shamir
Abstract:
Balancing content fidelity and artistic style is a pivotal challenge in image generation. While traditional style transfer methods and modern Denoising Diffusion Probabilistic Models (DDPMs) strive to achieve this balance, they often struggle to do so without sacrificing either style, content, or sometimes both. This work addresses this challenge by analyzing the ability of DDPMs to maintain content and style equilibrium. We introduce a novel method to identify sensitivities within the DDPM attention layers, identifying specific layers that correspond to different stylistic aspects. By directing conditional inputs only to these sensitive layers, our approach enables fine-grained control over style and content, significantly reducing issues arising from over-constrained inputs. Our findings demonstrate that this method enhances recent stylization techniques by better aligning style and content, ultimately improving the quality of generated visual content.
Submitted 25 December, 2024;
originally announced December 2024.
-
ChaI-TeA: A Benchmark for Evaluating Autocompletion of Interactions with LLM-based Chatbots
Authors:
Shani Goren,
Oren Kalinsky,
Tomer Stav,
Yuri Rapoport,
Yaron Fairstein,
Ram Yazdi,
Nachshon Cohen,
Alexander Libov,
Guy Kushilevitz
Abstract:
The rise of LLMs has deflected a growing portion of human-computer interactions towards LLM-based chatbots. The remarkable abilities of these models allow users to interact using long, diverse natural language text covering a wide range of topics and styles. Phrasing these messages is a time- and effort-consuming task, calling for an autocomplete solution to assist users. We introduce the task of chatbot interaction autocomplete. We present ChaI-TeA: CHat InTEraction Autocomplete, an autocomplete evaluation framework for LLM-based chatbot interactions. The framework includes a formal definition of the task, coupled with suitable datasets and metrics. We use the framework to evaluate 9 models on the defined autocompletion task, finding that while current off-the-shelf models perform fairly, there is still much room for improvement, mainly in ranking of the generated suggestions. We provide insights for practitioners working on this task and open new research directions for researchers in the field. We release our framework to serve as a foundation for future research.
Submitted 5 March, 2025; v1 submitted 24 December, 2024;
originally announced December 2024.
-
Hidden in the Noise: Two-Stage Robust Watermarking for Images
Authors:
Kasra Arabi,
Benjamin Feuer,
R. Teal Witter,
Chinmay Hegde,
Niv Cohen
Abstract:
As the quality of image generators continues to improve, deepfakes become a topic of considerable societal debate. Image watermarking allows responsible model owners to detect and label their AI-generated content, which can mitigate the harm. Yet, current state-of-the-art methods in image watermarking remain vulnerable to forgery and removal attacks. This vulnerability occurs in part because watermarks distort the distribution of generated images, unintentionally revealing information about the watermarking techniques.
In this work, we first demonstrate a distortion-free watermarking method for images, based on a diffusion model's initial noise. However, detecting the watermark requires comparing the initial noise reconstructed for an image to all previously used initial noises. To mitigate these issues, we propose a two-stage watermarking framework for efficient detection. During generation, we augment the initial noise with generated Fourier patterns to embed information about the group of initial noises we used. For detection, we (i) retrieve the relevant group of noises, and (ii) search within the given group for an initial noise that might match our image. This watermarking approach achieves state-of-the-art robustness to forgery and removal against a large battery of attacks.
Submitted 27 April, 2025; v1 submitted 5 December, 2024;
originally announced December 2024.
-
Snake-Inspired Mobile Robot Positioning with Hybrid Learning
Authors:
Aviad Etzion,
Nadav Cohen,
Orzion Levy,
Zeev Yampolsky,
Itzik Klein
Abstract:
Mobile robots are used in various fields, from deliveries to search and rescue applications. Different types of sensors are mounted on the robot to provide accurate navigation and, thus, allow successful completion of its task. In real-world scenarios, due to environmental constraints, the robot frequently relies only on its inertial sensors. Therefore, due to noises and other error terms associated with the inertial readings, the navigation solution drifts in time. To mitigate the inertial solution drift, we propose the MoRPINet framework consisting of a neural network to regress the robot's travelled distance. To this end, we require the mobile robot to maneuver in a snake-like slithering motion to encourage nonlinear behavior. MoRPINet was evaluated using a dataset of 290 minutes of inertial recordings during field experiments and showed an improvement of 33% in the positioning error over other state-of-the-art methods for pure inertial navigation.
Submitted 1 December, 2024; v1 submitted 26 November, 2024;
originally announced November 2024.
-
Understanding Transfer Learning via Mean-field Analysis
Authors:
Gholamali Aminian,
Łukasz Szpruch,
Samuel N. Cohen
Abstract:
We propose a novel framework for exploring generalization errors of transfer learning through the lens of differential calculus on the space of probability measures. In particular, we consider two main transfer learning scenarios, $α$-ERM and fine-tuning with the KL-regularized empirical risk minimization and establish generic conditions under which the generalization error and the population risk convergence rates for these scenarios are studied. Based on our theoretical results, we show the benefits of transfer learning with a one-hidden-layer neural network in the mean-field regime under some suitable integrability and regularity assumptions on the loss and activation functions.
Submitted 23 October, 2024; v1 submitted 22 October, 2024;
originally announced October 2024.
-
Data-driven Coreference-based Ontology Building
Authors:
Shir Ashury-Tahan,
Amir David Nissan Cohen,
Nadav Cohen,
Yoram Louzoun,
Yoav Goldberg
Abstract:
While coreference resolution is traditionally used as a component in individual document understanding, in this work we take a more global view and explore what we can learn about a domain from the set of all document-level coreference relations that are present in a large corpus. We derive coreference chains from a corpus of 30 million biomedical abstracts and construct a graph based on the string phrases within these chains, establishing connections between phrases if they co-occur within the same coreference chain. We then use the graph structure and the betweenness centrality measure to distinguish between edges denoting hierarchy, identity and noise, assign directionality to edges denoting hierarchy, and split nodes (strings) that correspond to multiple distinct concepts. The result is a rich, data-driven ontology over concepts in the biomedical domain, parts of which overlap significantly with human-authored ontologies. We release the coreference chains and resulting ontology under a creative-commons license, along with the code.
Submitted 22 October, 2024;
originally announced October 2024.
-
Provable Benefits of Complex Parameterizations for Structured State Space Models
Authors:
Yuval Ran-Milo,
Eden Lumbroso,
Edo Cohen-Karlik,
Raja Giryes,
Amir Globerson,
Nadav Cohen
Abstract:
Structured state space models (SSMs), the core engine behind prominent neural networks such as S4 and Mamba, are linear dynamical systems adhering to a specified structure, most notably diagonal. In contrast to typical neural network modules, whose parameterizations are real, SSMs often use complex parameterizations. Theoretically explaining the benefits of complex parameterizations for SSMs is an open problem. The current paper takes a step towards its resolution, by establishing formal gaps between real and complex diagonal SSMs. Firstly, we prove that while a moderate dimension suffices in order for a complex SSM to express all mappings of a real SSM, a much higher dimension is needed for a real SSM to express mappings of a complex SSM. Secondly, we prove that even if the dimension of a real SSM is high enough to express a given mapping, typically, doing so requires the parameters of the real SSM to hold exponentially large values, which cannot be learned in practice. In contrast, a complex SSM can express any given mapping with moderate parameter values. Experiments corroborate our theory, and suggest a potential extension of the theory that accounts for selectivity, a new architectural feature yielding state of the art performance.
Submitted 31 October, 2024; v1 submitted 17 October, 2024;
originally announced October 2024.
-
The Implicit Bias of Structured State Space Models Can Be Poisoned With Clean Labels
Authors:
Yonatan Slutzky,
Yotam Alexander,
Noam Razin,
Nadav Cohen
Abstract:
Neural networks are powered by an implicit bias: a tendency of gradient descent to fit training data in a way that generalizes to unseen data. A recent class of neural network models gaining increasing popularity is structured state space models (SSMs), regarded as an efficient alternative to transformers. Prior work argued that the implicit bias of SSMs leads to generalization in a setting where data is generated by a low dimensional teacher. In this paper, we revisit the latter setting, and formally establish a phenomenon entirely undetected by prior work on the implicit bias of SSMs. Namely, we prove that while implicit bias leads to generalization under many choices of training data, there exist special examples whose inclusion in training completely distorts the implicit bias, to a point where generalization fails. This failure occurs despite the special training examples being labeled by the teacher, i.e. having clean labels! We empirically demonstrate the phenomenon, with SSMs trained independently and as part of non-linear neural networks. In the area of adversarial machine learning, disrupting generalization with cleanly labeled training examples is known as clean-label poisoning. Given the proliferation of SSMs, particularly in large language models, we believe significant efforts should be invested in further delineating their susceptibility to clean-label poisoning, and in developing methods for overcoming this susceptibility.
Submitted 6 February, 2025; v1 submitted 14 October, 2024;
originally announced October 2024.
-
SELECT: A Large-Scale Benchmark of Data Curation Strategies for Image Classification
Authors:
Benjamin Feuer,
Jiawei Xu,
Niv Cohen,
Patrick Yubeaton,
Govind Mittal,
Chinmay Hegde
Abstract:
Data curation is the problem of how to collect and organize samples into a dataset that supports efficient learning. Despite the centrality of the task, little work has been devoted towards a large-scale, systematic comparison of various curation methods. In this work, we take steps towards a formal evaluation of data curation strategies and introduce SELECT, the first large-scale benchmark of curation strategies for image classification.
In order to generate baseline methods for the SELECT benchmark, we create a new dataset, ImageNet++, which constitutes the largest superset of ImageNet-1K to date. Our dataset extends ImageNet with 5 new training-data shifts, each approximately the size of ImageNet-1K itself, and each assembled using a distinct curation strategy. We evaluate our data curation baselines in two ways: (i) using each training-data shift to train identical image classification models from scratch, and (ii) using the data itself to fit a pretrained self-supervised representation.
Our findings show interesting trends, particularly pertaining to recent methods for data curation such as synthetic data generation and lookup based on CLIP embeddings. We show that although these strategies are highly competitive for certain tasks, the curation strategy used to assemble the original ImageNet-1K dataset remains the gold standard. We anticipate that our benchmark can illuminate the path for new methods to further reduce the gap. We release our checkpoints, code, documentation, and a link to our dataset at https://github.com/jimmyxu123/SELECT.
Submitted 7 October, 2024;
originally announced October 2024.
-
Generalization and Robustness of the Tilted Empirical Risk
Authors:
Gholamali Aminian,
Amir R. Asadi,
Tian Li,
Ahmad Beirami,
Gesine Reinert,
Samuel N. Cohen
Abstract:
The generalization error (risk) of a supervised statistical learning algorithm quantifies its prediction ability on previously unseen data. Inspired by exponential tilting, Li et al. (2020) proposed the tilted empirical risk (TER) as a non-linear risk metric for machine learning applications such as classification and regression problems. In this work, we examine the generalization error of the tilted empirical risk in the robustness regime under negative tilt. Our first contribution is to provide uniform and information-theoretic bounds on the tilted generalization error, defined as the difference between the population risk and the tilted empirical risk, under negative tilt for unbounded loss functions with a bounded $(1+ε)$-th moment for some $ε\in(0,1]$, with a convergence rate of $O(n^{-ε/(1+ε)})$, where $n$ is the number of training samples, revealing a novel application for TER under no distribution shift. Secondly, we study the robustness of the tilted empirical risk with respect to noisy outliers at training time and provide theoretical guarantees under distribution shift for the tilted empirical risk. We empirically corroborate our findings in simple experimental setups where we evaluate our bounds to select the value of tilt in a data-driven manner.
Submitted 7 June, 2025; v1 submitted 28 September, 2024;
originally announced September 2024.
-
Quality Matters: Evaluating Synthetic Data for Tool-Using LLMs
Authors:
Shadi Iskander,
Nachshon Cohen,
Zohar Karnin,
Ori Shapira,
Sofia Tolmach
Abstract:
Training large language models (LLMs) for external tool usage is a rapidly expanding field, with recent research focusing on generating synthetic data to address the shortage of available data. However, the absence of systematic data quality checks poses complications for properly training and testing models. To that end, we propose two approaches for assessing the reliability of data for training LLMs to use external tools. The first approach uses intuitive, human-defined correctness criteria. The second approach uses a model-driven assessment with in-context evaluation. We conduct a thorough evaluation of data quality on two popular benchmarks, followed by an extrinsic evaluation that showcases the impact of data quality on model performance. Our results demonstrate that models trained on high-quality data outperform those trained on unvalidated data, even when trained with a smaller quantity of data. These findings empirically support the significance of assessing and ensuring the reliability of training data for tool-using LLMs.
Submitted 26 September, 2024; v1 submitted 24 September, 2024;
originally announced September 2024.
-
Lecture Notes on Linear Neural Networks: A Tale of Optimization and Generalization in Deep Learning
Authors:
Nadav Cohen,
Noam Razin
Abstract:
These notes are based on a lecture delivered by NC in March 2021, as part of an advanced course at Princeton University on the mathematical understanding of deep learning. They present a theory (developed by NC, NR and collaborators) of linear neural networks -- a fundamental model in the study of optimization and generalization in deep learning. Practical applications born from the presented theory are also discussed. The theory is based on mathematical tools that are dynamical in nature. It showcases the potential of such tools to push the envelope of our understanding of optimization and generalization in deep learning. The text assumes familiarity with the basics of statistical learning theory. Exercises (without solutions) are included.
Submitted 6 November, 2024; v1 submitted 25 August, 2024;
originally announced August 2024.
-
Deep Learning Assisted Inertial Dead Reckoning and Fusion
Authors:
Dror Hurwitz,
Nadav Cohen,
Itzik Klein
Abstract:
The interest in mobile platforms across a variety of applications has increased significantly in recent years. One of the reasons is the ability to achieve accurate navigation by using low-cost sensors. To this end, inertial sensors are fused with global navigation satellite systems (GNSS) signals. GNSS outages during platform operation can result in pure inertial navigation, causing the navigation solution to drift. In such situations, periodic trajectories with dedicated algorithms were suggested to mitigate the drift. With periodic dynamics, inertial deep learning approaches can capture the motion more accurately and provide accurate dead-reckoning for drones and mobile robots. In this paper, we propose approaches to extend deep learning-assisted inertial sensing and fusion capabilities during periodic motion. We begin by demonstrating that fusion between GNSS and inertial sensors in periodic trajectories achieves better accuracy compared to straight-line trajectories. Next, we propose an empowered network architecture to accurately regress the change in distance of the platform. Utilizing this network, we derive a hybrid approach for a neural-inertial fusion filter. Finally, we utilize this approach for situations when GNSS is available and show its benefits. A dataset of 337 minutes of data collected from inertial sensors mounted on a mobile robot and a quadrotor is used to evaluate our approaches.
Submitted 23 July, 2024;
originally announced July 2024.
-
Evaluating D-MERIT of Partial-annotation on Information Retrieval
Authors:
Royi Rassin,
Yaron Fairstein,
Oren Kalinsky,
Guy Kushilevitz,
Nachshon Cohen,
Alexander Libov,
Yoav Goldberg
Abstract:
Retrieval models are often evaluated on partially-annotated datasets. Each query is mapped to a few relevant texts and the remaining corpus is assumed to be irrelevant. As a result, models that successfully retrieve false negatives are punished in evaluation. Unfortunately, completely annotating all texts for every query is not resource efficient. In this work, we show that using partially-annotated datasets in evaluation can paint a distorted picture. We curate D-MERIT, a passage retrieval evaluation set from Wikipedia, aspiring to contain all relevant passages for each query. Queries describe a group (e.g., "journals about linguistics") and relevant passages are evidence that entities belong to the group (e.g., a passage indicating that "Language" is a journal about linguistics). We show that evaluating on a dataset containing annotations for only a subset of the relevant passages might result in misleading ranking of the retrieval systems and that as more relevant texts are included in the evaluation set, the rankings converge. We propose our dataset as a resource for evaluation and our study as a recommendation for balance between resource-efficiency and reliable evaluation when annotating evaluation sets for text retrieval.
Submitted 13 October, 2024; v1 submitted 23 June, 2024;
originally announced June 2024.
-
DeciMamba: Exploring the Length Extrapolation Potential of Mamba
Authors:
Assaf Ben-Kish,
Itamar Zimerman,
Shady Abu-Hussein,
Nadav Cohen,
Amir Globerson,
Lior Wolf,
Raja Giryes
Abstract:
Long-range sequence processing poses a significant challenge for Transformers due to their quadratic complexity in input length. A promising alternative is Mamba, which demonstrates high performance and achieves Transformer-level capabilities while requiring substantially fewer computational resources. In this paper we explore the length-generalization capabilities of Mamba, which we find to be relatively limited. Through a series of visualizations and analyses we identify that the limitations arise from a restricted effective receptive field, dictated by the sequence length used during training. To address this constraint, we introduce DeciMamba, a context-extension method specifically designed for Mamba. This mechanism, built on top of a hidden filtering mechanism embedded within the S6 layer, enables the trained model to extrapolate well even without additional training. Empirical experiments over real-world long-range NLP tasks show that DeciMamba can extrapolate to context lengths that are significantly longer than the ones seen during training, while enjoying faster inference.
Submitted 9 April, 2025; v1 submitted 20 June, 2024;
originally announced June 2024.
-
How to design a dataset compliant with an ML-based system ODD?
Authors:
Cyril Cappi,
Noémie Cohen,
Mélanie Ducoffe,
Christophe Gabreau,
Laurent Gardes,
Adrien Gauffriau,
Jean-Brice Ginestet,
Franck Mamalet,
Vincent Mussot,
Claire Pagetti,
David Vigouroux
Abstract:
This paper focuses on a Vision-based Landing task and presents the design and the validation of a dataset that would comply with the Operational Design Domain (ODD) of a Machine-Learning (ML) system. Relying on emerging certification standards, we describe the process for establishing ODDs at both the system and image levels. In the process, we present the translation of high-level system constraints into actionable image-level properties, allowing for the definition of verifiable Data Quality Requirements (DQRs). To illustrate this approach, we use the Landing Approach Runway Detection (LARD) dataset which combines synthetic imagery and real footage, and we focus on the steps required to verify the DQRs. The replicable framework presented in this paper addresses the challenges of designing a dataset compliant with the stringent needs of ML-based systems certification in safety-critical applications.
Submitted 20 June, 2024;
originally announced June 2024.
-
Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition
Authors:
Edoardo Debenedetti,
Javier Rando,
Daniel Paleka,
Silaghi Fineas Florin,
Dragos Albastroiu,
Niv Cohen,
Yuval Lemberg,
Reshmi Ghosh,
Rui Wen,
Ahmed Salem,
Giovanni Cherubin,
Santiago Zanella-Beguelin,
Robin Schmid,
Victor Klemm,
Takahiro Miki,
Chenhao Li,
Stefan Kraft,
Mario Fritz,
Florian Tramèr,
Sahar Abdelnabi,
Lea Schönherr
Abstract:
Large language model systems face important security risks from maliciously crafted messages that aim to overwrite the system's original instructions or leak private data. To study this problem, we organized a capture-the-flag competition at IEEE SaTML 2024, where the flag is a secret string in the LLM system prompt. The competition was organized in two phases. In the first phase, teams developed defenses to prevent the model from leaking the secret. During the second phase, teams were challenged to extract the secrets hidden for defenses proposed by the other teams. This report summarizes the main insights from the competition. Notably, we found that all defenses were bypassed at least once, highlighting the difficulty of designing a successful defense and the necessity for additional research to protect LLM systems. To foster future research in this direction, we compiled a dataset with over 137k multi-turn attack chats and open-sourced the platform.
Submitted 12 June, 2024;
originally announced June 2024.
-
Aegis: Tethering a Blockchain with Primary-Chain Stake
Authors:
Yogev Bar-On,
Roi Bar-Zur,
Omer Ben-Porat,
Nimrod Cohen,
Ittay Eyal,
Matan Sitbon
Abstract:
Blockchains implement decentralized monetary systems and applications. Recent advancements enable what we call tethering a blockchain to a primary blockchain, securing the tethered chain by nodes that post primary-chain tokens as collateral. The collateral ensures nodes behave as intended, until they withdraw it. Unlike a Proof of Stake blockchain which uses its own token as collateral, using primary-chain tokens shields the tethered chain from the volatility of its own token.
State-of-the-art tethered blockchains either rely on centralization, or make extreme assumptions: that all communication is synchronous, that operators remain correct even post-withdrawal, or that withdrawals can be indefinitely delayed by tethered-chain failures.
We prove that with partial synchrony, there is no solution to the problem. However, under the standard assumptions that communication with the primary chain is synchronous and communication among the tethered chain nodes is partially synchronous, there is a solution. We present a tethered-chain protocol called Aegis. Aegis uses references from its blocks to primary blocks to define committees, checkpoints on the primary chain to perpetuate decisions, and resets to establish new committees when previous ones become obsolete. It ensures safety at all times and rapid progress when latency among Aegis nodes is low.
Submitted 24 April, 2025; v1 submitted 9 June, 2024;
originally announced June 2024.
-
Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices
Authors:
Nathaniel Cohen,
Vladimir Kulikov,
Matan Kleiner,
Inbar Huberman-Spiegelglas,
Tomer Michaeli
Abstract:
Text-to-image (T2I) diffusion models achieve state-of-the-art results in image synthesis and editing. However, leveraging such pretrained models for video editing is considered a major challenge. Many existing works attempt to enforce temporal consistency in the edited video through explicit correspondence mechanisms, either in pixel space or between deep features. These methods, however, struggle with strong nonrigid motion. In this paper, we introduce a fundamentally different approach, which is based on the observation that spatiotemporal slices of natural videos exhibit similar characteristics to natural images. Thus, the same T2I diffusion model that is normally used only as a prior on video frames, can also serve as a strong prior for enhancing temporal consistency by applying it on spatiotemporal slices. Based on this observation, we present Slicedit, a method for text-based video editing that utilizes a pretrained T2I diffusion model to process both spatial and spatiotemporal slices. Our method generates videos that retain the structure and motion of the original video while adhering to the target text. Through extensive experiments, we demonstrate Slicedit's ability to edit a wide range of real-world videos, confirming its clear advantages compared to existing competing methods. Webpage: https://matankleiner.github.io/slicedit/
Submitted 20 May, 2024;
originally announced May 2024.
-
Seamless Underwater Navigation with Limited Doppler Velocity Log Measurements
Authors:
Nadav Cohen,
Itzik Klein
Abstract:
Autonomous Underwater Vehicles (AUVs) commonly utilize an inertial navigation system (INS) and a Doppler velocity log (DVL) for underwater navigation. To that end, their measurements are integrated through a nonlinear filter such as the extended Kalman filter (EKF). The DVL velocity vector estimate depends on retrieving reflections from the seabed, ensuring that at least three out of its four transmitted acoustic beams return successfully. When fewer than three beams are obtained, the DVL cannot provide a velocity update to bound the navigation solution drift. To cope with this challenge, in this paper, we propose a hybrid neural coupled (HNC) approach for seamless AUV navigation in situations of limited DVL measurements. First, we derive an approach to regress two or three missing DVL beams. Then, those beams, together with the measured beams, are incorporated into the EKF. We examined INS/DVL fusion both in loosely and tightly coupled approaches. Our method was trained and evaluated on recorded data from AUV experiments conducted in the Mediterranean Sea on two different occasions. The results illustrate that our proposed method outperforms the baseline loosely and tightly coupled model-based approaches by an average of 96.15%. It also demonstrates superior performance compared to a model-based beam estimator by an average of 12.41% in terms of velocity accuracy for scenarios involving two or three missing beams. Therefore, we demonstrate that our approach offers seamless AUV navigation in situations of limited beam measurements.
Submitted 21 April, 2024;
originally announced April 2024.
-
Identifying Shopping Intent in Product QA for Proactive Recommendations
Authors:
Besnik Fetahu,
Nachshon Cohen,
Elad Haramaty,
Liane Lewin-Eytan,
Oleg Rokhlenko,
Shervin Malmasi
Abstract:
Voice assistants have become ubiquitous in smart devices, allowing users to instantly access information via voice questions. While extensive research has been conducted in question answering for voice search, little attention has been paid to how to enable proactive recommendations from a voice assistant to its users. This is a highly challenging problem that often leads to user friction, mainly due to recommendations provided to the users at the wrong time. We focus on the domain of e-commerce, namely on identifying Shopping Product Questions (SPQs), where the user asking a product-related question may have an underlying shopping need. Identifying a user's shopping need allows voice assistants to enhance the shopping experience by determining when to provide recommendations, such as product or deal recommendations, or proactive shopping action recommendations. Identifying SPQs is a challenging problem that cannot be solved from question text alone, and thus requires inferring latent user behavior patterns from the user's past shopping history. We propose features that capture the user's latent shopping behavior from their purchase history, and combine them using a novel Mixture-of-Experts (MoE) model. Our evaluation shows that the proposed approach is able to identify SPQs with a high score of F1=0.91. Furthermore, based on an online evaluation with real voice assistant users, we identify SPQs in real-time and recommend shopping actions to users to add the queried product to their shopping list. We demonstrate that we are able to accurately identify SPQs, as indicated by the significantly higher rate of added products to users' shopping lists when being prompted after SPQs vs random PQs.
Submitted 9 April, 2024;
originally announced April 2024.
-
Robust Concept Erasure Using Task Vectors
Authors:
Minh Pham,
Kelly O. Marshall,
Chinmay Hegde,
Niv Cohen
Abstract:
With the rapid growth of text-to-image models, a variety of techniques have been suggested to prevent undesirable image generations. Yet, these methods often only protect against specific user prompts and have been shown to allow unsafe generations with other inputs. Here we focus on unconditionally erasing a concept from a text-to-image model rather than conditioning the erasure on the user's prompt. We first show that compared to input-dependent erasure methods, concept erasure that uses Task Vectors (TV) is more robust to unexpected user inputs, not seen during training. However, TV-based erasure can also affect the core performance of the edited model, particularly when the required edit strength is unknown. To this end, we propose a method called Diverse Inversion, which we use to estimate the required strength of the TV edit. Diverse Inversion finds within the model input space a large set of word embeddings, each of which induces the generation of the target concept. We find that encouraging diversity in the set makes our estimation more robust to unexpected prompts. Finally, we show that Diverse Inversion enables us to apply a TV edit only to a subset of the model weights, enhancing the erasure capabilities while better maintaining the core functionality of the model.
Submitted 19 February, 2025; v1 submitted 4 April, 2024;
originally announced April 2024.
-
Verification for Object Detection -- IBP IoU
Authors:
Noémie Cohen,
Mélanie Ducoffe,
Ryma Boumazouza,
Christophe Gabreau,
Claire Pagetti,
Xavier Pucel,
Audrey Galametz
Abstract:
We introduce a novel Interval Bound Propagation (IBP) approach for the formal verification of object detection models, specifically targeting the Intersection over Union (IoU) metric. The approach has been implemented as open-source code, named IBP IoU, compatible with popular abstract-interpretation-based verification tools. The resulting verifier is evaluated on landing approach runway detection and handwritten digit recognition case studies. Comparisons against a baseline (Vanilla IBP IoU) highlight the superior performance of IBP IoU in ensuring accuracy and stability, contributing to more secure and robust machine learning applications.
Submitted 30 January, 2024;
originally announced March 2024.
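The idea of bounding IoU under interval uncertainty, as described in the abstract above, can be illustrated with plain interval arithmetic. The sketch below is not the IBP IoU tool and may not match the paper's baseline either; it is a minimal example that assumes axis-aligned boxes given as (x1, y1, x2, y2), a fixed ground-truth box with positive area, and a predicted box whose coordinates each lie in a known interval. The function name `iou_interval_bounds` is hypothetical.

```python
def iou_interval_bounds(pred_lo, pred_hi, gt):
    """Return (lower, upper) bounds on IoU between a fixed ground-truth box
    and a predicted box whose i-th coordinate lies in [pred_lo[i], pred_hi[i]].

    Boxes are (x1, y1, x2, y2). The bounds are sound but can be loose,
    because intersection and union are bounded independently.
    """
    gx1, gy1, gx2, gy2 = gt
    gt_area = (gx2 - gx1) * (gy2 - gy1)  # assumed > 0

    def overlap(a1, a2, b1, b2):
        # Length of the overlap of segments [a1, a2] and [b1, b2], or 0.
        return max(0.0, min(a2, b2) - max(a1, b1))

    # The intersection shrinks as the predicted box shrinks, so the smallest
    # box (largest x1/y1, smallest x2/y2) gives the minimum, and vice versa.
    inter_min = (overlap(pred_hi[0], pred_lo[2], gx1, gx2)
                 * overlap(pred_hi[1], pred_lo[3], gy1, gy2))
    inter_max = (overlap(pred_lo[0], pred_hi[2], gx1, gx2)
                 * overlap(pred_lo[1], pred_hi[3], gy1, gy2))

    # Bounds on the predicted box's own area.
    area_min = (max(0.0, pred_lo[2] - pred_hi[0])
                * max(0.0, pred_lo[3] - pred_hi[1]))
    area_max = (pred_hi[2] - pred_lo[0]) * (pred_hi[3] - pred_lo[1])

    # union = pred_area + gt_area - intersection, and union >= gt_area.
    union_min = max(gt_area, area_min + gt_area - inter_max)
    union_max = area_max + gt_area - inter_min

    return inter_min / union_max, min(1.0, inter_max / union_min)


if __name__ == "__main__":
    gt_box = (0.0, 0.0, 2.0, 2.0)
    # Predicted box (1, 1, 3, 3) with each coordinate perturbed by +/- 0.1.
    lo = (0.9, 0.9, 2.9, 2.9)
    hi = (1.1, 1.1, 3.1, 3.1)
    print(iou_interval_bounds(lo, hi, gt_box))
```

When the intervals collapse to a single box, both bounds coincide with the ordinary IoU. The gap between the two bounds for wider intervals is the kind of looseness that a dedicated IoU propagation scheme would aim to reduce.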
-
TuneTables: Context Optimization for Scalable Prior-Data Fitted Networks
Authors:
Benjamin Feuer,
Robin Tibor Schirrmeister,
Valeriia Cherepanova,
Chinmay Hegde,
Frank Hutter,
Micah Goldblum,
Niv Cohen,
Colin White
Abstract:
While tabular classification has traditionally relied on from-scratch training, a recent breakthrough called prior-data fitted networks (PFNs) challenges this approach. Similar to large language models, PFNs make use of pretraining and in-context learning to achieve strong performance on new tasks in a single forward pass. However, current PFNs have limitations that prohibit their widespread adoption. Notably, TabPFN achieves very strong performance on small tabular datasets but is not designed to make predictions for datasets of size larger than 1000. In this work, we overcome these limitations and substantially improve the performance of PFNs via context optimization. We introduce TuneTables, a parameter-efficient fine-tuning strategy for PFNs that compresses large datasets into a smaller learned context. We conduct extensive experiments on 19 algorithms over 98 datasets and find that TuneTables achieves the best performance on average, outperforming boosted trees such as CatBoost, while optimizing fewer than 5% of TabPFN's parameters. Furthermore, we show that TuneTables can be used as an interpretability tool and can even be used to mitigate biases by optimizing a fairness objective. We open-source our code and raw results at https://github.com/penfever/TuneTables.
Submitted 21 October, 2024; v1 submitted 16 February, 2024;
originally announced February 2024.
-
Implicit Bias of Policy Gradient in Linear Quadratic Control: Extrapolation to Unseen Initial States
Authors:
Noam Razin,
Yotam Alexander,
Edo Cohen-Karlik,
Raja Giryes,
Amir Globerson,
Nadav Cohen
Abstract:
In modern machine learning, models can often fit training data in numerous ways, some of which perform well on unseen (test) data, while others do not. Remarkably, in such cases gradient descent frequently exhibits an implicit bias that leads to excellent performance on unseen data. This implicit bias was extensively studied in supervised learning, but is far less understood in optimal control (reinforcement learning). There, learning a controller applied to a system via gradient descent is known as policy gradient, and a question of prime importance is the extent to which a learned controller extrapolates to unseen initial states. This paper theoretically studies the implicit bias of policy gradient in terms of extrapolation to unseen initial states. Focusing on the fundamental Linear Quadratic Regulator (LQR) problem, we establish that the extent of extrapolation depends on the degree of exploration induced by the system when commencing from initial states included in training. Experiments corroborate our theory, and demonstrate its conclusions on problems beyond LQR, where systems are non-linear and controllers are neural networks. We hypothesize that real-world optimal control may be greatly improved by developing methods for informed selection of initial states to train on.
Submitted 1 June, 2024; v1 submitted 12 February, 2024;
originally announced February 2024.
-
Generalization Error of Graph Neural Networks in the Mean-field Regime
Authors:
Gholamali Aminian,
Yixuan He,
Gesine Reinert,
Łukasz Szpruch,
Samuel N. Cohen
Abstract:
This work provides a theoretical framework for assessing the generalization error of graph neural networks in the over-parameterized regime, where the number of parameters surpasses the quantity of data points. We explore two widely utilized types of graph neural networks: graph convolutional neural networks and message passing graph neural networks. Prior to this study, existing bounds on the generalization error in the over-parameterized regime were uninformative, limiting our understanding of over-parameterized network performance. Our novel approach involves deriving upper bounds within the mean-field regime for evaluating the generalization error of these graph neural networks. We establish upper bounds with a convergence rate of $O(1/n)$, where $n$ is the number of graph samples. These upper bounds offer a theoretical assurance of the networks' performance on unseen data in the challenging over-parameterized regime and overall contribute to our understanding of their performance.
Submitted 1 July, 2024; v1 submitted 10 February, 2024;
originally announced February 2024.
-
Classifying Nodes in Graphs without GNNs
Authors:
Daniel Winter,
Niv Cohen,
Yedid Hoshen
Abstract:
Graph neural networks (GNNs) are the dominant paradigm for classifying nodes in a graph, but they have several undesirable attributes stemming from their message passing architecture. Recently, distillation methods succeeded in eliminating the use of GNNs at test time but they still require them during training. We perform a careful analysis of the role that GNNs play in distillation methods. This analysis leads us to propose a fully GNN-free approach for node classification, not requiring them at train or test time. Our method consists of three key components: smoothness constraints, pseudo-labeling iterations and neighborhood-label histograms. Our final approach can match the state-of-the-art accuracy on standard popular benchmarks such as citation and co-purchase networks, without training a GNN.
Submitted 8 February, 2024;
originally announced February 2024.
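Of the three components named in the abstract above, neighborhood-label histograms are the easiest to picture concretely. The following is a minimal sketch of what such a feature might look like, assuming an undirected edge list and a partial labeling of nodes; it is not the authors' implementation, and the function name `neighbor_label_histograms` and its arguments are hypothetical.

```python
from collections import Counter


def neighbor_label_histograms(edges, labels, num_classes):
    """For every node that appears in `edges`, return the normalized histogram
    of the known labels among its direct neighbors.

    edges:  iterable of (u, v) pairs, treated as undirected
    labels: dict mapping node -> class index, for labeled nodes only
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    histograms = {}
    for node, nbrs in neighbors.items():
        counts = Counter(labels[n] for n in nbrs if n in labels)
        total = sum(counts.values())
        # A node with no labeled neighbors gets an all-zero histogram.
        histograms[node] = [
            counts[c] / total if total else 0.0 for c in range(num_classes)
        ]
    return histograms


if __name__ == "__main__":
    toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
    toy_labels = {1: 0, 2: 0, 3: 1}  # node 0 is unlabeled
    print(neighbor_label_histograms(toy_edges, toy_labels, num_classes=2))
```

In this toy graph, node 0 has two neighbors of class 0 and one of class 1, so its histogram is [2/3, 1/3]. A feature of this kind can be fed to an ordinary classifier without any message-passing layers.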
-
Robustness Assessment of a Runway Object Classifier for Safe Aircraft Taxiing
Authors:
Yizhak Elboher,
Raya Elsaleh,
Omri Isac,
Mélanie Ducoffe,
Audrey Galametz,
Guillaume Povéda,
Ryma Boumazouza,
Noémie Cohen,
Guy Katz
Abstract:
As deep neural networks (DNNs) are becoming the prominent solution for many computational problems, the aviation industry seeks to explore their potential in alleviating pilot workload and in improving operational safety. However, the use of DNNs in this type of safety-critical application requires a thorough certification process. This need can be addressed through formal verification, which provides rigorous assurances -- e.g., by proving the absence of certain mispredictions. In this case-study paper, we demonstrate this process using an image-classifier DNN currently under development at Airbus and intended for use during the aircraft taxiing phase. We use formal methods to assess this DNN's robustness to three common image perturbation types: noise, brightness and contrast, and some of their combinations. This process entails multiple invocations of the underlying verifier, which might be computationally expensive; we therefore propose a method that leverages the monotonicity of these robustness properties, as well as the results of past verification queries, in order to reduce the overall number of verification queries required by nearly 60%. Our results provide an indication of the level of robustness achieved by the DNN classifier under study, and indicate that it is considerably more vulnerable to noise than to brightness or contrast perturbations.
Submitted 6 August, 2024; v1 submitted 8 January, 2024;
originally announced February 2024.
-
Data-Driven Strategies for Coping with Incomplete DVL Measurements
Authors:
Nadav Cohen,
Itzik Klein
Abstract:
Autonomous underwater vehicles are specialized platforms engineered for deep underwater operations. Critical to their functionality is autonomous navigation, typically relying on an inertial navigation system and a Doppler velocity log. In real-world scenarios, incomplete Doppler velocity log measurements occur, resulting in positioning errors and mission aborts. To cope with such situations, model-based and learning-based approaches were derived. This paper presents a comparative analysis of two cutting-edge deep learning methodologies, namely LiBeamsNet and MissBeamNet, alongside a model-based average estimator. These approaches are evaluated for their efficacy in regressing missing Doppler velocity log beams when two beams are unavailable. In our study, we used data recorded by a DVL mounted on an autonomous underwater vehicle operated in the Mediterranean Sea. We found that both deep learning architectures outperformed model-based approaches by over 16% in velocity prediction accuracy.
Submitted 28 January, 2024;
originally announced January 2024.
-
Adaptive Kalman-Informed Transformer
Authors:
Nadav Cohen,
Itzik Klein
Abstract:
The extended Kalman filter (EKF) is a widely adopted method for sensor fusion in navigation applications. A crucial aspect of the EKF is the online determination of the process noise covariance matrix reflecting the model uncertainty. While common EKF implementations assume a constant process noise, in real-world scenarios, the process noise varies, leading to inaccuracies in the estimated state and potentially causing the filter to diverge. Model-based adaptive EKF methods were proposed and demonstrated performance improvements to cope with such situations, highlighting the need for a robust adaptive approach. In this paper, we derive an adaptive Kalman-informed transformer (A-KIT) designed to learn the varying process noise covariance online. Built upon the foundations of the EKF, A-KIT utilizes the well-known capabilities of set transformers, including inherent noise reduction and the ability to capture nonlinear behavior in the data. This approach is suitable for any application involving the EKF. In a case study, we demonstrate the effectiveness of A-KIT in nonlinear fusion between a Doppler velocity log and inertial sensors. This is accomplished using real data recorded from sensors mounted on an autonomous underwater vehicle operating in the Mediterranean Sea. We show that A-KIT outperforms the conventional EKF by more than 49.5% and model-based adaptive EKF by an average of 35.4% in terms of position accuracy.
Submitted 7 March, 2025; v1 submitted 18 January, 2024;
originally announced January 2024.
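To see why the process noise covariance discussed in the abstract above matters, it may help to look at the simplest possible case. The sketch below is a scalar linear Kalman filter, not an EKF and not A-KIT; it assumes a random-walk state model and a direct noisy measurement, and the function name `kalman_step` is hypothetical. It only illustrates how the choice of Q changes the weight given to each new measurement.

```python
def kalman_step(x, P, z, Q, R):
    """One predict/update cycle of a scalar Kalman filter.

    Assumed model (illustrative only):
        state:        x_k = x_{k-1} + w,  w ~ N(0, Q)
        measurement:  z_k = x_k + v,      v ~ N(0, R)

    Returns the updated state, updated variance, and the Kalman gain.
    """
    # Predict: the state estimate is unchanged, its uncertainty grows by Q.
    P_pred = P + Q
    # Update: the gain balances predicted uncertainty against sensor noise.
    K = P_pred / (P_pred + R)
    x_new = x + K * (z - x)
    P_new = (1.0 - K) * P_pred
    return x_new, P_new, K


if __name__ == "__main__":
    for Q in (0.01, 1.0):
        x_new, P_new, K = kalman_step(x=0.0, P=1.0, z=1.0, Q=Q, R=1.0)
        print(f"Q={Q}: gain={K:.4f}, state={x_new:.4f}, variance={P_new:.4f}")
```

A larger Q produces a larger gain, so the filter trusts the measurement more and the model less. If Q is fixed at a value that does not match the true process noise, the gain is systematically off, which is the motivation for estimating Q online.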
-
Set Features for Anomaly Detection
Authors:
Niv Cohen,
Issar Tzachor,
Yedid Hoshen
Abstract:
This paper proposes to use set features for detecting anomalies in samples that consist of unusual combinations of normal elements. Many leading methods discover anomalies by detecting an unusual part of a sample. For example, state-of-the-art segmentation-based approaches first classify each element of the sample (e.g., image patch) as normal or anomalous and then classify the entire sample as anomalous if it contains anomalous elements. However, such approaches do not extend well to scenarios where the anomalies are expressed by an unusual combination of normal elements. In this paper, we overcome this limitation by proposing set features that model each sample by the distribution of its elements. We compute the anomaly score of each sample using a simple density estimation method, using fixed features. Our approach outperforms the previous state-of-the-art in image-level logical anomaly detection and sequence-level time series anomaly detection.
Submitted 18 March, 2025; v1 submitted 24 November, 2023;
originally announced November 2023.
-
Scaling TabPFN: Sketching and Feature Selection for Tabular Prior-Data Fitted Networks
Authors:
Benjamin Feuer,
Chinmay Hegde,
Niv Cohen
Abstract:
Tabular classification has traditionally relied on supervised algorithms, which estimate the parameters of a prediction model using its training data. Recently, Prior-Data Fitted Networks (PFNs) such as TabPFN have successfully learned to classify tabular data in-context: the model parameters are designed to classify new samples based on labelled training samples given after the model training. While such models show great promise, their applicability to real-world data remains limited due to the computational scale needed. Here we study the following question: given a pre-trained PFN for tabular data, what is the best way to summarize the labelled training samples before feeding them to the model? We conduct an initial investigation of sketching and feature-selection methods for TabPFN, and note certain key differences between it and conventionally fitted tabular models.
Submitted 17 November, 2023;
originally announced November 2023.
-
From Posterior Sampling to Meaningful Diversity in Image Restoration
Authors:
Noa Cohen,
Hila Manor,
Yuval Bahat,
Tomer Michaeli
Abstract:
Image restoration problems are typically ill-posed in the sense that each degraded image can be restored in infinitely many valid ways. To accommodate this, many works generate a diverse set of outputs by attempting to randomly sample from the posterior distribution of natural images given the degraded input. Here we argue that this strategy is commonly of limited practical value because of the heavy tail of the posterior distribution. Consider for example inpainting a missing region of the sky in an image. Since there is a high probability that the missing region contains no object but clouds, any set of samples from the posterior would be entirely dominated by (practically identical) completions of sky. However, arguably, presenting users with only one clear sky completion, along with several alternative solutions such as airships, birds, and balloons, would better outline the set of possibilities. In this paper, we initiate the study of meaningfully diverse image restoration. We explore several post-processing approaches that can be combined with any diverse image restoration method to yield semantically meaningful diversity. Moreover, we propose a practical approach for allowing diffusion-based image restoration methods to generate meaningfully diverse outputs, while incurring only negligible computational overhead. We conduct extensive user studies to analyze the proposed techniques, and find the strategy of reducing similarity between outputs to be significantly favorable over posterior sampling. Code and examples are available at https://noa-cohen.github.io/MeaningfulDiversityInIR.
Submitted 11 March, 2024; v1 submitted 24 October, 2023;
originally announced October 2023.
-
Asteroid 2023 NT1: A Cautionary Tale
Authors:
Brin K. Bailey,
Alexander N. Cohen,
Sasha Egan,
Philip Lubin,
Ruitao Xu,
Mark Boslough,
Darrel Robertson,
Elizabeth Silber,
Irina Sagert,
Oleg Korobkin,
Glenn Sjoden
Abstract:
We investigate various short-warning mitigation scenarios via fragmentation for a hypothetical impact of asteroid 2023 NT1, a Near-Earth Object (NEO) that was discovered on July 15, 2023, two days after its closest approach to Earth on July 13. The asteroid passed by Earth within ~0.25 lunar distances, with a closest approach of ~1$\times10^5$ km and velocity of 11.27 km/s. Its size remains largely uncertain, with an estimated diameter range of 26-58 m and a most probable estimate of 34 m [JPL Sentry, September 15, 2023] (weighted by the NEO size frequency distribution). If 2023 NT1 had collided with Earth, it could have caused significant local damage. Assuming a spherical asteroid with a diameter of 34 m, uniform density of 2.6 g/cm$^3$, and impact velocity of 15.59 km/s, a collision would have yielded an estimated impact energy of ~1.5 Mt, approximately three times the energy of the Chelyabinsk airburst in 2013. We analyze the effectiveness of mitigation via intentional robust disruption (IRD) for objects similar to 2023 NT1. We utilize Pulverize It (PI), a NASA Innovative Advanced Concepts (NIAC) study of planetary defense via fragmentation, to model potential mitigation scenarios through simulations of hypervelocity asteroid disruption and atmospheric ground effects in the case of a terminal defense mode. Simulations suggest that PI is an effective multi-modal approach for planetary defense that can operate in extremely short interdiction modes, in addition to long interdiction time scales with extended warning. Our simulations support the proposition that threats like 2023 NT1 can be effectively mitigated with intercepts of one day (or less) prior to impact, yielding minimal to no ground damage.
Submitted 16 January, 2025; v1 submitted 19 October, 2023;
originally announced October 2023.
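The impact-energy figure quoted in the abstract above can be checked with a short kinetic-energy calculation. The sketch below uses only the values stated there (a 34 m spherical body, density 2.6 g/cm$^3$, impact velocity 15.59 km/s) and assumes the standard convention of 4.184e15 J per megaton of TNT; the function name `impact_energy_mt` is hypothetical and the calculation ignores atmospheric effects.

```python
import math

# Conventional energy of one megaton of TNT, in joules.
MEGATON_TNT_J = 4.184e15


def impact_energy_mt(diameter_m, density_kg_m3, velocity_m_s):
    """Kinetic energy, in megatons of TNT, of a uniform spherical impactor."""
    radius = diameter_m / 2.0
    volume = (4.0 / 3.0) * math.pi * radius ** 3
    mass = density_kg_m3 * volume
    energy_j = 0.5 * mass * velocity_m_s ** 2
    return energy_j / MEGATON_TNT_J


if __name__ == "__main__":
    # 2.6 g/cm^3 = 2600 kg/m^3; 15.59 km/s = 15590 m/s.
    energy = impact_energy_mt(34.0, 2600.0, 15590.0)
    print(f"Estimated impact energy: {energy:.2f} Mt")
```

Under these assumptions the result comes out at roughly 1.55 Mt, consistent with the ~1.5 Mt stated in the abstract.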
-
Introducing DictaLM -- A Large Generative Language Model for Modern Hebrew
Authors:
Shaltiel Shmidman,
Avi Shmidman,
Amir David Nissan Cohen,
Moshe Koppel
Abstract:
We present DictaLM, a large-scale language model tailored for Modern Hebrew. Boasting 7B parameters, this model is predominantly trained on Hebrew-centric data. As a commitment to promoting research and development in the Hebrew language, we release both the foundation model and the instruct-tuned model under a Creative Commons license. Concurrently, we introduce DictaLM-Rab, another foundation model geared towards Rabbinic/Historical Hebrew. These foundation models serve as ideal starting points for fine-tuning on various Hebrew-specific tasks, such as instruction, Q&A, sentiment analysis, and more. This release represents a preliminary step, offering an initial Hebrew LLM for the Hebrew NLP community to experiment with.
Submitted 25 September, 2023;
originally announced September 2023.
-
Optimal adaptive control with separable drift uncertainty
Authors:
Samuel N. Cohen,
Christoph Knochenhauer,
Alexander Merkel
Abstract:
We consider a problem of stochastic optimal control with separable drift uncertainty in strong formulation on a finite horizon. The drift coefficient of the state $Y^{u}$ is multiplicatively influenced by an unknown random variable $λ$, while admissible controls $u$ are required to be adapted to the observation filtration. Choosing a control actively influences the state and information acquisition simultaneously and comes with a learning effect. The problem, initially non-Markovian, is embedded into a higher-dimensional Markovian, full information control problem with control-dependent filtration and noise. To that problem, we apply the stochastic Perron method to characterize the value function as the unique viscosity solution to the HJB equation, explicitly construct $\varepsilon$-optimal controls and show that the values of strong and weak formulations agree. Numerical illustrations show a significant difference between the adaptive control and the certainty equivalence control.
Submitted 10 November, 2023; v1 submitted 13 September, 2023;
originally announced September 2023.