-
Knowledge Editing in Language Models via Adapted Direct Preference Optimization
Authors:
Amit Rozner,
Barak Battash,
Lior Wolf,
Ofir Lindenbaum
Abstract:
Large Language Models (LLMs) can become outdated over time as they may lack updated world knowledge, leading to factual knowledge errors and gaps. Knowledge Editing (KE) aims to overcome this challenge using weight updates that do not require expensive retraining. We propose treating KE as an LLM alignment problem. Toward this goal, we introduce Knowledge Direct Preference Optimization (KDPO), a v…
▽ More
Large Language Models (LLMs) can become outdated over time as they may lack updated world knowledge, leading to factual knowledge errors and gaps. Knowledge Editing (KE) aims to overcome this challenge using weight updates that do not require expensive retraining. We propose treating KE as an LLM alignment problem. Toward this goal, we introduce Knowledge Direct Preference Optimization (KDPO), a variation of the Direct Preference Optimization (DPO) that is more effective for knowledge modifications. Our method is based on an online approach that continually updates the knowledge stored in the model. We use the current knowledge as a negative sample and the new knowledge we want to introduce as a positive sample in a process called DPO. We also use teacher-forcing for negative sample generation and optimize using the positive sample, which helps maintain localized changes. We tested our KE method on various datasets and models, comparing it to several cutting-edge methods, with 100 and 500 sequential edits. Additionally, we conducted an ablation study comparing our method to the standard DPO approach. Our experimental results show that our modified DPO method allows for more refined KE, achieving similar or better performance compared to previous methods.
△ Less
Submitted 24 September, 2024; v1 submitted 14 June, 2024;
originally announced June 2024.
-
Obtaining Favorable Layouts for Multiple Object Generation
Authors:
Barak Battash,
Amit Rozner,
Lior Wolf,
Ofir Lindenbaum
Abstract:
Large-scale text-to-image models that can generate high-quality and diverse images based on textual prompts have shown remarkable success. These models aim ultimately to create complex scenes, and addressing the challenge of multi-subject generation is a critical step towards this goal. However, the existing state-of-the-art diffusion models face difficulty when generating images that involve mult…
▽ More
Large-scale text-to-image models that can generate high-quality and diverse images based on textual prompts have shown remarkable success. These models aim ultimately to create complex scenes, and addressing the challenge of multi-subject generation is a critical step towards this goal. However, the existing state-of-the-art diffusion models face difficulty when generating images that involve multiple subjects. When presented with a prompt containing more than one subject, these models may omit some subjects or merge them together. To address this challenge, we propose a novel approach based on a guiding principle. We allow the diffusion model to initially propose a layout, and then we rearrange the layout grid. This is achieved by enforcing cross-attention maps (XAMs) to adhere to proposed masks and by migrating pixels from latent maps to new locations determined by us. We introduce new loss terms aimed at reducing XAM entropy for clearer spatial definition of subjects, reduce the overlap between XAMs, and ensure that XAMs align with their respective masks. We contrast our approach with several alternative methods and show that it more faithfully captures the desired concepts across a variety of text prompts.
△ Less
Submitted 1 May, 2024;
originally announced May 2024.
-
Efficient Verification-Based Face Identification
Authors:
Amit Rozner,
Barak Battash,
Ofir Lindenbaum,
Lior Wolf
Abstract:
We study the problem of performing face verification with an efficient neural model $f$. The efficiency of $f$ stems from simplifying the face verification problem from an embedding nearest neighbor search into a binary problem; each user has its own neural network $f$. To allow information sharing between different individuals in the training set, we do not train $f$ directly but instead generate…
▽ More
We study the problem of performing face verification with an efficient neural model $f$. The efficiency of $f$ stems from simplifying the face verification problem from an embedding nearest neighbor search into a binary problem; each user has its own neural network $f$. To allow information sharing between different individuals in the training set, we do not train $f$ directly but instead generate the model weights using a hypernetwork $h$. This leads to the generation of a compact personalized model for face identification that can be deployed on edge devices. Key to the method's success is a novel way of generating hard negatives and carefully scheduling the training objectives. Our model leads to a substantially small $f$ requiring only 23k parameters and 5M floating point operations (FLOPS). We use six face verification datasets to demonstrate that our method is on par or better than state-of-the-art models, with a significantly reduced number of parameters and computational burden. Furthermore, we perform an extensive ablation study to demonstrate the importance of each element in our method.
△ Less
Submitted 25 May, 2024; v1 submitted 20 December, 2023;
originally announced December 2023.
-
Anomaly Detection with Variance Stabilized Density Estimation
Authors:
Amit Rozner,
Barak Battash,
Henry Li,
Lior Wolf,
Ofir Lindenbaum
Abstract:
We propose a modified density estimation problem that is highly effective for detecting anomalies in tabular data. Our approach assumes that the density function is relatively stable (with lower variance) around normal samples. We have verified this hypothesis empirically using a wide range of real-world data. Then, we present a variance-stabilized density estimation problem for maximizing the lik…
▽ More
We propose a modified density estimation problem that is highly effective for detecting anomalies in tabular data. Our approach assumes that the density function is relatively stable (with lower variance) around normal samples. We have verified this hypothesis empirically using a wide range of real-world data. Then, we present a variance-stabilized density estimation problem for maximizing the likelihood of the observed samples while minimizing the variance of the density around normal samples. To obtain a reliable anomaly detector, we introduce a spectral ensemble of autoregressive models for learning the variance-stabilized distribution. We have conducted an extensive benchmark with 52 datasets, demonstrating that our method leads to state-of-the-art results while alleviating the need for data-specific hyperparameter tuning. Finally, we have used an ablation study to demonstrate the importance of each of the proposed components, followed by a stability analysis evaluating the robustness of our model.
△ Less
Submitted 8 May, 2024; v1 submitted 1 June, 2023;
originally announced June 2023.
-
Domain-Generalizable Multiple-Domain Clustering
Authors:
Amit Rozner,
Barak Battash,
Lior Wolf,
Ofir Lindenbaum
Abstract:
This work generalizes the problem of unsupervised domain generalization to the case in which no labeled samples are available (completely unsupervised). We are given unlabeled samples from multiple source domains, and we aim to learn a shared predictor that assigns examples to semantically related clusters. Evaluation is done by predicting cluster assignments in previously unseen domains. Towards…
▽ More
This work generalizes the problem of unsupervised domain generalization to the case in which no labeled samples are available (completely unsupervised). We are given unlabeled samples from multiple source domains, and we aim to learn a shared predictor that assigns examples to semantically related clusters. Evaluation is done by predicting cluster assignments in previously unseen domains. Towards this goal, we propose a two-stage training framework: (1) self-supervised pre-training for extracting domain invariant semantic features. (2) multi-head cluster prediction with pseudo labels, which rely on both the feature space and cluster head prediction, further leveraging a novel prediction-based label smoothing scheme. We demonstrate empirically that our model is more accurate than baselines that require fine-tuning using samples from the target domain or some level of supervision. Our code is available at https://github.com/AmitRozner/domain-generalizable-multiple-domain-clustering.
△ Less
Submitted 31 January, 2024; v1 submitted 31 January, 2023;
originally announced January 2023.