-
Incorporating stochastic gene expression, signaling-mediated intercellular interactions, and regulated cell proliferation in models of coordinated tissue development
Authors:
Casey O. Barkan,
Tom Chou
Abstract:
Formulating quantitative and predictive models for tissue development requires consideration of the complex, stochastic gene expression dynamics, their regulation via cell-to-cell interactions, and cell proliferation. Including all of these processes in a practical mathematical framework requires complex expressions that are difficult to interpret and apply. We construct a simple theory that incorporates intracellular stochastic gene expression dynamics, signaling chemicals that influence these dynamics and mediate cell-cell interactions, and cell proliferation and its accompanying differentiation. Cellular states (genetic and epigenetic) are described by a Waddington vector field that allows for non-gradient dynamics (cycles, entropy production, loss of detailed balance), which are precluded in Waddington potential landscape representations of gene expression dynamics. We define an epigenetic fitness landscape that describes the proliferation of different cell types, and elucidate how this fitness landscape is related to Waddington's vector field. We illustrate the applicability of our framework by analyzing two model systems: an interacting two-gene differentiation process and a spatiotemporal organism model inspired by planaria.
Submitted 19 January, 2025;
originally announced January 2025.
-
BEE: Metric-Adapted Explanations via Baseline Exploration-Exploitation
Authors:
Oren Barkan,
Yehonatan Elisha,
Jonathan Weill,
Noam Koenigstein
Abstract:
Two prominent challenges in explainability research involve 1) the nuanced evaluation of explanations and 2) the modeling of missing information through baseline representations. The existing literature introduces diverse evaluation metrics, each scrutinizing the quality of explanations through distinct lenses. Additionally, various baseline representations have been proposed, each modeling the notion of missingness differently. Yet, a consensus on the ultimate evaluation metric and baseline representation remains elusive. This work acknowledges the diversity in explanation metrics and baselines, demonstrating that different metrics exhibit preferences for distinct explanation maps resulting from the utilization of different baseline representations and distributions. To address the diversity in metrics and accommodate the variety of baseline representations in a unified manner, we propose Baseline Exploration-Exploitation (BEE) - a path-integration method that introduces randomness to the integration process by modeling the baseline as a learned random tensor. This tensor follows a learned mixture of baseline distributions optimized through a contextual exploration-exploitation procedure to enhance performance on the specific metric of interest. By resampling the baseline from the learned distribution, BEE generates a comprehensive set of explanation maps, facilitating the selection of the best-performing explanation map in this broad set for the given metric. Extensive evaluations across various model architectures showcase the superior performance of BEE in comparison to state-of-the-art explanation methods on a variety of objective evaluation metrics.
Submitted 23 December, 2024;
originally announced December 2024.
-
Can an increase in productivity cause a decrease in production? Insights from a model economy with AI automation
Authors:
Casey O. Barkan
Abstract:
It is widely assumed that increases in economic productivity necessarily lead to economic growth. In this paper, it is shown that this is not always the case. An idealized model of an economy is presented in which a new technology allows capital to be utilized autonomously without labor input. This is motivated by the possibility that advances in artificial intelligence (AI) will give rise to AI agents that act autonomously in the economy. The economic model involves a single profit-maximizing firm which is a monopolist in the product market and a monopsonist in the labor market. The new automation technology causes the firm to replace labor with capital in such a way that its profit increases while total production decreases. The model is not intended to capture the structure of a real economy, but rather to illustrate how basic economic mechanisms can give rise to counterintuitive and undesirable outcomes.
Submitted 24 November, 2024;
originally announced November 2024.
-
On the convergence of phase space distributions to microcanonical equilibrium: dynamical isometry and generalized coarse-graining
Authors:
Casey O. Barkan
Abstract:
This work explores the manner in which classical phase space distribution functions converge to the microcanonical distribution. We first prove a theorem about the lack of convergence, then define a generalization of the coarse-graining procedure that leads to convergence. We prove that the time evolution of phase space distributions is an isometry for a broad class of statistical distance metrics, implying that ensembles do not get any closer to (or farther from) equilibrium, according to these metrics. This extends the known result that strong convergence of phase space distributions to the microcanonical distribution does not occur. However, it has long been known that weak convergence can occur, such that coarse-grained distributions--defined by partitioning phase space into a finite number of cells--converge pointwise to the microcanonical distribution. We define a generalization of coarse-graining that removes the need for partitioning phase space into cells. We prove that our generalized coarse-grained distribution converges pointwise to the microcanonical distribution if the dynamics are strong mixing. As an example, we study an ensemble of triangular billiard systems.
Submitted 23 August, 2024; v1 submitted 7 April, 2024;
originally announced April 2024.
-
InterrogateLLM: Zero-Resource Hallucination Detection in LLM-Generated Answers
Authors:
Yakir Yehuda,
Itzik Malkiel,
Oren Barkan,
Jonathan Weill,
Royi Ronen,
Noam Koenigstein
Abstract:
Despite the many advances of Large Language Models (LLMs) and their unprecedented rapid evolution, their impact and integration into every facet of our daily lives remain limited for various reasons. One critical factor hindering their widespread adoption is the occurrence of hallucinations, where LLMs invent answers that sound realistic, yet drift away from factual truth. In this paper, we present a novel method for detecting hallucinations in large language models, which tackles a critical issue in the adoption of these models in various real-world scenarios. Through extensive evaluations across multiple datasets and LLMs, including Llama-2, we study the hallucination levels of various recent LLMs and demonstrate the effectiveness of our method in automatically detecting them. Notably, we observe up to 87% hallucinations for Llama-2 in a specific experiment, where our method achieves a Balanced Accuracy of 81%, all without relying on external knowledge.
Submitted 19 August, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
DiffMoog: a Differentiable Modular Synthesizer for Sound Matching
Authors:
Noy Uzrad,
Oren Barkan,
Almog Elharar,
Shlomi Shvartzman,
Moshe Laufer,
Lior Wolf,
Noam Koenigstein
Abstract:
This paper presents DiffMoog - a differentiable modular synthesizer with a comprehensive set of modules typically found in commercial instruments. Being differentiable, it allows integration into neural networks, enabling automated sound matching, to replicate a given audio input. Notably, DiffMoog facilitates modulation capabilities (FM/AM), low-frequency oscillators (LFOs), filters, envelope shapers, and the ability for users to create custom signal chains. We introduce an open-source platform that comprises DiffMoog and an end-to-end sound matching framework. This framework utilizes a novel signal-chain loss and an encoder network that self-programs its outputs to predict DiffMoog's parameters based on the user-defined modular architecture. Moreover, we provide insights and lessons learned towards sound matching using differentiable synthesis. Combining robust sound capabilities with a holistic platform, DiffMoog stands as a premier asset for expediting research in audio synthesis and machine learning.
Submitted 23 January, 2024;
originally announced January 2024.
-
Visual Explanations via Iterated Integrated Attributions
Authors:
Oren Barkan,
Yehonatan Elisha,
Yuval Asher,
Amit Eshel,
Noam Koenigstein
Abstract:
We introduce Iterated Integrated Attributions (IIA) - a generic method for explaining the predictions of vision models. IIA employs iterative integration across the input image, the internal representations generated by the model, and their gradients, yielding precise and focused explanation maps. We demonstrate the effectiveness of IIA through comprehensive evaluations across various tasks, datasets, and network architectures. Our results showcase that IIA produces accurate explanation maps, outperforming other state-of-the-art explanation techniques.
Submitted 28 October, 2023;
originally announced October 2023.
-
Learning to Explain: A Model-Agnostic Framework for Explaining Black Box Models
Authors:
Oren Barkan,
Yuval Asher,
Amit Eshel,
Yehonatan Elisha,
Noam Koenigstein
Abstract:
We present Learning to Explain (LTX), a model-agnostic framework designed for providing post-hoc explanations for vision models. The LTX framework introduces an "explainer" model that generates explanation maps, highlighting the crucial regions that justify the predictions made by the model being explained. To train the explainer, we employ a two-stage process consisting of initial pretraining followed by per-instance finetuning. During both stages of training, we utilize a unique configuration where we compare the explained model's prediction for a masked input with its original prediction for the unmasked input. This approach enables the use of a novel counterfactual objective, which aims to anticipate the model's output using masked versions of the input image. Importantly, the LTX framework is not restricted to a specific model architecture and can provide explanations for both Transformer-based and convolutional models. Through our evaluations, we demonstrate that LTX significantly outperforms the current state-of-the-art in explainability across various metrics.
Submitted 25 October, 2023;
originally announced October 2023.
-
Deep Integrated Explanations
Authors:
Oren Barkan,
Yehonatan Elisha,
Jonathan Weill,
Yuval Asher,
Amit Eshel,
Noam Koenigstein
Abstract:
This paper presents Deep Integrated Explanations (DIX) - a universal method for explaining vision models. DIX generates explanation maps by integrating information from the intermediate representations of the model, coupled with their corresponding gradients. Through an extensive array of both objective and subjective evaluations spanning diverse tasks, datasets, and model configurations, we showcase the efficacy of DIX in generating faithful and accurate explanation maps, while surpassing current state-of-the-art methods.
Submitted 27 October, 2023; v1 submitted 23 October, 2023;
originally announced October 2023.
-
Migration feedback induces emergent ecotypes and abrupt transitions in evolving populations
Authors:
Casey O. Barkan,
Shenshen Wang
Abstract:
We explore the connection between migration patterns and emergent behaviors of evolving populations in spatially heterogeneous environments. Despite extensive studies in ecologically and medically important systems, a unifying framework that clarifies this connection and makes concrete predictions remains much needed. Using a simple evolutionary model on a network of interconnected habitats with distinct fitness landscapes, we demonstrate a fundamental connection between migration feedback, emergent ecotypes, and an unusual form of discontinuous critical transition. We show how migration feedback generates spatially non-local niches in which emergent ecotypes can specialize. Rugged fitness landscapes lead to a complex, yet understandable, phase diagram in which different ecotypes coexist under different migration patterns. The discontinuous transitions are distinct from the standard first-order phase transitions in statistical physics. They arise due to simultaneous transcritical bifurcations and exhibit a "fine structure" due to symmetry breaking between intra- and inter-ecotype interactions. We suggest feasible experiments to test our predictions.
Submitted 8 January, 2024; v1 submitted 19 September, 2023;
originally announced September 2023.
-
Efficient Discovery and Effective Evaluation of Visual Perceptual Similarity: A Benchmark and Beyond
Authors:
Oren Barkan,
Tal Reiss,
Jonathan Weill,
Ori Katz,
Roy Hirsch,
Itzik Malkiel,
Noam Koenigstein
Abstract:
Visual similarities discovery (VSD) is an important task with broad e-commerce applications. Given an image of a certain object, the goal of VSD is to retrieve images of different objects with high perceptual visual similarity. Although VSD is a widely studied problem, the evaluation of proposed methods for VSD is often based on a proxy of an identification-retrieval task, evaluating the ability of a model to retrieve different images of the same object. We posit that evaluating VSD methods based on identification tasks is limited, and faithful evaluation must rely on expert annotations. In this paper, we introduce the first large-scale fashion visual similarity benchmark dataset, consisting of more than 110K expert-annotated image pairs. Besides this major contribution, we share insights from the challenges we faced while curating this dataset. Based on these insights, we propose a novel and efficient labeling procedure that can be applied to any dataset. Our analysis examines its limitations and inductive biases, and based on these findings, we propose metrics to mitigate those limitations. Though our primary focus lies on visual similarity, the methodologies we present have broader applications for discovering and evaluating perceptual similarity across various domains.
Submitted 28 August, 2023;
originally announced August 2023.
-
Representation Learning via Variational Bayesian Networks
Authors:
Oren Barkan,
Avi Caciularu,
Idan Rejwan,
Ori Katz,
Jonathan Weill,
Itzik Malkiel,
Noam Koenigstein
Abstract:
We present Variational Bayesian Network (VBN) - a novel Bayesian entity representation learning model that utilizes hierarchical and relational side information and is particularly useful for modeling entities in the "long-tail", where the data is scarce. VBN provides better modeling for long-tail entities via two complementary mechanisms: First, VBN employs informative hierarchical priors that enable information propagation between entities sharing common ancestors. Additionally, VBN models explicit relations between entities that enforce complementary structure and consistency, guiding the learned representations towards a more meaningful arrangement in space. Second, VBN represents entities by densities (rather than vectors), hence modeling uncertainty that plays a complementary role in coping with data scarcity. Finally, we propose a scalable Variational Bayes optimization algorithm that enables fast approximate Bayesian inference. We evaluate the effectiveness of VBN on linguistic, recommendation, and medical inference tasks. Our findings show that VBN outperforms other existing methods across multiple datasets, and especially in the long-tail.
Submitted 28 June, 2023;
originally announced June 2023.
-
GPT-Calls: Enhancing Call Segmentation and Tagging by Generating Synthetic Conversations via Large Language Models
Authors:
Itzik Malkiel,
Uri Alon,
Yakir Yehuda,
Shahar Keren,
Oren Barkan,
Royi Ronen,
Noam Koenigstein
Abstract:
Transcriptions of phone calls are of significant value across diverse fields, such as sales, customer service, healthcare, and law enforcement. Nevertheless, the analysis of these recorded conversations can be an arduous and time-intensive process, especially when dealing with extended or multifaceted dialogues. In this work, we propose a novel method, GPT-distilled Calls Segmentation and Tagging (GPT-Calls), for efficient and accurate call segmentation and topic extraction. GPT-Calls is composed of offline and online phases. The offline phase is applied once to a given list of topics and involves generating a distribution of synthetic sentences for each topic using a GPT model and extracting anchor vectors. The online phase is applied to every call separately and scores the similarity between the transcribed conversation and the topic anchors found in the offline phase. Then, time domain analysis is applied to the similarity scores to group utterances into segments and tag them with topics. The proposed paradigm provides an accurate and efficient method for call segmentation and topic extraction that does not require labeled data, thus making it a versatile approach applicable to various domains. Our algorithm operates in production under Dynamics 365 Sales Conversation Intelligence, and our research is based on real sales conversations gathered from various Dynamics 365 Sales tenants.
Submitted 9 June, 2023;
originally announced June 2023.
-
Geometric Signatures of Switching Behavior in Mechanobiology
Authors:
Casey O. Barkan,
Robijn F. Bruinsma
Abstract:
The proteins involved in cells' mechanobiological processes have evolved specialized and surprising responses to applied forces. Biochemical transformations that show catch-to-slip switching and force-induced pathway switching serve important functions in cell adhesion, mechano-sensing and signaling, and protein folding. We show that these switching behaviors are generated by singularities in the flow field that describes force-induced deformation of bound and transition states. These singularities allow for a complete characterization of switching mechanisms in 2-dimensional (2D) free energy landscapes, and provide a path toward elucidating novel forms of switching in higher dimensional models. Remarkably, the singularity that generates a catch-slip switch occurs in almost every 2D free energy landscape, implying that almost any bond admitting a 2D model will exhibit catch-slip behavior under appropriate force. We apply our analysis to models of P-selectin and antigen extraction to illustrate how these singularities provide an intuitive framework for explaining known behaviors and predicting new behaviors.
Submitted 26 February, 2023; v1 submitted 7 September, 2022;
originally announced September 2022.
-
Interpreting BERT-based Text Similarity via Activation and Saliency Maps
Authors:
Itzik Malkiel,
Dvir Ginzburg,
Oren Barkan,
Avi Caciularu,
Jonathan Weill,
Noam Koenigstein
Abstract:
Recently, there has been growing interest in the ability of Transformer-based models to produce meaningful embeddings of text with several applications, such as text similarity. Despite significant progress in the field, the explanations for similarity predictions remain challenging, especially in unsupervised settings. In this work, we present an unsupervised technique for explaining paragraph similarities inferred by pre-trained BERT models. By looking at a pair of paragraphs, our technique identifies important words that dictate each paragraph's semantics, matches between the words in both paragraphs, and retrieves the most important pairs that explain the similarity between the two. The method, which has been assessed by extensive human evaluations and demonstrated on datasets comprising long and complex paragraphs, has shown great promise, providing accurate interpretations that correlate better with human perceptions.
Submitted 13 August, 2022;
originally announced August 2022.
-
MetricBERT: Text Representation Learning via Self-Supervised Triplet Training
Authors:
Itzik Malkiel,
Dvir Ginzburg,
Oren Barkan,
Avi Caciularu,
Yoni Weill,
Noam Koenigstein
Abstract:
We present MetricBERT, a BERT-based model that learns to embed text under a well-defined similarity metric while simultaneously adhering to the "traditional" masked-language task. We focus on downstream tasks of learning similarities for recommendations where we show that MetricBERT outperforms state-of-the-art alternatives, sometimes by a substantial margin. We conduct extensive evaluations of our method and its different variants, showing that our training objective is highly beneficial over a traditional contrastive loss, a standard cosine similarity objective, and six other baselines. As an additional contribution, we publish a dataset of video game descriptions along with a test set of similarity annotations crafted by a domain expert.
Submitted 13 August, 2022;
originally announced August 2022.
-
Grad-SAM: Explaining Transformers via Gradient Self-Attention Maps
Authors:
Oren Barkan,
Edan Hauon,
Avi Caciularu,
Ori Katz,
Itzik Malkiel,
Omri Armstrong,
Noam Koenigstein
Abstract:
Transformer-based language models have significantly advanced the state-of-the-art in many linguistic tasks. As this revolution continues, the ability to explain model predictions has become a major area of interest for the NLP community. In this work, we present Gradient Self-Attention Maps (Grad-SAM) - a novel gradient-based method that analyzes self-attention units and identifies the input elements that best explain the model's prediction. Extensive evaluations on various benchmarks show that Grad-SAM obtains significant improvements over state-of-the-art alternatives.
Submitted 23 April, 2022;
originally announced April 2022.
-
Cold Item Integration in Deep Hybrid Recommenders via Tunable Stochastic Gates
Authors:
Oren Barkan,
Roy Hirsch,
Ori Katz,
Avi Caciularu,
Jonathan Weill,
Noam Koenigstein
Abstract:
A major challenge in collaborative filtering methods is how to produce recommendations for cold items (items with no ratings), or integrate cold items into an existing catalog. Over the years, a variety of hybrid recommendation models have been proposed to address this problem by utilizing items' metadata and content along with their ratings or usage patterns. In this work, we wish to revisit the cold start problem in order to draw attention to an overlooked challenge: the ability to integrate and balance between (regular) warm items and completely cold items. In this case, two different challenges arise: (1) preserving high quality performance on warm items, while (2) learning to promote cold items to relevant users. First, we show that these two objectives are in fact conflicting, and the balance between them depends on the business needs and the application at hand. Next, we propose a novel hybrid recommendation algorithm that bridges these two conflicting objectives and enables a harmonized balance between preserving high accuracy for warm items while effectively promoting completely cold items. We demonstrate the effectiveness of the proposed algorithm on movie, app, and article recommendations, and provide an empirical analysis of the cold-warm trade-off.
Submitted 12 December, 2021;
originally announced December 2021.
-
GAM: Explainable Visual Similarity and Classification via Gradient Activation Maps
Authors:
Oren Barkan,
Omri Armstrong,
Amir Hertz,
Avi Caciularu,
Ori Katz,
Itzik Malkiel,
Noam Koenigstein
Abstract:
We present Gradient Activation Maps (GAM) - a machinery for explaining predictions made by visual similarity and classification models. By gleaning localized gradient and activation information from multiple network layers, GAM offers improved visual explanations, when compared to existing alternatives. The algorithmic advantages of GAM are explained in detail, and validated empirically, where it is shown that GAM outperforms its alternatives across various tasks and datasets.
Submitted 2 September, 2021;
originally announced September 2021.
-
Self-Supervised Document Similarity Ranking via Contextualized Language Models and Hierarchical Inference
Authors:
Dvir Ginzburg,
Itzik Malkiel,
Oren Barkan,
Avi Caciularu,
Noam Koenigstein
Abstract:
We present a novel model for the problem of ranking a collection of documents according to their semantic similarity to a source (query) document. While the problem of document-to-document similarity ranking has been studied, most modern methods are limited to relatively short documents or rely on the existence of "ground-truth" similarity labels. Yet, in most common real-world cases, similarity ranking is an unsupervised problem as similarity labels are unavailable. Moreover, an ideal model should not be restricted by documents' length. Hence, we introduce SDR, a self-supervised method for document similarity that can be applied to documents of arbitrary length. Importantly, SDR can be effectively applied to extremely long documents, exceeding Longformer's maximum limit of 4,096 tokens. Extensive evaluations on large document datasets show that SDR significantly outperforms its alternatives across all metrics. To accelerate future research on unlabeled long document similarity ranking, and as an additional contribution to the community, we herein publish two human-annotated test sets for long-document similarity evaluation. The SDR code and datasets are publicly available.
Submitted 2 June, 2021;
originally announced June 2021.
-
Forecasting CPI Inflation Components with Hierarchical Recurrent Neural Networks
Authors:
Oren Barkan,
Jonathan Benchimol,
Itamar Caspi,
Eliya Cohen,
Allon Hammer,
Noam Koenigstein
Abstract:
We present a hierarchical architecture based on Recurrent Neural Networks (RNNs) for predicting disaggregated inflation components of the Consumer Price Index (CPI). While the majority of existing research is focused mainly on predicting headline inflation, many economic and financial entities are more interested in its partial disaggregated components. To this end, we developed the novel Hierarchical Recurrent Neural Network (HRNN) model that utilizes information from higher levels in the CPI hierarchy to improve predictions at the more volatile lower levels. Our evaluations, based on a large dataset from the US CPI-U index, indicate that the HRNN model significantly outperforms a vast array of well-known inflation prediction baselines.
Submitted 17 February, 2022; v1 submitted 16 November, 2020;
originally announced November 2020.
-
Explainable Recommendations via Attentive Multi-Persona Collaborative Filtering
Authors:
Oren Barkan,
Yonatan Fuchs,
Avi Caciularu,
Noam Koenigstein
Abstract:
Two main challenges in recommender systems are modeling users with heterogeneous taste, and providing explainable recommendations. In this paper, we propose the neural Attentive Multi-Persona Collaborative Filtering (AMP-CF) model as a unified solution for both problems. AMP-CF breaks down the user into several latent 'personas' (profiles) that identify and discern the different tastes and inclinations of the user. Then, the revealed personas are used to generate and explain the final recommendation list for the user. AMP-CF models users as an attentive mixture of personas, enabling a dynamic user representation that changes based on the item under consideration. We demonstrate AMP-CF on five collaborative filtering datasets from the domains of movies, music, video games and social networks. As an additional contribution, we propose a novel evaluation scheme for comparing the different items in a recommendation list based on the distance from the underlying distribution of "tastes" in the user's historical items. Experimental results show that AMP-CF is competitive with other state-of-the-art models. Finally, we provide qualitative results to showcase the ability of AMP-CF to explain its recommendations.
Submitted 26 September, 2020;
originally announced October 2020.
-
RecoBERT: A Catalog Language Model for Text-Based Recommendations
Authors:
Itzik Malkiel,
Oren Barkan,
Avi Caciularu,
Noam Razin,
Ori Katz,
Noam Koenigstein
Abstract:
Language models that utilize extensive self-supervised pre-training from unlabeled text have recently been shown to significantly advance the state-of-the-art performance in a variety of language understanding tasks. However, it is yet unclear if and how these recent models can be harnessed for conducting text-based recommendations. In this work, we introduce RecoBERT, a BERT-based approach for learning catalog-specialized language models for text-based item recommendations. We suggest novel training and inference procedures for scoring similarities between pairs of items that don't require item similarity labels. Both the training and the inference techniques were designed to utilize the unlabeled structure of textual catalogs, and minimize the discrepancy between them. By incorporating four scores during inference, RecoBERT can infer text-based item-to-item similarities more accurately than other techniques. In addition, we introduce a new language understanding task for wine recommendations using similarities based on professional wine reviews. As an additional contribution, we publish an annotated recommendations dataset crafted by human wine experts. Finally, we evaluate RecoBERT and compare it to various state-of-the-art NLP models on wine and fashion recommendation tasks.
Submitted 25 September, 2020;
originally announced September 2020.
-
Bayesian Hierarchical Words Representation Learning
Authors:
Oren Barkan,
Idan Rejwan,
Avi Caciularu,
Noam Koenigstein
Abstract:
This paper presents the Bayesian Hierarchical Words Representation (BHWR) learning algorithm. BHWR facilitates Variational Bayes word representation learning combined with semantic taxonomy modeling via hierarchical priors. By propagating relevant information between related words, BHWR utilizes the taxonomy to improve the quality of such representations. Evaluation on several linguistic datasets demonstrates the advantages of BHWR over suitable alternatives that facilitate Bayesian modeling with or without semantic priors. Finally, we further show that BHWR produces better representations for rare words.
Submitted 12 April, 2020;
originally announced April 2020.
-
Neural Attentive Multiview Machines
Authors:
Oren Barkan,
Ori Katz,
Noam Koenigstein
Abstract:
An important problem in multiview representation learning is finding the optimal combination of views with respect to the specific task at hand. To this end, we introduce NAM: a Neural Attentive Multiview machine that learns multiview item representations and similarity by employing a novel attention mechanism. NAM harnesses multiple information sources and automatically quantifies their relevancy with respect to a supervised task. Finally, a very practical advantage of NAM is its robustness to datasets with missing views. We demonstrate the effectiveness of NAM on the tasks of movie and app recommendation. Our evaluations indicate that NAM outperforms single view models as well as alternative multiview methods on item recommendation tasks, including cold-start scenarios.
Submitted 18 February, 2020;
originally announced February 2020.
-
Attentive Item2Vec: Neural Attentive User Representations
Authors:
Oren Barkan,
Avi Caciularu,
Ori Katz,
Noam Koenigstein
Abstract:
Factorization methods for recommender systems tend to represent users as a single latent vector. However, user behavior and interests may change in the context of the recommendations that are presented to the user. For example, in the case of movie recommendations, it is usually true that earlier user data is less informative than more recent data. However, it is possible that a certain early movie may become suddenly more relevant in the presence of a popular sequel movie. This is just a single example of a variety of possible dynamically altering user interests in the presence of a potential new recommendation. In this work, we present Attentive Item2vec (AI2V) - a novel attentive version of Item2vec (I2V). AI2V employs a context-target attention mechanism in order to learn and capture different characteristics of user historical behavior (context) with respect to a potential recommended item (target). The attentive context-target mechanism enables a final neural attentive user representation. We demonstrate the effectiveness of AI2V on several datasets, where it is shown to outperform other baselines.
Submitted 19 April, 2020; v1 submitted 15 February, 2020;
originally announced February 2020.
-
Multiscale Self Attentive Convolutions for Vision and Language Modeling
Authors:
Oren Barkan
Abstract:
Self attention mechanisms have become a key building block in many state-of-the-art language understanding models. In this paper, we show that the self attention operator can be formulated in terms of 1x1 convolution operations. Following this observation, we propose several novel operators: First, we introduce a 2D version of self attention that is applicable for 2D signals such as images. Second, we present the 1D and 2D Self Attentive Convolutions (SAC) operator that generalizes self attention beyond 1x1 convolutions to 1xm and nxm convolutions, respectively. While 1D and 2D self attention operate on individual words and pixels, SAC operates on m-grams and image patches, respectively. Third, we present a multiscale version of SAC (MSAC) which analyzes the input by employing multiple SAC operators that vary by filter size, in parallel. Finally, we explain how MSAC can be utilized for vision and language modeling, and further harness MSAC to form a cross attentive image similarity machinery.
Submitted 3 December, 2019;
originally announced December 2019.
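The abstract of this entry states that self attention can be formulated in terms of 1x1 convolution operations. The following is a minimal sketch of that idea for a 1D sequence, not the paper's implementation: a convolution with kernel width 1 applies the same linear map to every position, so the query, key, and value projections can each be written as one. The function names, the scaled dot-product form, and the plain-list tensor layout are assumptions made for illustration.

```python
import math

def conv1x1(x, w):
    """1x1 convolution over a sequence.

    x is a list of n positions, each a list of d_in channels.
    w is a d_in x d_out weight matrix (hypothetical layout).
    A width-1 kernel sees one position at a time, so this is the same
    linear projection applied independently at every position.
    """
    d_in, d_out = len(w), len(w[0])
    return [[sum(xi[k] * w[k][j] for k in range(d_in)) for j in range(d_out)]
            for xi in x]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self attention built from three 1x1 convolutions."""
    q, k, v = conv1x1(x, wq), conv1x1(x, wk), conv1x1(x, wv)
    scale = math.sqrt(len(q[0]))
    out = []
    for qi in q:
        # Each position attends to every position in the sequence.
        scores = [sum(a * b for a, b in zip(qi, kj)) / scale for kj in k]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(wt * vj[c] for wt, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out

# Usage: three identical tokens with identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
result = self_attention(tokens, identity, identity, identity)
```

Because every output row is a convex combination of the value vectors, identical input tokens with identity projections are returned unchanged, which gives a quick sanity check. Widening the kernel beyond 1 would make each projection depend on neighboring positions, which is the direction the abstract describes for m-grams and image patches.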
-
Scalable Attentive Sentence-Pair Modeling via Distilled Sentence Embedding
Authors:
Oren Barkan,
Noam Razin,
Itzik Malkiel,
Ori Katz,
Avi Caciularu,
Noam Koenigstein
Abstract:
Recent state-of-the-art natural language understanding models, such as BERT and XLNet, score a pair of sentences (A and B) using multiple cross-attention operations - a process in which each word in sentence A attends to all words in sentence B and vice versa. As a result, computing the similarity between a query sentence and a set of candidate sentences requires the propagation of all query-candidate sentence-pairs throughout a stack of cross-attention layers. This exhaustive process becomes computationally prohibitive when the number of candidate sentences is large. In contrast, sentence embedding techniques learn a sentence-to-vector mapping and compute the similarity between the sentence vectors via simple elementary operations. In this paper, we introduce Distilled Sentence Embedding (DSE) - a model that is based on knowledge distillation from cross-attentive models, focusing on sentence-pair tasks. The outline of DSE is as follows: Given a cross-attentive teacher model (e.g. a fine-tuned BERT), we train a sentence embedding based student model to reconstruct the sentence-pair scores obtained by the teacher model. We empirically demonstrate the effectiveness of DSE on five GLUE sentence-pair tasks. DSE significantly outperforms several ELMo variants and other sentence embedding methods, while accelerating computation of the query-candidate sentence-pair similarities by several orders of magnitude, with an average relative degradation of 4.6% compared to BERT. Furthermore, we show that DSE produces sentence embeddings that reach state-of-the-art performance on universal sentence representation benchmarks. Our code is made publicly available at https://github.com/microsoft/Distilled-Sentence-Embedding.
Submitted 21 November, 2019; v1 submitted 14 August, 2019;
originally announced August 2019.
-
InverSynth: Deep Estimation of Synthesizer Parameter Configurations from Audio Signals
Authors:
Oren Barkan,
David Tsiris,
Ori Katz,
Noam Koenigstein
Abstract:
Sound synthesis is a complex field that requires domain expertise. Manual tuning of synthesizer parameters to match a specific sound can be an exhaustive task, even for experienced sound engineers. In this paper, we introduce InverSynth - an automatic method for tuning synthesizer parameters to match a given input sound. InverSynth is based on strided convolutional neural networks and is capable of inferring the synthesizer parameter configuration from the input spectrogram and even from the raw audio. The effectiveness of InverSynth is demonstrated on a subtractive synthesizer with four frequency modulated oscillators, an envelope generator and a gater effect. We present extensive quantitative and qualitative results that showcase the superiority of InverSynth over several baselines. Furthermore, we show that the network depth is an important factor that contributes to the prediction accuracy.
Submitted 21 November, 2019; v1 submitted 15 December, 2018;
originally announced December 2018.
-
Automated selection of the local potential for transferable pseudopotentials
Authors:
Casey O. Barkan,
Andrew M. Rappe
Abstract:
We develop an automated procedure to select the local potential of a separable pseudopotential that minimizes transferability errors for the isolated atom, and we show that this optimization leads to significant improvements in the accuracy of predicted solid-state properties. We present pseudopotentials for Y, In, and Sn. For these pseudopotentials, our method reduces solid-state errors by 88% on average, as measured by the $Δ$-factor test. These pseudopotentials are constructed in the Kleinman-Bylander form; however, our method is applicable to all separable pseudopotentials, such as ONCV pseudopotentials. We perform plane-wave convergence tests according to SSSP standards and show that the modifications to the local potential leave plane-wave convergence unchanged.
Submitted 2 November, 2018;
originally announced November 2018.
-
Predicting Relevance Scores for Triples from Type-Like Relations using Neural Embedding - The Cabbage Triple Scorer at WSDM Cup 2017
Authors:
Yael Brumer,
Bracha Shapira,
Lior Rokach,
Oren Barkan
Abstract:
The WSDM Cup 2017 Triple scoring challenge is aimed at calculating and assigning relevance scores for triples from type-like relations. Such scores are a fundamental ingredient for ranking results in entity search. In this paper, we propose a method that uses neural embedding techniques to accurately calculate an entity score for a triple based on its nearest neighbor. We strive to develop a new latent semantic model with a deep structure that captures the semantic and syntactic relations between words. Our method has been ranked among the top performers, with an accuracy of 0.74, an average score difference of 1.74, and an average Kendall's tau of 0.35.
Submitted 22 December, 2017;
originally announced December 2017.
-
CB2CF: A Neural Multiview Content-to-Collaborative Filtering Model for Completely Cold Item Recommendations
Authors:
Oren Barkan,
Noam Koenigstein,
Eylon Yogev,
Ori Katz
Abstract:
In Recommender Systems research, algorithms are often characterized as either Collaborative Filtering (CF) or Content Based (CB). CF algorithms are trained using a dataset of user preferences while CB algorithms are typically based on item profiles. These approaches harness different data sources and therefore the resulting recommended items are generally very different. This paper presents CB2CF, a deep neural multiview model that serves as a bridge from item content to CF representations. CB2CF is a real-world algorithm designed for Microsoft Store services that handle around a billion users worldwide. CB2CF is demonstrated on movie and app recommendations, where it is shown to outperform an alternative CB model on completely cold items.
Submitted 21 September, 2019; v1 submitted 1 November, 2016;
originally announced November 2016.
-
Bayesian Neural Word Embedding
Authors:
Oren Barkan
Abstract:
Recently, several works in the domain of natural language processing presented successful methods for word embedding. Among them, the Skip-Gram with negative sampling, known also as word2vec, advanced the state-of-the-art of various linguistic tasks. In this paper, we propose a scalable Bayesian neural word embedding algorithm. The algorithm relies on a Variational Bayes solution for the Skip-Gram objective, and a detailed step-by-step description is provided. We present experimental results that demonstrate the performance of the proposed algorithm for word analogy and similarity tasks on six different datasets and show it is competitive with the original Skip-Gram method.
Submitted 20 February, 2017; v1 submitted 21 March, 2016;
originally announced March 2016.
-
Item2Vec: Neural Item Embedding for Collaborative Filtering
Authors:
Oren Barkan,
Noam Koenigstein
Abstract:
Many Collaborative Filtering (CF) algorithms are item-based in the sense that they analyze item-item relations in order to produce item similarities. Recently, several works in the field of Natural Language Processing (NLP) suggested learning a latent representation of words using neural embedding algorithms. Among them, the Skip-gram with Negative Sampling (SGNS), also known as word2vec, was shown to provide state-of-the-art results on various linguistic tasks. In this paper, we show that item-based CF can be cast in the same framework as neural word embedding. Inspired by SGNS, we describe a method we name item2vec for item-based CF that produces embeddings for items in a latent space. The method is capable of inferring item-item relations even when user information is not available. We present experimental results that demonstrate the effectiveness of the item2vec method and show it is competitive with SVD.
Submitted 20 February, 2017; v1 submitted 14 March, 2016;
originally announced March 2016.
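The abstract of this entry describes casting item-based CF in the Skip-gram with Negative Sampling (SGNS) framework. The sketch below illustrates the standard SGNS objective for a single (target, context) item pair, not the authors' code. It assumes that any two distinct items sharing a set (for example, one basket or one user's history) form a positive pair; the function names and the plain-list vectors are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def item_pairs(items):
    """All ordered pairs of distinct items that share one set.

    Assumption for this sketch: co-occurrence in a set, not position,
    defines the context, so every other item in the set is a context item.
    """
    return [(a, b) for i, a in enumerate(items)
            for j, b in enumerate(items) if i != j]

def sgns_loss(target_vec, context_vec, negative_vecs):
    """Negative log-likelihood of one positive pair under SGNS.

    Pulls the target toward its observed context item and pushes it
    away from each sampled negative item.
    """
    loss = -math.log(sigmoid(dot(target_vec, context_vec)))
    for neg in negative_vecs:
        loss -= math.log(sigmoid(-dot(target_vec, neg)))
    return loss

# Usage: an aligned pair scores a lower loss than a misaligned one.
aligned = sgns_loss([1.0, 0.0], [1.0, 0.0], [[-1.0, 0.0]])
misaligned = sgns_loss([1.0, 0.0], [-1.0, 0.0], [[1.0, 0.0]])
```

Training would minimize the sum of this loss over all pairs from `item_pairs` for every set, with negatives drawn from the catalog; the learned target vectors then serve as item embeddings whose similarity (for instance cosine) gives item-item relations without needing user identifiers.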
-
Gaussian Process Regression for Out-of-Sample Extension
Authors:
Oren Barkan,
Jonathan Weill,
Amir Averbuch
Abstract:
Manifold learning methods are useful for high dimensional data analysis. Many of the existing methods produce a low dimensional representation that attempts to describe the intrinsic geometric structure of the original data. Typically, this process is computationally expensive and the produced embedding is limited to the training data. In many real life scenarios, the ability to produce embeddings of unseen samples is essential. In this paper we propose a Bayesian non-parametric approach for out-of-sample extension. The method is based on Gaussian Process Regression and is independent of the manifold learning algorithm. Additionally, the method naturally provides a measure for the degree of abnormality for a newly arrived data point that did not participate in the training process. We derive the mathematical connection between the proposed method and the Nyström extension and show that the latter is a special case of the former. We present extensive experimental results that demonstrate the performance of the proposed method and compare it to other existing out-of-sample extension methods.
Submitted 5 June, 2016; v1 submitted 7 March, 2016;
originally announced March 2016.