-
Sparsity May Be All You Need: Sparse Random Parameter Adaptation
Authors:
Jesus Rios,
Pierre Dognin,
Ronny Luss,
Karthikeyan N. Ramamurthy
Abstract:
Full fine-tuning of large language models for alignment and task adaptation has become prohibitively expensive as models have grown in size. Parameter-Efficient Fine-Tuning (PEFT) methods aim at significantly reducing the computational and memory resources needed for fine-tuning these models by only training on a small number of parameters instead of all model parameters. Currently, the most popul…
▽ More
Full fine-tuning of large language models for alignment and task adaptation has become prohibitively expensive as models have grown in size. Parameter-Efficient Fine-Tuning (PEFT) methods aim at significantly reducing the computational and memory resources needed for fine-tuning these models by only training on a small number of parameters instead of all model parameters. Currently, the most popular PEFT method is the Low-Rank Adaptation (LoRA), which freezes the parameters of the model to be fine-tuned and introduces a small set of trainable parameters in the form of low-rank matrices. We propose simply reducing the number of trainable parameters by randomly selecting a small proportion of the model parameters to train on. In this paper, we compare the efficiency and performance of our proposed approach with PEFT methods, including LoRA, as well as full parameter fine-tuning.
△ Less
Submitted 21 May, 2025; v1 submitted 21 February, 2025;
originally announced February 2025.
-
Granite Guardian
Authors:
Inkit Padhi,
Manish Nagireddy,
Giandomenico Cornacchia,
Subhajit Chaudhury,
Tejaswini Pedapati,
Pierre Dognin,
Keerthiram Murugesan,
Erik Miehling,
Martín Santillán Cooper,
Kieran Fraser,
Giulio Zizzo,
Muhammad Zaid Hameed,
Mark Purcell,
Michael Desmond,
Qian Pan,
Zahra Ashktorab,
Inge Vejsbjerg,
Elizabeth M. Daly,
Michael Hind,
Werner Geyer,
Ambrish Rawat,
Kush R. Varshney,
Prasanna Sattigeri
Abstract:
We introduce the Granite Guardian models, a suite of safeguards designed to provide risk detection for prompts and responses, enabling safe and responsible use in combination with any large language model (LLM). These models offer comprehensive coverage across multiple risk dimensions, including social bias, profanity, violence, sexual content, unethical behavior, jailbreaking, and hallucination-r…
▽ More
We introduce the Granite Guardian models, a suite of safeguards designed to provide risk detection for prompts and responses, enabling safe and responsible use in combination with any large language model (LLM). These models offer comprehensive coverage across multiple risk dimensions, including social bias, profanity, violence, sexual content, unethical behavior, jailbreaking, and hallucination-related risks such as context relevance, groundedness, and answer relevance for retrieval-augmented generation (RAG). Trained on a unique dataset combining human annotations from diverse sources and synthetic data, Granite Guardian models address risks typically overlooked by traditional risk detection models, such as jailbreaks and RAG-specific issues. With AUC scores of 0.871 and 0.854 on harmful content and RAG-hallucination-related benchmarks respectively, Granite Guardian is the most generalizable and competitive model available in the space. Released as open-source, Granite Guardian aims to promote responsible AI development across the community.
https://github.com/ibm-granite/granite-guardian
△ Less
Submitted 16 December, 2024; v1 submitted 10 December, 2024;
originally announced December 2024.
-
Evaluating the Prompt Steerability of Large Language Models
Authors:
Erik Miehling,
Michael Desmond,
Karthikeyan Natesan Ramamurthy,
Elizabeth M. Daly,
Pierre Dognin,
Jesus Rios,
Djallel Bouneffouf,
Miao Liu
Abstract:
Building pluralistic AI requires designing models that are able to be shaped to represent a wide range of value systems and cultures. Achieving this requires first being able to evaluate the degree to which a given model is capable of reflecting various personas. To this end, we propose a benchmark for evaluating the steerability of model personas as a function of prompting. Our design is based on…
▽ More
Building pluralistic AI requires designing models that are able to be shaped to represent a wide range of value systems and cultures. Achieving this requires first being able to evaluate the degree to which a given model is capable of reflecting various personas. To this end, we propose a benchmark for evaluating the steerability of model personas as a function of prompting. Our design is based on a formal definition of prompt steerability, which analyzes the degree to which a model's joint behavioral distribution can be shifted from its baseline. By defining steerability indices and inspecting how these indices change as a function of steering effort, we can estimate the steerability of a model across various persona dimensions and directions. Our benchmark reveals that the steerability of many current models is limited -- due to both a skew in their baseline behavior and an asymmetry in their steerability across many persona dimensions. We release an implementation of our benchmark at https://github.com/IBM/prompt-steering.
△ Less
Submitted 15 February, 2025; v1 submitted 19 November, 2024;
originally announced November 2024.
-
Programming Refusal with Conditional Activation Steering
Authors:
Bruce W. Lee,
Inkit Padhi,
Karthikeyan Natesan Ramamurthy,
Erik Miehling,
Pierre Dognin,
Manish Nagireddy,
Amit Dhurandhar
Abstract:
LLMs have shown remarkable capabilities, but precisely controlling their response behavior remains challenging. Existing activation steering methods alter LLM behavior indiscriminately, limiting their practical applicability in settings where selective responses are essential, such as content moderation or domain-specific assistants. In this paper, we propose Conditional Activation Steering (CAST)…
▽ More
LLMs have shown remarkable capabilities, but precisely controlling their response behavior remains challenging. Existing activation steering methods alter LLM behavior indiscriminately, limiting their practical applicability in settings where selective responses are essential, such as content moderation or domain-specific assistants. In this paper, we propose Conditional Activation Steering (CAST), which analyzes LLM activation patterns during inference to selectively apply or withhold activation steering based on the input context. Our method is based on the observation that different categories of prompts activate distinct patterns in the model's hidden states. Using CAST, one can systematically control LLM behavior with rules like "if input is about hate speech or adult content, then refuse" or "if input is not about legal advice, then refuse." This allows for selective modification of responses to specific content while maintaining normal responses to other content, all without requiring weight optimization. We release an open-source implementation of our framework at github.com/IBM/activation-steering .
△ Less
Submitted 17 February, 2025; v1 submitted 6 September, 2024;
originally announced September 2024.
-
Value Alignment from Unstructured Text
Authors:
Inkit Padhi,
Karthikeyan Natesan Ramamurthy,
Prasanna Sattigeri,
Manish Nagireddy,
Pierre Dognin,
Kush R. Varshney
Abstract:
Aligning large language models (LLMs) to value systems has emerged as a significant area of research within the fields of AI and NLP. Currently, this alignment process relies on the availability of high-quality supervised and preference data, which can be both time-consuming and expensive to curate or annotate. In this paper, we introduce a systematic end-to-end methodology for aligning LLMs to th…
▽ More
Aligning large language models (LLMs) to value systems has emerged as a significant area of research within the fields of AI and NLP. Currently, this alignment process relies on the availability of high-quality supervised and preference data, which can be both time-consuming and expensive to curate or annotate. In this paper, we introduce a systematic end-to-end methodology for aligning LLMs to the implicit and explicit values represented in unstructured text data. Our proposed approach leverages the use of scalable synthetic data generation techniques to effectively align the model to the values present in the unstructured data. Through two distinct use-cases, we demonstrate the efficiency of our methodology on the Mistral-7B-Instruct model. Our approach credibly aligns LLMs to the values embedded within documents, and shows improved performance against other approaches, as quantified through the use of automatic metrics and win rates.
△ Less
Submitted 19 August, 2024;
originally announced August 2024.
-
Contextual Moral Value Alignment Through Context-Based Aggregation
Authors:
Pierre Dognin,
Jesus Rios,
Ronny Luss,
Inkit Padhi,
Matthew D Riemer,
Miao Liu,
Prasanna Sattigeri,
Manish Nagireddy,
Kush R. Varshney,
Djallel Bouneffouf
Abstract:
Developing value-aligned AI agents is a complex undertaking and an ongoing challenge in the field of AI. Specifically within the domain of Large Language Models (LLMs), the capability to consolidate multiple independently trained dialogue agents, each aligned with a distinct moral value, into a unified system that can adapt to and be aligned with multiple moral values is of paramount importance. I…
▽ More
Developing value-aligned AI agents is a complex undertaking and an ongoing challenge in the field of AI. Specifically within the domain of Large Language Models (LLMs), the capability to consolidate multiple independently trained dialogue agents, each aligned with a distinct moral value, into a unified system that can adapt to and be aligned with multiple moral values is of paramount importance. In this paper, we propose a system that does contextual moral value alignment based on contextual aggregation. Here, aggregation is defined as the process of integrating a subset of LLM responses that are best suited to respond to a user input, taking into account features extracted from the user's input. The proposed system shows better results in term of alignment to human value compared to the state of the art.
△ Less
Submitted 19 March, 2024;
originally announced March 2024.
-
Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations
Authors:
Swapnaja Achintalwar,
Ioana Baldini,
Djallel Bouneffouf,
Joan Byamugisha,
Maria Chang,
Pierre Dognin,
Eitan Farchi,
Ndivhuwo Makondo,
Aleksandra Mojsilovic,
Manish Nagireddy,
Karthikeyan Natesan Ramamurthy,
Inkit Padhi,
Orna Raz,
Jesus Rios,
Prasanna Sattigeri,
Moninder Singh,
Siphiwe Thwala,
Rosario A. Uceda-Sosa,
Kush R. Varshney
Abstract:
The alignment of large language models is usually done by model providers to add or control behaviors that are common or universally understood across use cases and contexts. In contrast, in this article, we present an approach and architecture that empowers application developers to tune a model to their particular values, social norms, laws and other regulations, and orchestrate between potentia…
▽ More
The alignment of large language models is usually done by model providers to add or control behaviors that are common or universally understood across use cases and contexts. In contrast, in this article, we present an approach and architecture that empowers application developers to tune a model to their particular values, social norms, laws and other regulations, and orchestrate between potentially conflicting requirements in context. We lay out three main components of such an Alignment Studio architecture: Framers, Instructors, and Auditors that work in concert to control the behavior of a language model. We illustrate this approach with a running example of aligning a company's internal-facing enterprise chatbot to its business conduct guidelines.
△ Less
Submitted 8 March, 2024;
originally announced March 2024.
-
Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations
Authors:
Swapnaja Achintalwar,
Adriana Alvarado Garcia,
Ateret Anaby-Tavor,
Ioana Baldini,
Sara E. Berger,
Bishwaranjan Bhattacharjee,
Djallel Bouneffouf,
Subhajit Chaudhury,
Pin-Yu Chen,
Lamogha Chiazor,
Elizabeth M. Daly,
Kirushikesh DB,
Rogério Abreu de Paula,
Pierre Dognin,
Eitan Farchi,
Soumya Ghosh,
Michael Hind,
Raya Horesh,
George Kour,
Ja Young Lee,
Nishtha Madaan,
Sameep Mehta,
Erik Miehling,
Keerthiram Murugesan,
Manish Nagireddy
, et al. (13 additional authors not shown)
Abstract:
Large language models (LLMs) are susceptible to a variety of risks, from non-faithful output to biased and toxic generations. Due to several limiting factors surrounding LLMs (training cost, API access, data availability, etc.), it may not always be feasible to impose direct safety constraints on a deployed model. Therefore, an efficient and reliable alternative is required. To this end, we presen…
▽ More
Large language models (LLMs) are susceptible to a variety of risks, from non-faithful output to biased and toxic generations. Due to several limiting factors surrounding LLMs (training cost, API access, data availability, etc.), it may not always be feasible to impose direct safety constraints on a deployed model. Therefore, an efficient and reliable alternative is required. To this end, we present our ongoing efforts to create and deploy a library of detectors: compact and easy-to-build classification models that provide labels for various harms. In addition to the detectors themselves, we discuss a wide range of uses for these detector models - from acting as guardrails to enabling effective AI governance. We also deep dive into inherent challenges in their development and discuss future work aimed at making the detectors more reliable and broadening their scope.
△ Less
Submitted 19 August, 2024; v1 submitted 9 March, 2024;
originally announced March 2024.
-
Auditing and Generating Synthetic Data with Controllable Trust Trade-offs
Authors:
Brian Belgodere,
Pierre Dognin,
Adam Ivankay,
Igor Melnyk,
Youssef Mroueh,
Aleksandra Mojsilovic,
Jiri Navratil,
Apoorva Nitsure,
Inkit Padhi,
Mattia Rigotti,
Jerret Ross,
Yair Schiff,
Radhika Vedpathak,
Richard A. Young
Abstract:
Real-world data often exhibits bias, imbalance, and privacy risks. Synthetic datasets have emerged to address these issues. This paradigm relies on generative AI models to generate unbiased, privacy-preserving data while maintaining fidelity to the original data. However, assessing the trustworthiness of synthetic datasets and models is a critical challenge. We introduce a holistic auditing framew…
▽ More
Real-world data often exhibits bias, imbalance, and privacy risks. Synthetic datasets have emerged to address these issues. This paradigm relies on generative AI models to generate unbiased, privacy-preserving data while maintaining fidelity to the original data. However, assessing the trustworthiness of synthetic datasets and models is a critical challenge. We introduce a holistic auditing framework that comprehensively evaluates synthetic datasets and AI models. It focuses on preventing bias and discrimination, ensures fidelity to the source data, assesses utility, robustness, and privacy preservation. We demonstrate the framework's effectiveness by auditing various generative models across diverse use cases like education, healthcare, banking, and human resources, spanning different data modalities such as tabular, time-series, vision, and natural language. This holistic assessment is essential for compliance with regulatory safeguards. We introduce a trustworthiness index to rank synthetic datasets based on their safeguards trade-offs. Furthermore, we present a trustworthiness-driven model selection and cross-validation process during training, exemplified with "TrustFormers" across various data types. This approach allows for controllable trustworthiness trade-offs in synthetic data creation. Our auditing framework fosters collaboration among stakeholders, including data scientists, governance experts, internal reviewers, external certifiers, and regulators. This transparent reporting should become a standard practice to prevent bias, discrimination, and privacy violations, ensuring compliance with policies and providing accountability, safety, and performance guarantees.
△ Less
Submitted 9 June, 2024; v1 submitted 21 April, 2023;
originally announced April 2023.
-
Fair Infinitesimal Jackknife: Mitigating the Influence of Biased Training Data Points Without Refitting
Authors:
Prasanna Sattigeri,
Soumya Ghosh,
Inkit Padhi,
Pierre Dognin,
Kush R. Varshney
Abstract:
In consequential decision-making applications, mitigating unwanted biases in machine learning models that yield systematic disadvantage to members of groups delineated by sensitive attributes such as race and gender is one key intervention to strive for equity. Focusing on demographic parity and equality of opportunity, in this paper we propose an algorithm that improves the fairness of a pre-trai…
▽ More
In consequential decision-making applications, mitigating unwanted biases in machine learning models that yield systematic disadvantage to members of groups delineated by sensitive attributes such as race and gender is one key intervention to strive for equity. Focusing on demographic parity and equality of opportunity, in this paper we propose an algorithm that improves the fairness of a pre-trained classifier by simply dropping carefully selected training data points. We select instances based on their influence on the fairness metric of interest, computed using an infinitesimal jackknife-based approach. The dropping of training points is done in principle, but in practice does not require the model to be refit. Crucially, we find that such an intervention does not substantially reduce the predictive performance of the model but drastically improves the fairness metric. Through careful experiments, we evaluate the effectiveness of the proposed approach on diverse tasks and find that it consistently improves upon existing alternatives.
△ Less
Submitted 13 December, 2022;
originally announced December 2022.
-
Knowledge Graph Generation From Text
Authors:
Igor Melnyk,
Pierre Dognin,
Payel Das
Abstract:
In this work we propose a novel end-to-end multi-stage Knowledge Graph (KG) generation system from textual inputs, separating the overall process into two stages. The graph nodes are generated first using pretrained language model, followed by a simple edge construction head, enabling efficient KG extraction from the text. For each stage we consider several architectural choices that can be used d…
▽ More
In this work we propose a novel end-to-end multi-stage Knowledge Graph (KG) generation system from textual inputs, separating the overall process into two stages. The graph nodes are generated first using pretrained language model, followed by a simple edge construction head, enabling efficient KG extraction from the text. For each stage we consider several architectural choices that can be used depending on the available training resources. We evaluated the model on a recent WebNLG 2020 Challenge dataset, matching the state-of-the-art performance on text-to-RDF generation task, as well as on New York Times (NYT) and a large-scale TekGen datasets, showing strong overall performance, outperforming the existing baselines. We believe that the proposed system can serve as a viable KG construction alternative to the existing linearization or sampling-based graph generation approaches. Our code can be found at https://github.com/IBM/Grapher
△ Less
Submitted 18 November, 2022;
originally announced November 2022.
-
Cloud-Based Real-Time Molecular Screening Platform with MolFormer
Authors:
Brian Belgodere,
Vijil Chenthamarakshan,
Payel Das,
Pierre Dognin,
Toby Kurien,
Igor Melnyk,
Youssef Mroueh,
Inkit Padhi,
Mattia Rigotti,
Jarret Ross,
Yair Schiff,
Richard A. Young
Abstract:
With the prospect of automating a number of chemical tasks with high fidelity, chemical language processing models are emerging at a rapid speed. Here, we present a cloud-based real-time platform that allows users to virtually screen molecules of interest. For this purpose, molecular embeddings inferred from a recently proposed large chemical language model, named MolFormer, are leveraged. The pla…
▽ More
With the prospect of automating a number of chemical tasks with high fidelity, chemical language processing models are emerging at a rapid speed. Here, we present a cloud-based real-time platform that allows users to virtually screen molecules of interest. For this purpose, molecular embeddings inferred from a recently proposed large chemical language model, named MolFormer, are leveraged. The platform currently supports three tasks: nearest neighbor retrieval, chemical space visualization, and property prediction. Based on the functionalities of this platform and results obtained, we believe that such a platform can play a pivotal role in automating chemistry and chemical engineering research, as well as assist in drug discovery and material design tasks. A demo of our platform is provided at \url{www.ibm.biz/molecular_demo}.
△ Less
Submitted 13 August, 2022;
originally announced August 2022.
-
ReGen: Reinforcement Learning for Text and Knowledge Base Generation using Pretrained Language Models
Authors:
Pierre L. Dognin,
Inkit Padhi,
Igor Melnyk,
Payel Das
Abstract:
Automatic construction of relevant Knowledge Bases (KBs) from text, and generation of semantically meaningful text from KBs are both long-standing goals in Machine Learning. In this paper, we present ReGen, a bidirectional generation of text and graph leveraging Reinforcement Learning (RL) to improve performance. Graph linearization enables us to re-frame both tasks as a sequence to sequence gener…
▽ More
Automatic construction of relevant Knowledge Bases (KBs) from text, and generation of semantically meaningful text from KBs are both long-standing goals in Machine Learning. In this paper, we present ReGen, a bidirectional generation of text and graph leveraging Reinforcement Learning (RL) to improve performance. Graph linearization enables us to re-frame both tasks as a sequence to sequence generation problem regardless of the generative direction, which in turn allows the use of Reinforcement Learning for sequence training where the model itself is employed as its own critic leading to Self-Critical Sequence Training (SCST). We present an extensive investigation demonstrating that the use of RL via SCST benefits graph and text generation on WebNLG+ 2020 and TekGen datasets. Our system provides state-of-the-art results on WebNLG+ 2020 by significantly improving upon published results from the WebNLG 2020+ Challenge for both text-to-graph and graph-to-text generation tasks.
△ Less
Submitted 27 August, 2021;
originally announced August 2021.
-
Image Captioning as an Assistive Technology: Lessons Learned from VizWiz 2020 Challenge
Authors:
Pierre Dognin,
Igor Melnyk,
Youssef Mroueh,
Inkit Padhi,
Mattia Rigotti,
Jarret Ross,
Yair Schiff,
Richard A. Young,
Brian Belgodere
Abstract:
Image captioning has recently demonstrated impressive progress largely owing to the introduction of neural network algorithms trained on curated dataset like MS-COCO. Often work in this field is motivated by the promise of deployment of captioning systems in practical applications. However, the scarcity of data and contexts in many competition datasets renders the utility of systems trained on the…
▽ More
Image captioning has recently demonstrated impressive progress largely owing to the introduction of neural network algorithms trained on curated dataset like MS-COCO. Often work in this field is motivated by the promise of deployment of captioning systems in practical applications. However, the scarcity of data and contexts in many competition datasets renders the utility of systems trained on these datasets limited as an assistive technology in real-world settings, such as helping visually impaired people navigate and accomplish everyday tasks. This gap motivated the introduction of the novel VizWiz dataset, which consists of images taken by the visually impaired and captions that have useful, task-oriented information. In an attempt to help the machine learning computer vision field realize its promise of producing technologies that have positive social impact, the curators of the VizWiz dataset host several competitions, including one for image captioning. This work details the theory and engineering from our winning submission to the 2020 captioning competition. Our work provides a step towards improved assistive image captioning systems.
△ Less
Submitted 18 June, 2021; v1 submitted 21 December, 2020;
originally announced December 2020.
-
Alleviating Noisy Data in Image Captioning with Cooperative Distillation
Authors:
Pierre Dognin,
Igor Melnyk,
Youssef Mroueh,
Inkit Padhi,
Mattia Rigotti,
Jarret Ross,
Yair Schiff
Abstract:
Image captioning systems have made substantial progress, largely due to the availability of curated datasets like Microsoft COCO or Vizwiz that have accurate descriptions of their corresponding images. Unfortunately, scarce availability of such cleanly labeled data results in trained algorithms producing captions that can be terse and idiosyncratically specific to details in the image. We propose…
▽ More
Image captioning systems have made substantial progress, largely due to the availability of curated datasets like Microsoft COCO or Vizwiz that have accurate descriptions of their corresponding images. Unfortunately, scarce availability of such cleanly labeled data results in trained algorithms producing captions that can be terse and idiosyncratically specific to details in the image. We propose a new technique, cooperative distillation that combines clean curated datasets with the web-scale automatically extracted captions of the Google Conceptual Captions dataset (GCC), which can have poor descriptions of images, but is abundant in size and therefore provides a rich vocabulary resulting in more expressive captions.
△ Less
Submitted 21 December, 2020;
originally announced December 2020.
-
Tabular Transformers for Modeling Multivariate Time Series
Authors:
Inkit Padhi,
Yair Schiff,
Igor Melnyk,
Mattia Rigotti,
Youssef Mroueh,
Pierre Dognin,
Jerret Ross,
Ravi Nair,
Erik Altman
Abstract:
Tabular datasets are ubiquitous in data science applications. Given their importance, it seems natural to apply state-of-the-art deep learning algorithms in order to fully unlock their potential. Here we propose neural network models that represent tabular time series that can optionally leverage their hierarchical structure. This results in two architectures for tabular time series: one for learn…
▽ More
Tabular datasets are ubiquitous in data science applications. Given their importance, it seems natural to apply state-of-the-art deep learning algorithms in order to fully unlock their potential. Here we propose neural network models that represent tabular time series that can optionally leverage their hierarchical structure. This results in two architectures for tabular time series: one for learning representations that is analogous to BERT and can be pre-trained end-to-end and used in downstream tasks, and one that is akin to GPT and can be used for generation of realistic synthetic tabular sequences. We demonstrate our models on two datasets: a synthetic credit card transaction dataset, where the learned representations are used for fraud detection and synthetic data generation, and on a real pollution dataset, where the learned encodings are used to predict atmospheric pollutant concentrations. Code and data are available at https://github.com/IBM/TabFormer.
△ Less
Submitted 11 February, 2021; v1 submitted 3 November, 2020;
originally announced November 2020.
-
DualTKB: A Dual Learning Bridge between Text and Knowledge Base
Authors:
Pierre L. Dognin,
Igor Melnyk,
Inkit Padhi,
Cicero Nogueira dos Santos,
Payel Das
Abstract:
In this work, we present a dual learning approach for unsupervised text to path and path to text transfers in Commonsense Knowledge Bases (KBs). We investigate the impact of weak supervision by creating a weakly supervised dataset and show that even a slight amount of supervision can significantly improve the model performance and enable better-quality transfers. We examine different model archite…
▽ More
In this work, we present a dual learning approach for unsupervised text to path and path to text transfers in Commonsense Knowledge Bases (KBs). We investigate the impact of weak supervision by creating a weakly supervised dataset and show that even a slight amount of supervision can significantly improve the model performance and enable better-quality transfers. We examine different model architectures, and evaluation metrics, proposing a novel Commonsense KB completion metric tailored for generative models. Extensive experimental results show that the proposed method compares very favorably to the existing baselines. This approach is a viable step towards a more advanced system for automatic KB construction/expansion and the reverse operation of KB conversion to coherent textual descriptions.
△ Less
Submitted 27 October, 2020;
originally announced October 2020.
-
Learning Implicit Text Generation via Feature Matching
Authors:
Inkit Padhi,
Pierre Dognin,
Ke Bai,
Cicero Nogueira dos Santos,
Vijil Chenthamarakshan,
Youssef Mroueh,
Payel Das
Abstract:
Generative feature matching network (GFMN) is an approach for training implicit generative models for images by performing moment matching on features from pre-trained neural networks. In this paper, we present new GFMN formulations that are effective for sequential data. Our experimental results show the effectiveness of the proposed method, SeqGFMN, for three distinct generation tasks in English…
▽ More
Generative feature matching network (GFMN) is an approach for training implicit generative models for images by performing moment matching on features from pre-trained neural networks. In this paper, we present new GFMN formulations that are effective for sequential data. Our experimental results show the effectiveness of the proposed method, SeqGFMN, for three distinct generation tasks in English: unconditional text generation, class-conditional text generation, and unsupervised text style transfer. SeqGFMN is stable to train and outperforms various adversarial approaches for text generation and text style transfer.
△ Less
Submitted 8 May, 2020; v1 submitted 7 May, 2020;
originally announced May 2020.
-
Learning Implicit Generative Models by Matching Perceptual Features
Authors:
Cicero Nogueira dos Santos,
Youssef Mroueh,
Inkit Padhi,
Pierre Dognin
Abstract:
Perceptual features (PFs) have been used with great success in tasks such as transfer learning, style transfer, and super-resolution. However, the efficacy of PFs as key source of information for learning generative models is not well studied. We investigate here the use of PFs in the context of learning implicit generative models through moment matching (MM). More specifically, we propose a new e…
▽ More
Perceptual features (PFs) have been used with great success in tasks such as transfer learning, style transfer, and super-resolution. However, the efficacy of PFs as key source of information for learning generative models is not well studied. We investigate here the use of PFs in the context of learning implicit generative models through moment matching (MM). More specifically, we propose a new effective MM approach that learns implicit generative models by performing mean and covariance matching of features extracted from pretrained ConvNets. Our proposed approach improves upon existing MM methods by: (1) breaking away from the problematic min/max game of adversarial learning; (2) avoiding online learning of kernel functions; and (3) being efficient with respect to both number of used moments and required minibatch size. Our experimental results demonstrate that, due to the expressiveness of PFs from pretrained deep ConvNets, our method achieves state-of-the-art results for challenging benchmarks.
△ Less
Submitted 4 April, 2019;
originally announced April 2019.
-
Wasserstein Barycenter Model Ensembling
Authors:
Pierre Dognin,
Igor Melnyk,
Youssef Mroueh,
Jerret Ross,
Cicero Dos Santos,
Tom Sercu
Abstract:
In this paper we propose to perform model ensembling in a multiclass or a multilabel learning setting using Wasserstein (W.) barycenters. Optimal transport metrics, such as the Wasserstein distance, allow incorporating semantic side information such as word embeddings. Using W. barycenters to find the consensus between models allows us to balance confidence and semantics in finding the agreement b…
▽ More
In this paper we propose to perform model ensembling in a multiclass or a multilabel learning setting using Wasserstein (W.) barycenters. Optimal transport metrics, such as the Wasserstein distance, allow incorporating semantic side information such as word embeddings. Using W. barycenters to find the consensus between models allows us to balance confidence and semantics in finding the agreement between the models. We show applications of Wasserstein ensembling in attribute-based classification, multilabel learning and image captioning generation. These results show that the W. ensembling is a viable alternative to the basic geometric or arithmetic mean ensembling.
△ Less
Submitted 13 February, 2019;
originally announced February 2019.
-
Adversarial Semantic Alignment for Improved Image Captions
Authors:
Pierre L. Dognin,
Igor Melnyk,
Youssef Mroueh,
Jarret Ross,
Tom Sercu
Abstract:
In this paper we study image captioning as a conditional GAN training, proposing both a context-aware LSTM captioner and co-attentive discriminator, which enforces semantic alignment between images and captions. We empirically focus on the viability of two training methods: Self-critical Sequence Training (SCST) and Gumbel Straight-Through (ST) and demonstrate that SCST shows more stable gradient…
▽ More
In this paper we study image captioning as a conditional GAN training, proposing both a context-aware LSTM captioner and co-attentive discriminator, which enforces semantic alignment between images and captions. We empirically focus on the viability of two training methods: Self-critical Sequence Training (SCST) and Gumbel Straight-Through (ST) and demonstrate that SCST shows more stable gradient behavior and improved results over Gumbel ST, even without accessing discriminator gradients directly. We also address the problem of automatic evaluation for captioning models and introduce a new semantic score, and show its correlation to human judgement. As an evaluation paradigm, we argue that an important criterion for a captioner is the ability to generalize to compositions of objects that do not usually co-occur together. To this end, we introduce a small captioned Out of Context (OOC) test set. The OOC set, combined with our semantic score, are the proposed new diagnosis tools for the captioning community. When evaluated on OOC and MS-COCO benchmarks, we show that SCST-based training has a strong performance in both semantic score and human evaluation, promising to be a valuable new approach for efficient discrete GAN training.
△ Less
Submitted 6 June, 2019; v1 submitted 30 April, 2018;
originally announced May 2018.