Showing 1–20 of 20 results for author: Mansimov, E

  1. arXiv:2502.12094

    cs.AI cs.CL

    A Study on Leveraging Search and Self-Feedback for Agent Reasoning

    Authors: Karthikeyan K, Michelle Yuan, Elman Mansimov, Katerina Margatina, Anurag Pratik, Daniele Bonadiman, Monica Sunkara, Yi Zhang, Yassine Benajiba

    Abstract: Recent works have demonstrated that incorporating search during inference can significantly improve the reasoning capabilities of language agents. Some approaches may make use of the ground truth or rely on the model's own generated feedback. The search algorithm uses this feedback to then produce values that will update its criterion for exploring and exploiting various reasoning paths. In this study, we…

    Submitted 17 February, 2025; originally announced February 2025.

    Comments: Under review

  2. arXiv:2401.05033

    cs.CL cs.AI

    Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk

    Authors: Dennis Ulmer, Elman Mansimov, Kaixiang Lin, Justin Sun, Xibin Gao, Yi Zhang

    Abstract: Large language models (LLMs) are powerful dialogue agents, but specializing them towards fulfilling a specific function can be challenging. Instruction tuning, i.e. tuning models on instructions and sample responses generated by humans (Ouyang et al., 2022), has proven to be an effective method to do so, yet requires a number of data samples that a) might not be available or b) are costly to generate. Fur…

    Submitted 10 January, 2024; originally announced January 2024.

  3. arXiv:2305.14827

    cs.CL

    Pre-training Intent-Aware Encoders for Zero- and Few-Shot Intent Classification

    Authors: Mujeen Sung, James Gung, Elman Mansimov, Nikolaos Pappas, Raphael Shu, Salvatore Romeo, Yi Zhang, Vittorio Castelli

    Abstract: Intent classification (IC) plays an important role in task-oriented dialogue systems. However, IC models often generalize poorly when trained without sufficient annotated examples for each user intent. We propose a novel pre-training method for text encoders that uses contrastive learning with intent pseudo-labels to produce embeddings that are well-suited for IC tasks, reducing the need for manu…

    Submitted 13 November, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023

  4. arXiv:2302.08362

    cs.CL

    Conversation Style Transfer using Few-Shot Learning

    Authors: Shamik Roy, Raphael Shu, Nikolaos Pappas, Elman Mansimov, Yi Zhang, Saab Mansour, Dan Roth

    Abstract: Conventional text style transfer approaches focus on sentence-level style transfer without considering contextual information, and the style is described with attributes (e.g., formality). When applying style transfer in conversations such as task-oriented dialogues, existing approaches suffer from these limitations as context can play an important role and the style attributes are often difficult…

    Submitted 21 September, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

    Comments: IJCNLP-AACL'2023 Camera Ready Version

  5. arXiv:2302.02080

    cs.CL

    Improving Prediction Backward-Compatiblility in NLP Model Upgrade with Gated Fusion

    Authors: Yi-An Lai, Elman Mansimov, Yuqing Xie, Yi Zhang

    Abstract: When upgrading neural models to a newer version, new errors that were not encountered in the legacy version can be introduced, known as regression errors. This inconsistent behavior during model upgrade often outweighs the benefits of accuracy gain and hinders the adoption of new models. To mitigate regression errors from model upgrade, distillation and ensemble have proven to be viable solutions…

    Submitted 3 February, 2023; originally announced February 2023.

    Comments: Camera-ready for EACL 2023 Findings

  6. arXiv:2301.10546

    cs.LG cs.CL

    Backward Compatibility During Data Updates by Weight Interpolation

    Authors: Raphael Schumann, Elman Mansimov, Yi-An Lai, Nikolaos Pappas, Xibin Gao, Yi Zhang

    Abstract: Backward compatibility of model predictions is a desired property when updating a machine-learning-driven application. It allows the underlying model to be improved seamlessly without introducing regression bugs. In classification tasks these bugs occur in the form of negative flips. This means an instance that was correctly classified by the old model is now classified incorrectly by the updated mode…

    Submitted 25 January, 2023; originally announced January 2023.

  7. arXiv:2212.09946

    cs.CL

    Dialog2API: Task-Oriented Dialogue with API Description and Example Programs

    Authors: Raphael Shu, Elman Mansimov, Tamer Alkhouli, Nikolaos Pappas, Salvatore Romeo, Arshit Gupta, Saab Mansour, Yi Zhang, Dan Roth

    Abstract: Functionality and dialogue experience are two important factors of task-oriented dialogue systems. Conventional approaches with closed schema (e.g., conversational semantic parsing) often fail as both the functionality and dialogue experience are strongly constrained by the underlying schema. We introduce a new paradigm for task-oriented dialogue - Dialog2API - to greatly expand the functionality…

    Submitted 19 December, 2022; originally announced December 2022.

  8. arXiv:2204.07128

    cs.CL

    Label Semantic Aware Pre-training for Few-shot Text Classification

    Authors: Aaron Mueller, Jason Krone, Salvatore Romeo, Saab Mansour, Elman Mansimov, Yi Zhang, Dan Roth

    Abstract: In text classification tasks, useful information is encoded in the label names. Label semantic aware systems have leveraged this information for improved text classification performance during fine-tuning and prediction. However, use of label-semantics during pre-training has not been extensively explored. We therefore propose Label Semantic Aware Pre-training (LSAP) to improve the generalization…

    Submitted 29 May, 2022; v1 submitted 14 April, 2022; originally announced April 2022.

    Comments: Accepted at ACL 2022

  9. arXiv:2202.02976

    cs.CL cs.AI cs.LG

    Measuring and Reducing Model Update Regression in Structured Prediction for NLP

    Authors: Deng Cai, Elman Mansimov, Yi-An Lai, Yixuan Su, Lei Shu, Yi Zhang

    Abstract: Recent advances in deep learning have led to the rapid adoption of machine learning-based NLP models in a wide range of applications. Despite the continuous gain in accuracy, backward compatibility is also an important aspect for industrial applications, yet it has received little research attention. Backward compatibility requires that the new model does not regress on cases that were correctly handled…

    Submitted 8 October, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

    Comments: NeurIPS2022

  10. arXiv:2109.14739

    cs.CL

    Multi-Task Pre-Training for Plug-and-Play Task-Oriented Dialogue System

    Authors: Yixuan Su, Lei Shu, Elman Mansimov, Arshit Gupta, Deng Cai, Yi-An Lai, Yi Zhang

    Abstract: Pre-trained language models have been recently shown to benefit task-oriented dialogue (TOD) systems. Despite their success, existing methods often formulate this task as a cascaded generation problem which can lead to error accumulation across different sub-tasks and greater data annotation overhead. In this study, we present PPTOD, a unified plug-and-play model for task-oriented dialogue. In add…

    Submitted 1 March, 2022; v1 submitted 29 September, 2021; originally announced September 2021.

    Comments: Camera-ready for ACL2022 main conference

  11. arXiv:2109.04500

    cs.CL

    Semantic Parsing in Task-Oriented Dialog with Recursive Insertion-based Encoder

    Authors: Elman Mansimov, Yi Zhang

    Abstract: We introduce a Recursive INsertion-based Encoder (RINE), a novel approach for semantic parsing in task-oriented dialog. Our model consists of an encoder network that incrementally builds the semantic parse tree by predicting the non-terminal label and its positions in the linearized tree. At generation time, the model constructs the semantic parse tree by recursively inserting the predicted no…

    Submitted 20 March, 2022; v1 submitted 9 September, 2021; originally announced September 2021.

    Comments: Accepted to AAAI-22

  12. arXiv:2010.10648

    cs.CL cs.CV cs.LG

    Towards End-to-End In-Image Neural Machine Translation

    Authors: Elman Mansimov, Mitchell Stern, Mia Chen, Orhan Firat, Jakob Uszkoreit, Puneet Jain

    Abstract: In this paper, we offer a preliminary investigation into the task of in-image machine translation: transforming an image containing text in one language into an image containing the same text in another language. We propose an end-to-end neural model for this task inspired by recent approaches to neural machine translation, and demonstrate promising initial results based purely on pixel-level supe…

    Submitted 20 October, 2020; originally announced October 2020.

    Comments: Accepted as an oral presentation at EMNLP, NLP Beyond Text workshop, 2020

  13. arXiv:2003.05259

    cs.CL

    Capturing document context inside sentence-level neural machine translation models with self-training

    Authors: Elman Mansimov, Gábor Melis, Lei Yu

    Abstract: Neural machine translation (NMT) has arguably achieved human-level parity when trained and evaluated at the sentence level. Document-level neural machine translation has received less attention and lags behind its sentence-level counterpart. The majority of the proposed document-level approaches investigate ways of conditioning the model on several source or target sentences to capture document co…

    Submitted 11 March, 2020; originally announced March 2020.

  14. arXiv:1905.12790

    cs.LG cs.CL stat.ML

    A Generalized Framework of Sequence Generation with Application to Undirected Sequence Models

    Authors: Elman Mansimov, Alex Wang, Sean Welleck, Kyunghyun Cho

    Abstract: Undirected neural sequence models such as BERT (Devlin et al., 2019) have received renewed interest due to their success on discriminative natural language understanding tasks such as question-answering and natural language inference. The problem of generating sequences directly from these models has received relatively little attention, in part because generating from undirected models departs si…

    Submitted 7 February, 2020; v1 submitted 29 May, 2019; originally announced May 2019.

  15. arXiv:1904.00314

    cs.LG physics.comp-ph stat.ML

    Molecular geometry prediction using a deep generative graph neural network

    Authors: Elman Mansimov, Omar Mahmood, Seokho Kang, Kyunghyun Cho

    Abstract: A molecule's geometry, also known as conformation, is one of a molecule's most important properties, determining the reactions it participates in, the bonds it forms, and the interactions it has with other molecules. Conventional conformation generation methods minimize hand-designed molecular force field energy functions that are often not well correlated with the true energy function of a molecu…

    Submitted 16 December, 2019; v1 submitted 30 March, 2019; originally announced April 2019.

    Comments: 15 pages, 6 figures

    Journal ref: Scientific Reports 9: 20381, 2019

  16. arXiv:1802.06901

    cs.LG cs.CL stat.ML

    Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement

    Authors: Jason Lee, Elman Mansimov, Kyunghyun Cho

    Abstract: We propose a conditional non-autoregressive neural sequence model based on iterative refinement. The proposed model is designed based on the principles of latent variable models and denoising autoencoders, and is generally applicable to any sequence generation task. We extensively evaluate the proposed model on machine translation (En-De and En-Ro) and image caption generation, and observe that it…

    Submitted 27 August, 2018; v1 submitted 19 February, 2018; originally announced February 2018.

    Comments: Accepted to EMNLP'18

  17. arXiv:1708.05144

    cs.LG

    Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation

    Authors: Yuhuai Wu, Elman Mansimov, Shun Liao, Roger Grosse, Jimmy Ba

    Abstract: In this work, we propose to apply trust region optimization to deep reinforcement learning using a recently proposed Kronecker-factored approximation to the curvature. We extend the framework of natural policy gradient and propose to optimize both the actor and the critic using Kronecker-factored approximate curvature (K-FAC) with trust region; hence we call our method Actor Critic using Kronecker…

    Submitted 18 August, 2017; v1 submitted 17 August, 2017; originally announced August 2017.

    Comments: 14 pages, 9 figures; update github repo link

  18. arXiv:1511.02793

    cs.LG cs.CV

    Generating Images from Captions with Attention

    Authors: Elman Mansimov, Emilio Parisotto, Jimmy Lei Ba, Ruslan Salakhutdinov

    Abstract: Motivated by the recent progress in generative models, we introduce a model that generates images from natural language descriptions. The proposed model iteratively draws patches on a canvas, while attending to the relevant words in the description. After training on Microsoft COCO, we compare our model with several baseline generative models on image generation and retrieval tasks. We demonstrate…

    Submitted 29 February, 2016; v1 submitted 9 November, 2015; originally announced November 2015.

    Comments: Published as a conference paper at ICLR 2016

  19. arXiv:1503.07274

    cs.CV cs.LG

    Initialization Strategies of Spatio-Temporal Convolutional Neural Networks

    Authors: Elman Mansimov, Nitish Srivastava, Ruslan Salakhutdinov

    Abstract: We propose a new way of incorporating temporal information present in videos into Spatial Convolutional Neural Networks (ConvNets) trained on images, that avoids training Spatio-Temporal ConvNets from scratch. We describe several initializations of weights in 3D Convolutional Layers of Spatio-Temporal ConvNet using 2D Convolutional Weights learned from ImageNet. We show that it is important to ini…

    Submitted 24 March, 2015; originally announced March 2015.

    Comments: Technical Report

  20. arXiv:1502.04681

    cs.LG cs.CV cs.NE

    Unsupervised Learning of Video Representations using LSTMs

    Authors: Nitish Srivastava, Elman Mansimov, Ruslan Salakhutdinov

    Abstract: We use multilayer Long Short Term Memory (LSTM) networks to learn representations of video sequences. Our model uses an encoder LSTM to map an input sequence into a fixed length representation. This representation is decoded using single or multiple decoder LSTMs to perform different tasks, such as reconstructing the input sequence, or predicting the future sequence. We experiment with two kinds o…

    Submitted 3 January, 2016; v1 submitted 16 February, 2015; originally announced February 2015.

    Comments: Added link to code on github