
Showing 1–13 of 13 results for author: Merler, M

  1. arXiv:2505.13180

    cs.AI

    ViPlan: A Benchmark for Visual Planning with Symbolic Predicates and Vision-Language Models

    Authors: Matteo Merler, Nicola Dainese, Minttu Alakuijala, Giovanni Bonetta, Pietro Ferrazzi, Yu Tian, Bernardo Magnini, Pekka Marttinen

    Abstract: Integrating Large Language Models with symbolic planners is a promising direction for obtaining verifiable and grounded plans compared to planning in natural language, with recent works extending this idea to visual domains using Vision-Language Models (VLMs). However, rigorous comparison between VLM-grounded symbolic approaches and methods that plan directly with a VLM has been hindered by a lack…

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: 9 pages, 5 figures and 1 table in the main text; 43 pages, 9 figures and 16 tables including supplementary material

  2. arXiv:2405.15383

    cs.AI

    Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search

    Authors: Nicola Dainese, Matteo Merler, Minttu Alakuijala, Pekka Marttinen

    Abstract: In this work we consider Code World Models, world models generated by a Large Language Model (LLM) in the form of Python code for model-based Reinforcement Learning (RL). Calling code instead of LLMs for planning has potential to be more precise, reliable, interpretable, and extremely efficient. However, writing appropriate Code World Models requires the ability to understand complex instructions,…

    Submitted 30 October, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted at NeurIPS 2024, Main Track. 11 pages in main text, 40 pages including references and supplementary materials. 2 figures and 3 tables in the main text, 9 figures and 12 tables when including the supplementary materials. Website at https://sites.google.com/view/code-world-models/home

  3. arXiv:2405.04324

    cs.AI cs.CL cs.SE

    Granite Code Models: A Family of Open Foundation Models for Code Intelligence

    Authors: Mayank Mishra, Matt Stallone, Gaoyuan Zhang, Yikang Shen, Aditya Prasad, Adriana Meza Soria, Michele Merler, Parameswaran Selvam, Saptha Surendran, Shivdeep Singh, Manish Sethi, Xuan-Hong Dang, Pengyuan Li, Kun-Lung Wu, Syed Zawad, Andrew Coleman, Matthew White, Mark Lewis, Raju Pavuluri, Yan Koyfman, Boris Lublinsky, Maximilien de Bayser, Ibrahim Abdelaziz, Kinjal Basu, Mayank Agarwal, et al. (21 additional authors not shown)

    Abstract: Large Language Models (LLMs) trained on code are revolutionizing the software development process. Increasingly, code LLMs are being integrated into software development environments to improve the productivity of human programmers, and LLM-based agents are beginning to show promise for handling complex tasks autonomously. Realizing the full potential of code LLMs requires a wide range of capabili…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Corresponding Authors: Rameswar Panda, Ruchir Puri; Equal Contributors: Mayank Mishra, Matt Stallone, Gaoyuan Zhang

  4. In-Context Symbolic Regression: Leveraging Large Language Models for Function Discovery

    Authors: Matteo Merler, Katsiaryna Haitsiukevich, Nicola Dainese, Pekka Marttinen

    Abstract: State-of-the-art Symbolic Regression (SR) methods currently build specialized models, while the application of Large Language Models (LLMs) remains largely unexplored. In this work, we introduce the first comprehensive framework that utilizes LLMs for the task of SR. We propose In-Context Symbolic Regression (ICSR), an SR method which iteratively refines a functional form with an LLM and determine…

    Submitted 17 July, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

    Comments: 18 pages, 11 figures

    Journal ref: ACL Student Research Workshop 2024

  5. arXiv:2310.08797

    cs.CL cs.AI

    A Comparative Analysis of Task-Agnostic Distillation Methods for Compressing Transformer Language Models

    Authors: Takuma Udagawa, Aashka Trivedi, Michele Merler, Bishwaranjan Bhattacharjee

    Abstract: Large language models have become a vital component in modern NLP, achieving state-of-the-art performance in a variety of tasks. However, they are often inefficient for real-world deployment due to their expensive inference costs. Knowledge distillation is a promising technique to improve their efficiency while retaining most of their effectiveness. In this paper, we reproduce, compare and analyze…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023 Industry Track

  6. Lost in Translation: A Study of Bugs Introduced by Large Language Models while Translating Code

    Authors: Rangeet Pan, Ali Reza Ibrahimzada, Rahul Krishna, Divya Sankar, Lambert Pouguem Wassi, Michele Merler, Boris Sobolev, Raju Pavuluri, Saurabh Sinha, Reyhaneh Jabbarvand

    Abstract: Code translation aims to convert source code from one programming language (PL) to another. Given the promising abilities of large language models (LLMs) in code synthesis, researchers are exploring their potential to automate code translation. The prerequisite for advancing the state of LLM-based code translation is to understand their promises and limitations over existing techniques. To that en…

    Submitted 16 January, 2024; v1 submitted 6 August, 2023; originally announced August 2023.

    Comments: Published in ICSE 2024

  7. arXiv:2306.03316

    cs.CL

    CoSiNES: Contrastive Siamese Network for Entity Standardization

    Authors: Jiaqing Yuan, Michele Merler, Mihir Choudhury, Raju Pavuluri, Munindar P. Singh, Maja Vukovic

    Abstract: Entity standardization maps noisy mentions from free-form text to standard entities in a knowledge base. The unique challenge of this task relative to other entity-related tasks is the lack of surrounding context and numerous variations in the surface form of the mentions, especially when it comes to generalization across domains where labeled data is scarce. Previous research mostly focuses on de…

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: Accepted by Matching Workshop at ACL2023

  8. arXiv:2303.09639

    cs.CL

    Neural Architecture Search for Effective Teacher-Student Knowledge Transfer in Language Models

    Authors: Aashka Trivedi, Takuma Udagawa, Michele Merler, Rameswar Panda, Yousef El-Kurdi, Bishwaranjan Bhattacharjee

    Abstract: Large pretrained language models have achieved state-of-the-art results on a variety of downstream tasks. Knowledge Distillation (KD) into a smaller student model addresses their inefficiency, allowing for deployment in resource-constrained environments. However, KD can be ineffective when the student is manually selected from a set of existing options, since it can be a sub-optimal choice within…

    Submitted 13 October, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

    Comments: 11 pages, 5 figures

  9. arXiv:2011.10608

    cs.CV

    Large Scale Neural Architecture Search with Polyharmonic Splines

    Authors: Ulrich Finkler, Michele Merler, Rameswar Panda, Mayoore S. Jaiswal, Hui Wu, Kandan Ramakrishnan, Chun-Fu Chen, Minsik Cho, David Kung, Rogerio Feris, Bishwaranjan Bhattacharjee

    Abstract: Neural Architecture Search (NAS) is a powerful tool to automatically design deep neural networks for many tasks, including image classification. Due to the significant computational burden of the search phase, most NAS methods have focused so far on small, balanced datasets. All attempts at conducting NAS at large scale have employed small proxy sets, and then transferred the learned architectures…

    Submitted 20 November, 2020; originally announced November 2020.

  10. arXiv:2006.13314

    cs.CV cs.LG cs.NE

    NASTransfer: Analyzing Architecture Transferability in Large Scale Neural Architecture Search

    Authors: Rameswar Panda, Michele Merler, Mayoore Jaiswal, Hui Wu, Kandan Ramakrishnan, Ulrich Finkler, Chun-Fu Chen, Minsik Cho, David Kung, Rogerio Feris, Bishwaranjan Bhattacharjee

    Abstract: Neural Architecture Search (NAS) is an open and challenging problem in machine learning. While NAS offers great promise, the prohibitive computational demand of most of the existing NAS methods makes it difficult to directly search the architectures on large-scale tasks. The typical way of conducting large scale NAS is to search for an architectural building block on a small dataset (either using…

    Submitted 11 February, 2021; v1 submitted 23 June, 2020; originally announced June 2020.

    Comments: 19 pages, 19 Figures, 6 Tables

    MSC Class: 68T05; ACM Class: I.2.6; I.4

  11. arXiv:2002.02369

    eess.IV cs.AI cs.CV

    Covering the News with (AI) Style

    Authors: Michele Merler, Cicero Nogueira dos Santos, Mauro Martino, Alfio M. Gliozzo, John R. Smith

    Abstract: We introduce a multi-modal discriminative and generative framework capable of assisting humans in producing visual content related to a given theme, starting from a collection of documents (textual, visual, or both). This framework can be used by editors to generate images for articles, as well as books or music album covers. Motivated by a request from The New York Times (NYT) seeking help t…

    Submitted 5 January, 2020; originally announced February 2020.

  12. arXiv:1901.10436

    cs.CV

    Diversity in Faces

    Authors: Michele Merler, Nalini Ratha, Rogerio S. Feris, John R. Smith

    Abstract: Face recognition is a long-standing challenge in the field of Artificial Intelligence (AI). The goal is to create systems that accurately detect, recognize, verify, and understand human faces. There are significant technical hurdles in making these systems accurate, particularly in unconstrained settings due to confounding factors related to pose, resolution, illumination, occlusion, and viewpoint…

    Submitted 8 April, 2019; v1 submitted 29 January, 2019; originally announced January 2019.

    Comments: Updated statistics after slight modification to dataset due to inactive links and deletions

  13. arXiv:1707.07075

    cs.CV cs.MM

    Automatic Curation of Golf Highlights using Multimodal Excitement Features

    Authors: Michele Merler, Dhiraj Joshi, Quoc-Bao Nguyen, Stephen Hammer, John Kent, John R. Smith, Rogerio S. Feris

    Abstract: The production of sports highlight packages summarizing a game's most exciting moments is an essential task for broadcast media. Yet, it requires labor-intensive video editing. We propose a novel approach for auto-curating sports highlights, and use it to create a real-world system for the editorial aid of golf highlight reels. Our method fuses information from the players' reactions (action recog…

    Submitted 21 July, 2017; originally announced July 2017.