Showing 1–4 of 4 results for author: Farré, M

Searching in archive cs.
  1. arXiv:2504.05299

    cs.AI cs.CV

    SmolVLM: Redefining small and efficient multimodal models

    Authors: Andrés Marafioti, Orr Zohar, Miquel Farré, Merve Noyan, Elie Bakouch, Pedro Cuenca, Cyril Zakka, Loubna Ben Allal, Anton Lozhkov, Nouamane Tazi, Vaibhav Srivastav, Joshua Lochner, Hugo Larcher, Mathieu Morlon, Lewis Tunstall, Leandro von Werra, Thomas Wolf

    Abstract: Large Vision-Language Models (VLMs) deliver exceptional performance but require significant computational resources, limiting their deployment on mobile and edge devices. Smaller VLMs typically mirror design choices of larger models, such as extensive image tokenization, leading to inefficient GPU memory usage and constrained practicality for on-device applications. We introduce SmolVLM, a serie…

    Submitted 7 April, 2025; originally announced April 2025.

  2. arXiv:2503.11576

    cs.CV

    SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

    Authors: Ahmed Nassar, Andres Marafioti, Matteo Omenetti, Maksym Lysak, Nikolaos Livathinos, Christoph Auer, Lucas Morin, Rafael Teixeira de Lima, Yusik Kim, A. Said Gurbuz, Michele Dolfi, Miquel Farré, Peter W. J. Staar

    Abstract: We introduce SmolDocling, an ultra-compact vision-language model targeting end-to-end document conversion. Our model comprehensively processes entire pages by generating DocTags, a new universal markup format that captures all page elements in their full context with location. Unlike existing approaches that rely on large foundational models, or ensemble solutions that rely on handcrafted pipeline…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: 24 pages, 10 figures

  3. arXiv:2405.08813

    cs.CV cs.LG cs.MM

    CinePile: A Long Video Question Answering Dataset and Benchmark

    Authors: Ruchit Rawal, Khalid Saifullah, Miquel Farré, Ronen Basri, David Jacobs, Gowthami Somepalli, Tom Goldstein

    Abstract: Current datasets for long-form video understanding often fall short of providing genuine long-form comprehension challenges, as many tasks derived from these datasets can be successfully tackled by analyzing just one or a few random frames from a video. To address this issue, we present a novel dataset and benchmark, CinePile, specifically designed for authentic long-form video understanding. This…

    Submitted 20 October, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: Project page with all the artifacts - https://ruchitrawal.github.io/cinepile/. Updated version with adversarial refinement pipeline and more model evaluations

  4. Towards an Interoperability Roadmap for the Energy Transition

    Authors: Valerie Reif, Thomas I. Strasser, Joseba Jimeno, Marjolaine Farre, Oliver Genest, Amélie Gyrard, Mark McGranaghan, Gianluca Lipari, Johann Schütz, Mathias Uslar, Sebastian Vogel, Arsim Bytyqi, Rita Dornmair, Andreas Corusa, Gaurav Roy, Ferdinanda Ponci, Alberto Dognini, Antonello Monti

    Abstract: Smart grid interoperability is the means to achieve the twin green and digital transition but remains heterogeneous and fragmented to date. This work presents the first ideas and cornerstones of an Interoperability Roadmap for the Energy Transition that is being developed by the Horizon Europe int:net project. This roadmap builds on four cornerstones that address open interoperability issues. Th…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: 12. (Hybrid) Symposium Communications for Energy Systems (ComForEn 2023)