
Showing 1–24 of 24 results for author: Richter, M

Searching in archive cs.
  1. arXiv:2506.02865  [pdf, ps, other]

    cs.AI

    Surfer-H Meets Holo1: Cost-Efficient Web Agent Powered by Open Weights

    Authors: Mathieu Andreux, Breno Baldas Skuk, Hamza Benchekroun, Emilien Biré, Antoine Bonnet, Riaz Bordie, Nathan Bout, Matthias Brunel, Pierre-Louis Cedoz, Antoine Chassang, Mickaël Chen, Alexandra D. Constantinou, Antoine d'Andigné, Hubert de La Jonquière, Aurélien Delfosse, Ludovic Denoyer, Alexis Deprez, Augustin Derupti, Michael Eickenberg, Mathïs Federico, Charles Kantor, Xavier Koegler, Yann Labbé, Matthew C. H. Lee, Erwan Le Jumeau de Kergaradec, et al. (19 additional authors not shown)

    Abstract: We present Surfer-H, a cost-efficient web agent that integrates Vision-Language Models (VLM) to perform user-defined tasks on the web. We pair it with Holo1, a new open-weight collection of VLMs specialized in web navigation and information extraction. Holo1 was trained on carefully curated data sources, including open-access web content, synthetic examples, and self-produced agentic data. Holo1 t…

    Submitted 11 June, 2025; v1 submitted 3 June, 2025; originally announced June 2025.

    Comments: Alphabetical order

  2. arXiv:2502.17277  [pdf, other]

    cs.CG

    Property Testing of Curve Similarity

    Authors: Peyman Afshani, Maike Buchin, Anne Driemel, Marena Richter, Sampson Wong

    Abstract: We propose sublinear algorithms for probabilistic testing of the discrete and continuous Fréchet distance - a standard similarity measure for curves. We assume the algorithm is given access to the input curves via a query oracle: a query returns the set of vertices of the curve that lie within a radius $δ$ of a specified vertex of the other curve. The goal is to use a small number of queries to de…

    Submitted 24 February, 2025; originally announced February 2025.

  3. arXiv:2501.12821  [pdf, other]

    cs.CG

    Transforming Dogs on the Line: On the Fréchet Distance Under Translation or Scaling in 1D

    Authors: Lotte Blank, Jacobus Conradi, Anne Driemel, Benedikt Kolbe, André Nusser, Marena Richter

    Abstract: The Fréchet distance is a computational mainstay for comparing polygonal curves. The Fréchet distance under translation, which is a translation invariant version, considers the similarity of two curves independent of their location in space. It is defined as the minimum Fréchet distance that arises from allowing arbitrary translations of the input curves. This problem and numerous variants of the…

    Submitted 22 January, 2025; originally announced January 2025.

  4. arXiv:2412.10558  [pdf, other]

    cs.CL cs.AI cs.LG

    Too Big to Fool: Resisting Deception in Language Models

    Authors: Mohammad Reza Samsami, Mats Leon Richter, Juan Rodriguez, Megh Thakkar, Sarath Chandar, Maxime Gasse

    Abstract: Large language models must balance their weight-encoded knowledge with in-context information from prompts to generate accurate responses. This paper investigates this interplay by analyzing how models of varying capacities within the same family handle intentionally misleading in-context information. Our experiments demonstrate that larger models exhibit higher resilience to deceptive prompts, sh…

    Submitted 13 December, 2024; originally announced December 2024.

  5. arXiv:2412.04626  [pdf, other]

    cs.LG cs.CL

    BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks

    Authors: Juan Rodriguez, Xiangru Jian, Siba Smarak Panigrahi, Tianyu Zhang, Aarash Feizi, Abhay Puri, Akshay Kalkunte, François Savard, Ahmed Masry, Shravan Nayak, Rabiul Awal, Mahsa Massoud, Amirhossein Abaskohi, Zichao Li, Suyuchen Wang, Pierre-André Noël, Mats Leon Richter, Saverio Vadacchino, Shubham Agarwal, Sanket Biswas, Sara Shanian, Ying Zhang, Noah Bolger, Kurt MacDonald, Simon Fauvel, et al. (18 additional authors not shown)

    Abstract: Multimodal AI has the potential to significantly enhance document-understanding tasks, such as processing receipts, understanding workflows, extracting data from documents, and summarizing reports. Code generation tasks that require long-structured outputs can also be enhanced by multimodality. Despite this, their use in commercial applications is often limited due to limited access to training da…

    Submitted 17 March, 2025; v1 submitted 5 December, 2024; originally announced December 2024.

    Comments: The project is hosted at https://bigdocs.github.io

    Journal ref: ICLR 2025 https://openreview.net/forum?id=UTgNFcpk0j

  6. arXiv:2409.15022  [pdf, other]

    cs.LG cs.AI cs.ET cs.NE

    A Diagonal Structured State Space Model on Loihi 2 for Efficient Streaming Sequence Processing

    Authors: Svea Marie Meyer, Philipp Weidel, Philipp Plank, Leobardo Campos-Macias, Sumit Bam Shrestha, Philipp Stratmann, Mathis Richter

    Abstract: Deep State-Space Models (SSM) demonstrate state-of-the-art performance on long-range sequence modeling tasks. While the recurrent structure of SSMs can be efficiently implemented as a convolution or as a parallel scan during training, recurrent token-by-token processing cannot currently be implemented efficiently on GPUs. Here, we demonstrate efficient token-by-token inference of the SSM S4D on In…

    Submitted 23 September, 2024; originally announced September 2024.

    Comments: 6 pages, 2 figures

  7. arXiv:2406.04940  [pdf, other]

    cs.LG cs.AI

    CarbonSense: A Multimodal Dataset and Baseline for Carbon Flux Modelling

    Authors: Matthew Fortier, Mats L. Richter, Oliver Sonnentag, Chris Pal

    Abstract: Terrestrial carbon fluxes provide vital information about our biosphere's health and its capacity to absorb anthropogenic CO$_2$ emissions. The importance of predicting carbon fluxes has led to the emerging field of data-driven carbon flux modelling (DDCFM), which uses statistical techniques to predict carbon fluxes from biophysical data. However, the field lacks a standardized dataset to promote…

    Submitted 24 March, 2025; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: 9 content pages, 11 reference pages, 9 appendix pages

  8. Overwhelmed Software Developers

    Authors: Lisa-Marie Michels, Aleksandra Petkova, Marcel Richter, Andreas Farley, Daniel Graziotin, Stefan Wagner

    Abstract: We have conducted a qualitative psychology study to explore the experience of feeling overwhelmed in the realm of software development. Through the candid confessions of two participants who have recently faced overwhelming challenges, we have identified seven distinct categories: communication-induced, disturbance-related, organizational, variety, technical, temporal, and positive overwhelm. Whil…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 8 pages. Published at IEEE Software. Based on the technical report arXiv:2401.02780

    Journal ref: IEEE Software (Volume: 41, Issue: 4, July-Aug. 2024), Page(s): 51 - 59

  9. arXiv:2405.20800  [pdf, other]

    cs.LG cs.SC

    Shape Constraints in Symbolic Regression using Penalized Least Squares

    Authors: Viktor Martinek, Julia Reuter, Ophelia Frotscher, Sanaz Mostaghim, Markus Richter, Roland Herzog

    Abstract: We study the addition of shape constraints (SC) and their consideration during the parameter identification step of symbolic regression (SR). SC serve as a means to introduce prior knowledge about the shape of the otherwise unknown model function into SR. Unlike previous works that have explored SC in SR, we propose minimizing SC violations during parameter identification using gradient-based nume…

    Submitted 6 August, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

  10. arXiv:2403.08763  [pdf, other]

    cs.LG cs.AI cs.CL

    Simple and Scalable Strategies to Continually Pre-train Large Language Models

    Authors: Adam Ibrahim, Benjamin Thérien, Kshitij Gupta, Mats L. Richter, Quentin Anthony, Timothée Lesort, Eugene Belilovsky, Irina Rish

    Abstract: Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start the process over again once new data becomes available. A much more efficient solution is to continually pre-train these models, saving significant compute compared to re-training. However, the distribution shift induced by new data typically results in degraded performance on previous data or poor adaptati…

    Submitted 4 September, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  11. arXiv:2401.02780  [pdf, ps, other]

    cs.SE cs.CY

    Overwhelmed software developers: An Interpretative Phenomenological Analysis

    Authors: Lisa-Marie Michels, Aleksandra Petkova, Marcel Richter, Andreas Farley, Daniel Graziotin, Stefan Wagner

    Abstract: In this paper, we report on an Interpretive Phenomenological Analysis (IPA) study on experiencing overwhelm in a software development context. The objectives of our study are, hence, to understand the experiences developers have when being overwhelmed, how this impacts their productivity and which role stress plays in the process. To this end, we interviewed two software developers who have experi…

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: 46 pages, technical report

  12. arXiv:2309.02805  [pdf, other]

    cs.LG physics.data-an

    Introducing Thermodynamics-Informed Symbolic Regression -- A Tool for Thermodynamic Equations of State Development

    Authors: Viktor Martinek, Ophelia Frotscher, Markus Richter, Roland Herzog

    Abstract: Thermodynamic equations of state (EOS) are essential for many industries as well as in academia. Even leaving aside the expensive and extensive measurement campaigns required for the data acquisition, the development of EOS is an intensely time-consuming process, which does often still heavily rely on expert knowledge and iterative fine-tuning. To improve upon and accelerate the EOS development pr…

    Submitted 6 September, 2023; originally announced September 2023.

  13. arXiv:2308.04014  [pdf, other]

    cs.CL cs.LG

    Continual Pre-Training of Large Language Models: How to (re)warm your model?

    Authors: Kshitij Gupta, Benjamin Thérien, Adam Ibrahim, Mats L. Richter, Quentin Anthony, Eugene Belilovsky, Irina Rish, Timothée Lesort

    Abstract: Large language models (LLMs) are routinely pre-trained on billions of tokens, only to restart the process over again once new data becomes available. A much cheaper and more efficient solution would be to enable the continual pre-training of these models, i.e. updating pre-trained models with new data instead of re-training them from scratch. However, the distribution shift induced by novel data t…

    Submitted 6 September, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

  14. arXiv:2306.00637  [pdf, other]

    cs.CV

    Wuerstchen: An Efficient Architecture for Large-Scale Text-to-Image Diffusion Models

    Authors: Pablo Pernias, Dominic Rampas, Mats L. Richter, Christopher J. Pal, Marc Aubreville

    Abstract: We introduce Würstchen, a novel architecture for text-to-image synthesis that combines competitive performance with unprecedented cost-effectiveness for large-scale text-to-image diffusion models. A key contribution of our work is to develop a latent diffusion technique in which we learn a detailed but extremely compact semantic image representation used to guide the diffusion process. This highly…

    Submitted 29 September, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: Corresponding to "Würstchen v2"

    Journal ref: The Twelfth International Conference on Learning Representations (ICLR), 2024

  15. arXiv:2211.14487  [pdf, other]

    cs.CV cs.AI cs.LG

    Receptive Field Refinement for Convolutional Neural Networks Reliably Improves Predictive Performance

    Authors: Mats L. Richter, Christopher Pal

    Abstract: Minimal changes to neural architectures (e.g. changing a single hyperparameter in a key layer), can lead to significant gains in predictive performance in Convolutional Neural Networks (CNNs). In this work, we present a new approach to receptive field analysis that can yield these types of theoretical and empirical performance gains across twenty well-known CNN architectures examined in our experi…

    Submitted 26 November, 2022; originally announced November 2022.

  16. arXiv:2208.00085  [pdf, other]

    cs.CV cs.AI cs.LG

    Machine Learning and Computer Vision Techniques in Continuous Beehive Monitoring Applications: A survey

    Authors: Simon Bilik, Tomas Zemcik, Lukas Kratochvila, Dominik Ricanek, Milos Richter, Sebastian Zambanini, Karel Horak

    Abstract: Wide use and availability of the machine learning and computer vision techniques allows development of relatively complex monitoring systems in many domains. Besides the traditional industrial domain, new application appears also in biology and agriculture, where we could speak about the detection of infections, parasites and weeds, but also about automated monitoring and early warning systems. Th…

    Submitted 14 September, 2023; v1 submitted 29 July, 2022; originally announced August 2022.

  17. arXiv:2202.05551  [pdf, other]

    physics.med-ph cs.MS

    Exploration of Differentiability in a Proton Computed Tomography Simulation Framework

    Authors: Max Aehle, Johan Alme, Gergely Gábor Barnaföldi, Johannes Blühdorn, Tea Bodova, Vyacheslav Borshchov, Anthony van den Brink, Viljar Eikeland, Gregory Feofilov, Christoph Garth, Nicolas R. Gauger, Ola Grøttvik, Håvard Helstrup, Sergey Igolkin, Ralf Keidel, Chinorat Kobdaj, Tobias Kortus, Lisa Kusch, Viktor Leonhardt, Shruti Mehendale, Raju Ningappa Mulawade, Odd Harald Odland, George O'Neill, Gábor Papp, Thomas Peitzmann, et al. (25 additional authors not shown)

    Abstract: Objective. Algorithmic differentiation (AD) can be a useful technique to numerically optimize design and algorithmic parameters by, and quantify uncertainties in, computer simulations. However, the effectiveness of AD depends on how "well-linearizable" the software is. In this study, we assess how promising derivative information of a typical proton computed tomography (pCT) scan computer simulati…

    Submitted 12 May, 2023; v1 submitted 11 February, 2022; originally announced February 2022.

    Comments: 27 pages, 11 figures

  18. arXiv:2106.12307  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    Should You Go Deeper? Optimizing Convolutional Neural Network Architectures without Training by Receptive Field Analysis

    Authors: Mats L. Richter, Julius Schöning, Anna Wiedenroth, Ulf Krumnack

    Abstract: When optimizing convolutional neural networks (CNN) for a specific image-based task, specialists commonly overshoot the number of convolutional layers in their designs. By implication, these CNNs are unnecessarily resource intensive to train and deploy, with diminishing beneficial effects on the predictive performance. The features a convolutional layer can process are strictly limited by its re…

    Submitted 5 October, 2021; v1 submitted 23 June, 2021; originally announced June 2021.

    Comments: Preprint

  19. arXiv:2106.09526  [pdf, other]

    cs.LG cs.AI

    Exploring the Properties and Evolution of Neural Network Eigenspaces during Training

    Authors: Mats L. Richter, Leila Malihi, Anne-Kathrin Patricia Windler, Ulf Krumnack

    Abstract: In this work we explore the information processing inside neural networks using logistic regression probes \cite{probes} and the saturation metric \cite{featurespace_saturation}. We show that problem difficulty and neural network capacity affect the predictive performance in an antagonistic manner, opening the possibility of detecting over- and under-parameterization of neural networks for a given…

    Submitted 27 October, 2021; v1 submitted 17 June, 2021; originally announced June 2021.

  20. Size Matters

    Authors: Mats L. Richter, Wolf Byttner, Ulf Krumnack, Ludwdig Schallner, Justin Shenk

    Abstract: Fully convolutional neural networks can process input of arbitrary size by applying a combination of downsampling and pooling. However, we find that fully convolutional image classifiers are not agnostic to the input size but rather show significant differences in performance: presenting the same image at different scales can result in different outcomes. A closer look reveals that there is no sim…

    Submitted 9 February, 2021; v1 submitted 2 February, 2021; originally announced February 2021.

    Comments: Preprint

    Journal ref: Artificial Neural Networks and Machine Learning ICANN 2021 133-144

  21. arXiv:2006.08679  [pdf, other]

    cs.LG cs.CV cs.NE stat.ML

    Feature Space Saturation during Training

    Authors: Mats L. Richter, Justin Shenk, Wolf Byttner, Anders Arpteg, Mikael Huss

    Abstract: We propose layer saturation - a simple, online-computable method for analyzing the information processing in neural networks. First, we show that a layer's output can be restricted to the eigenspace of its variance matrix without performance loss. We propose a computationally lightweight method for approximating the variance matrix during training. From the dimension of its lossless eigenspace we…

    Submitted 22 November, 2021; v1 submitted 15 June, 2020; originally announced June 2020.

    Comments: 45 pages, 41 figures; author order changed in v5 to reflect additional contribution; for code see http://github.com/MLRichter/phd-lab and http://github.com/delve-team/delve

    MSC Class: 68T07 ACM Class: I.2.6

    Journal ref: British Machine Vision Conference (BMVC) 2021

  22. arXiv:1907.08589  [pdf, other]

    cs.LG stat.ML

    Spectral Analysis of Latent Representations

    Authors: Justin Shenk, Mats L. Richter, Anders Arpteg, Mikael Huss

    Abstract: We propose a metric, Layer Saturation, defined as the proportion of the number of eigenvalues needed to explain 99% of the variance of the latent representations, for analyzing the learned representations of neural network layers. Saturation is based on spectral analysis and can be computed efficiently, making live analysis of the representations practical during training. We provide an outlook fo…

    Submitted 19 July, 2019; originally announced July 2019.

    Comments: 13 pages, 16 figures, code: https://github.com/delve-team/delve

  23. arXiv:1907.07248  [pdf, ps, other]

    cs.DC

    Crisis: Probabilistically Self Organizing Total Order in Unstructured P2P Networks

    Authors: Mirco Richter

    Abstract: A framework for asynchronous, signature free, fully local and probabilistically converging total order algorithms is developed, that may survive in high entropy, unstructured Peer-to-Peer networks with near optimal communication efficiency. Regarding the natural boundaries of the CAP-theorem, Crisis chooses different compromises for consistency and availability, depending on the severity of the at…

    Submitted 13 July, 2019; originally announced July 2019.

    Comments: First draft. 31 pages

  24. arXiv:1210.3569  [pdf, other]

    cs.NE

    Autonomous Reinforcement of Behavioral Sequences in Neural Dynamics

    Authors: Sohrob Kazerounian, Matthew Luciw, Mathis Richter, Yulia Sandamirskaya

    Abstract: We introduce a dynamic neural algorithm called Dynamic Neural (DN) SARSA(λ) for learning a behavioral sequence from delayed reward. DN-SARSA(λ) combines Dynamic Field Theory models of behavioral sequence representation, classical reinforcement learning, and a computational neuroscience model of working memory, called Item and Order working memory, which serves as an eligibility trace. DN-SARSA(λ)…

    Submitted 14 May, 2013; v1 submitted 12 October, 2012; originally announced October 2012.

    Comments: Sohrob Kazerounian and Matthew Luciw are joint first authors