-
Surfer-H Meets Holo1: Cost-Efficient Web Agent Powered by Open Weights
Authors:
Mathieu Andreux,
Breno Baldas Skuk,
Hamza Benchekroun,
Emilien Biré,
Antoine Bonnet,
Riaz Bordie,
Nathan Bout,
Matthias Brunel,
Pierre-Louis Cedoz,
Antoine Chassang,
Mickaël Chen,
Alexandra D. Constantinou,
Antoine d'Andigné,
Hubert de La Jonquière,
Aurélien Delfosse,
Ludovic Denoyer,
Alexis Deprez,
Augustin Derupti,
Michael Eickenberg,
Mathïs Federico,
Charles Kantor,
Xavier Koegler,
Yann Labbé,
Matthew C. H. Lee,
Erwan Le Jumeau de Kergaradec
, et al. (19 additional authors not shown)
Abstract:
We present Surfer-H, a cost-efficient web agent that integrates Vision-Language Models (VLM) to perform user-defined tasks on the web. We pair it with Holo1, a new open-weight collection of VLMs specialized in web navigation and information extraction. Holo1 was trained on carefully curated data sources, including open-access web content, synthetic examples, and self-produced agentic data. Holo1 t…
▽ More
We present Surfer-H, a cost-efficient web agent that integrates Vision-Language Models (VLM) to perform user-defined tasks on the web. We pair it with Holo1, a new open-weight collection of VLMs specialized in web navigation and information extraction. Holo1 was trained on carefully curated data sources, including open-access web content, synthetic examples, and self-produced agentic data. Holo1 tops generalist User Interface (UI) benchmarks as well as our new web UI localization benchmark, WebClick. When powered by Holo1, Surfer-H achieves a 92.2% state-of-the-art performance on WebVoyager, striking a Pareto-optimal balance between accuracy and cost-efficiency. To accelerate research advancement in agentic systems, we are open-sourcing both our WebClick evaluation dataset and the Holo1 model weights.
△ Less
Submitted 11 June, 2025; v1 submitted 3 June, 2025;
originally announced June 2025.
-
Property Testing of Curve Similarity
Authors:
Peyman Afshani,
Maike Buchin,
Anne Driemel,
Marena Richter,
Sampson Wong
Abstract:
We propose sublinear algorithms for probabilistic testing of the discrete and continuous Fréchet distance - a standard similarity measure for curves. We assume the algorithm is given access to the input curves via a query oracle: a query returns the set of vertices of the curve that lie within a radius $δ$ of a specified vertex of the other curve. The goal is to use a small number of queries to de…
▽ More
We propose sublinear algorithms for probabilistic testing of the discrete and continuous Fréchet distance - a standard similarity measure for curves. We assume the algorithm is given access to the input curves via a query oracle: a query returns the set of vertices of the curve that lie within a radius $δ$ of a specified vertex of the other curve. The goal is to use a small number of queries to determine with constant probability whether the two curves are similar (i.e., their discrete Fréchet distance is at most $δ$) or they are ''$\varepsilon$-far'' (for $0 < \varepsilon < 2$) from being similar, i.e., more than an $\varepsilon$-fraction of the two curves must be ignored for them to become similar. We present two algorithms which are sublinear assuming that the curves are $t$-approximate shortest paths in the ambient metric space, for some $t\ll n$. The first algorithm uses $O(\frac{t}{\varepsilon}\log\frac{t}{\varepsilon})$ queries and is given the value of $t$ in advance. The second algorithm does not have explicit knowledge of the value of $t$ and therefore needs to gain implicit knowledge of the straightness of the input curves through its queries. We show that the discrete Fréchet distance can still be tested using roughly $O(\frac{t^3+t^2\log n}{\varepsilon})$ queries ignoring logarithmic factors in $t$. Our algorithms work in a matrix representation of the input and may be of independent interest to matrix testing. Our algorithms use a mild uniform sampling condition that constrains the edge lengths of the curves, similar to a polynomially bounded aspect ratio. Applied to testing the continuous Fréchet distance of $t$-straight curves, our algorithms can be used for $(1+\varepsilon')$-approximate testing using essentially the same bounds as stated above with an additional factor of poly$(\frac{1}{\varepsilon'})$.
△ Less
Submitted 24 February, 2025;
originally announced February 2025.
-
Transforming Dogs on the Line: On the Fréchet Distance Under Translation or Scaling in 1D
Authors:
Lotte Blank,
Jacobus Conradi,
Anne Driemel,
Benedikt Kolbe,
André Nusser,
Marena Richter
Abstract:
The Fréchet distance is a computational mainstay for comparing polygonal curves. The Fréchet distance under translation, which is a translation invariant version, considers the similarity of two curves independent of their location in space. It is defined as the minimum Fréchet distance that arises from allowing arbitrary translations of the input curves. This problem and numerous variants of the…
▽ More
The Fréchet distance is a computational mainstay for comparing polygonal curves. The Fréchet distance under translation, which is a translation invariant version, considers the similarity of two curves independent of their location in space. It is defined as the minimum Fréchet distance that arises from allowing arbitrary translations of the input curves. This problem and numerous variants of the Fréchet distance under some transformations have been studied, with more work concentrating on the discrete Fréchet distance, leaving a significant gap between the discrete and continuous versions of the Fréchet distance under transformations. Our contribution is twofold: First, we present an algorithm for the Fréchet distance under translation on 1-dimensional curves of complexity n with a running time of $\mathcal{O}(n^{8/3} log^3 n)$. To achieve this, we develop a novel framework for the problem for 1-dimensional curves, which also applies to other scenarios and leads to our second contribution. We present an algorithm with the same running time of $\mathcal{O}(n^{8/3} \log^3 n)$ for the Fréchet distance under scaling for 1-dimensional curves. For both algorithms we match the running times of the discrete case and improve the previously best known bounds of $\tilde{\mathcal{O}}(n^4)$. Our algorithms rely on technical insights but are conceptually simple, essentially reducing the continuous problem to the discrete case across different length scales.
△ Less
Submitted 22 January, 2025;
originally announced January 2025.
-
Too Big to Fool: Resisting Deception in Language Models
Authors:
Mohammad Reza Samsami,
Mats Leon Richter,
Juan Rodriguez,
Megh Thakkar,
Sarath Chandar,
Maxime Gasse
Abstract:
Large language models must balance their weight-encoded knowledge with in-context information from prompts to generate accurate responses. This paper investigates this interplay by analyzing how models of varying capacities within the same family handle intentionally misleading in-context information. Our experiments demonstrate that larger models exhibit higher resilience to deceptive prompts, sh…
▽ More
Large language models must balance their weight-encoded knowledge with in-context information from prompts to generate accurate responses. This paper investigates this interplay by analyzing how models of varying capacities within the same family handle intentionally misleading in-context information. Our experiments demonstrate that larger models exhibit higher resilience to deceptive prompts, showcasing an advanced ability to interpret and integrate prompt information with their internal knowledge. Furthermore, we find that larger models outperform smaller ones in following legitimate instructions, indicating that their resilience is not due to disregarding in-context information. We also show that this phenomenon is likely not a result of memorization but stems from the models' ability to better leverage implicit task-relevant information from the prompt alongside their internally stored knowledge.
△ Less
Submitted 13 December, 2024;
originally announced December 2024.
-
BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks
Authors:
Juan Rodriguez,
Xiangru Jian,
Siba Smarak Panigrahi,
Tianyu Zhang,
Aarash Feizi,
Abhay Puri,
Akshay Kalkunte,
François Savard,
Ahmed Masry,
Shravan Nayak,
Rabiul Awal,
Mahsa Massoud,
Amirhossein Abaskohi,
Zichao Li,
Suyuchen Wang,
Pierre-André Noël,
Mats Leon Richter,
Saverio Vadacchino,
Shubham Agarwal,
Sanket Biswas,
Sara Shanian,
Ying Zhang,
Noah Bolger,
Kurt MacDonald,
Simon Fauvel
, et al. (18 additional authors not shown)
Abstract:
Multimodal AI has the potential to significantly enhance document-understanding tasks, such as processing receipts, understanding workflows, extracting data from documents, and summarizing reports. Code generation tasks that require long-structured outputs can also be enhanced by multimodality. Despite this, their use in commercial applications is often limited due to limited access to training da…
▽ More
Multimodal AI has the potential to significantly enhance document-understanding tasks, such as processing receipts, understanding workflows, extracting data from documents, and summarizing reports. Code generation tasks that require long-structured outputs can also be enhanced by multimodality. Despite this, their use in commercial applications is often limited due to limited access to training data and restrictive licensing, which hinders open access. To address these limitations, we introduce BigDocs-7.5M, a high-quality, open-access dataset comprising 7.5 million multimodal documents across 30 tasks. We use an efficient data curation process to ensure our data is high-quality and license-permissive. Our process emphasizes accountability, responsibility, and transparency through filtering rules, traceable metadata, and careful content analysis. Additionally, we introduce BigDocs-Bench, a benchmark suite with 10 novel tasks where we create datasets that reflect real-world use cases involving reasoning over Graphical User Interfaces (GUI) and code generation from images. Our experiments show that training with BigDocs-Bench improves average performance up to 25.8% over closed-source GPT-4o in document reasoning and structured output tasks such as Screenshot2HTML or Image2Latex generation. Finally, human evaluations showed a preference for outputs from models trained on BigDocs over GPT-4o. This suggests that BigDocs can help both academics and the open-source community utilize and improve AI tools to enhance multimodal capabilities and document reasoning. The project is hosted at https://bigdocs.github.io .
△ Less
Submitted 17 March, 2025; v1 submitted 5 December, 2024;
originally announced December 2024.
-
A Diagonal Structured State Space Model on Loihi 2 for Efficient Streaming Sequence Processing
Authors:
Svea Marie Meyer,
Philipp Weidel,
Philipp Plank,
Leobardo Campos-Macias,
Sumit Bam Shrestha,
Philipp Stratmann,
Mathis Richter
Abstract:
Deep State-Space Models (SSM) demonstrate state-of-the art performance on long-range sequence modeling tasks. While the recurrent structure of SSMs can be efficiently implemented as a convolution or as a parallel scan during training, recurrent token-by-token processing cannot currently be implemented efficiently on GPUs. Here, we demonstrate efficient token-by-token inference of the SSM S4D on In…
▽ More
Deep State-Space Models (SSM) demonstrate state-of-the art performance on long-range sequence modeling tasks. While the recurrent structure of SSMs can be efficiently implemented as a convolution or as a parallel scan during training, recurrent token-by-token processing cannot currently be implemented efficiently on GPUs. Here, we demonstrate efficient token-by-token inference of the SSM S4D on Intel's Loihi 2 state-of-the-art neuromorphic processor. We compare this first ever neuromorphic-hardware implementation of an SSM on sMNIST, psMNIST, and sCIFAR to a recurrent and a convolutional implementation of S4D on Jetson Orin Nano (Jetson). While we find Jetson to perform better in an offline sample-by-sample based batched processing mode, Loihi 2 outperforms during token-by-token based processing, where it consumes 1000 times less energy with a 75 times lower latency and a 75 times higher throughput compared to the recurrent implementation of S4D on Jetson. This opens up new avenues towards efficient real-time streaming applications of SSMs.
△ Less
Submitted 23 September, 2024;
originally announced September 2024.
-
CarbonSense: A Multimodal Dataset and Baseline for Carbon Flux Modelling
Authors:
Matthew Fortier,
Mats L. Richter,
Oliver Sonnentag,
Chris Pal
Abstract:
Terrestrial carbon fluxes provide vital information about our biosphere's health and its capacity to absorb anthropogenic CO$_2$ emissions. The importance of predicting carbon fluxes has led to the emerging field of data-driven carbon flux modelling (DDCFM), which uses statistical techniques to predict carbon fluxes from biophysical data. However, the field lacks a standardized dataset to promote…
▽ More
Terrestrial carbon fluxes provide vital information about our biosphere's health and its capacity to absorb anthropogenic CO$_2$ emissions. The importance of predicting carbon fluxes has led to the emerging field of data-driven carbon flux modelling (DDCFM), which uses statistical techniques to predict carbon fluxes from biophysical data. However, the field lacks a standardized dataset to promote comparisons between models. To address this gap, we present CarbonSense, the first machine learning-ready dataset for DDCFM. CarbonSense integrates measured carbon fluxes, meteorological predictors, and satellite imagery from 385 locations across the globe, offering comprehensive coverage and facilitating robust model training. Additionally, we provide a baseline model using a current state-of-the-art DDCFM approach and a novel transformer based model. Our experiments illustrate the potential gains that multimodal deep learning techniques can bring to this domain. By providing these resources, we aim to lower the barrier to entry for other deep learning researchers to develop new models and drive new advances in carbon flux modelling.
△ Less
Submitted 24 March, 2025; v1 submitted 7 June, 2024;
originally announced June 2024.
-
Overwhelmed Software Developers
Authors:
Lisa-Marie Michels,
Aleksandra Petkova,
Marcel Richter,
Andreas Farley,
Daniel Graziotin,
Stefan Wagner
Abstract:
We have conducted a qualitative psychology study to explore the experience of feeling overwhelmed in the realm of software development. Through the candid confessions of two participants who have recently faced overwhelming challenges, we have identified seven distinct categories: communication-induced, disturbance-related, organizational, variety, technical, temporal, and positive overwhelm. Whil…
▽ More
We have conducted a qualitative psychology study to explore the experience of feeling overwhelmed in the realm of software development. Through the candid confessions of two participants who have recently faced overwhelming challenges, we have identified seven distinct categories: communication-induced, disturbance-related, organizational, variety, technical, temporal, and positive overwhelm. While most types of overwhelm tend to deteriorate productivity and increase stress levels, developers sometimes perceive overwhelm as a catalyst for heightened focus, self-motivation, and productivity. Stress was often found to be a common companion of overwhelm. Our findings align with previous studies conducted in diverse disciplines. However, we believe that software developers possess unique traits that may enable them to navigate through the storm of overwhelm more effectively.
△ Less
Submitted 6 June, 2024;
originally announced June 2024.
-
Shape Constraints in Symbolic Regression using Penalized Least Squares
Authors:
Viktor Martinek,
Julia Reuter,
Ophelia Frotscher,
Sanaz Mostaghim,
Markus Richter,
Roland Herzog
Abstract:
We study the addition of shape constraints (SC) and their consideration during the parameter identification step of symbolic regression (SR). SC serve as a means to introduce prior knowledge about the shape of the otherwise unknown model function into SR. Unlike previous works that have explored SC in SR, we propose minimizing SC violations during parameter identification using gradient-based nume…
▽ More
We study the addition of shape constraints (SC) and their consideration during the parameter identification step of symbolic regression (SR). SC serve as a means to introduce prior knowledge about the shape of the otherwise unknown model function into SR. Unlike previous works that have explored SC in SR, we propose minimizing SC violations during parameter identification using gradient-based numerical optimization. We test three algorithm variants to evaluate their performance in identifying three symbolic expressions from synthetically generated data sets. This paper examines two benchmark scenarios: one with varying noise levels and another with reduced amounts of training data. The results indicate that incorporating SC into the expression search is particularly beneficial when data is scarce. Compared to using SC only in the selection process, our approach of minimizing violations during parameter identification shows a statistically significant benefit in some of our test cases, without being significantly worse in any instance.
△ Less
Submitted 6 August, 2024; v1 submitted 31 May, 2024;
originally announced May 2024.
-
Simple and Scalable Strategies to Continually Pre-train Large Language Models
Authors:
Adam Ibrahim,
Benjamin Thérien,
Kshitij Gupta,
Mats L. Richter,
Quentin Anthony,
Timothée Lesort,
Eugene Belilovsky,
Irina Rish
Abstract:
Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start the process over again once new data becomes available. A much more efficient solution is to continually pre-train these models, saving significant compute compared to re-training. However, the distribution shift induced by new data typically results in degraded performance on previous data or poor adaptati…
▽ More
Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start the process over again once new data becomes available. A much more efficient solution is to continually pre-train these models, saving significant compute compared to re-training. However, the distribution shift induced by new data typically results in degraded performance on previous data or poor adaptation to the new data. In this work, we show that a simple and scalable combination of learning rate (LR) re-warming, LR re-decaying, and replay of previous data is sufficient to match the performance of fully re-training from scratch on all available data, as measured by the final loss and the average score on several language model (LM) evaluation benchmarks. Specifically, we show this for a weak but realistic distribution shift between two commonly used LLM pre-training datasets (English$\rightarrow$English) and a stronger distribution shift (English$\rightarrow$German) at the $405$M parameter model scale with large dataset sizes (hundreds of billions of tokens). Selecting the weak but realistic shift for larger-scale experiments, we also find that our continual learning strategies match the re-training baseline for a 10B parameter LLM. Our results demonstrate that LLMs can be successfully updated via simple and scalable continual learning strategies, matching the re-training baseline using only a fraction of the compute. Finally, inspired by previous work, we propose alternatives to the cosine learning rate schedule that help circumvent forgetting induced by LR re-warming and that are not bound to a fixed token budget.
△ Less
Submitted 4 September, 2024; v1 submitted 13 March, 2024;
originally announced March 2024.
-
Overwhelmed software developers: An Interpretative Phenomenological Analysis
Authors:
Lisa-Marie Michels,
Aleksandra Petkova,
Marcel Richter,
Andreas Farley,
Daniel Graziotin,
Stefan Wagner
Abstract:
In this paper, we report on an Interpretive Phenomenological Analysis (IPA) study on experiencing overwhelm in a software development context. The objectives of our study are, hence, to understand the experiences developers have when being overwhelmed, how this impacts their productivity and which role stress plays in the process. To this end, we interviewed two software developers who have experi…
▽ More
In this paper, we report on an Interpretive Phenomenological Analysis (IPA) study on experiencing overwhelm in a software development context. The objectives of our study are, hence, to understand the experiences developers have when being overwhelmed, how this impacts their productivity and which role stress plays in the process. To this end, we interviewed two software developers who have experienced overwhelm recently. Throughout a qualitative analysis of the shared experiences, we uncover seven categories of overwhelm (communication, disturbance, organizational, variety, technical, temporal, and positive overwhelm). While the first six themes all are related to negative outcomes, including low productivity and stress, the participants reported that overwhelm can sometimes be experienced to be positive and pleasant, and it can increase their mental focus, self ambition, and productivity. Stress was the most mentioned feeling experienced when overwhelmed. Our findings, for the most, are along the same direction of similar studies from other disciplines and with other participants. However, there may be unique attributes to software developers that mitigate the negative experiences of overwhelm.
△ Less
Submitted 5 January, 2024;
originally announced January 2024.
-
Introducing Thermodynamics-Informed Symbolic Regression -- A Tool for Thermodynamic Equations of State Development
Authors:
Viktor Martinek,
Ophelia Frotscher,
Markus Richter,
Roland Herzog
Abstract:
Thermodynamic equations of state (EOS) are essential for many industries as well as in academia. Even leaving aside the expensive and extensive measurement campaigns required for the data acquisition, the development of EOS is an intensely time-consuming process, which does often still heavily rely on expert knowledge and iterative fine-tuning. To improve upon and accelerate the EOS development pr…
▽ More
Thermodynamic equations of state (EOS) are essential for many industries as well as in academia. Even leaving aside the expensive and extensive measurement campaigns required for the data acquisition, the development of EOS is an intensely time-consuming process, which does often still heavily rely on expert knowledge and iterative fine-tuning. To improve upon and accelerate the EOS development process, we introduce thermodynamics-informed symbolic regression (TiSR), a symbolic regression (SR) tool aimed at thermodynamic EOS modeling. TiSR is already a capable SR tool, which was used in the research of https://doi.org/10.1007/s10765-023-03197-z. It aims to combine an SR base with the extensions required to work with often strongly scattered experimental data, different residual pre- and post-processing options, and additional features required to consider thermodynamic EOS development. Although TiSR is not ready for end users yet, this paper is intended to report on its current state, showcase the progress, and discuss (distant and not so distant) future directions. TiSR is available at https://github.com/scoop-group/TiSR and can be cited as https://doi.org/10.5281/zenodo.8317547.
△ Less
Submitted 6 September, 2023;
originally announced September 2023.
-
Continual Pre-Training of Large Language Models: How to (re)warm your model?
Authors:
Kshitij Gupta,
Benjamin Thérien,
Adam Ibrahim,
Mats L. Richter,
Quentin Anthony,
Eugene Belilovsky,
Irina Rish,
Timothée Lesort
Abstract:
Large language models (LLMs) are routinely pre-trained on billions of tokens, only to restart the process over again once new data becomes available. A much cheaper and more efficient solution would be to enable the continual pre-training of these models, i.e. updating pre-trained models with new data instead of re-training them from scratch. However, the distribution shift induced by novel data t…
▽ More
Large language models (LLMs) are routinely pre-trained on billions of tokens, only to restart the process over again once new data becomes available. A much cheaper and more efficient solution would be to enable the continual pre-training of these models, i.e. updating pre-trained models with new data instead of re-training them from scratch. However, the distribution shift induced by novel data typically results in degraded performance on past data. Taking a step towards efficient continual pre-training, in this work, we examine the effect of different warm-up strategies. Our hypothesis is that the learning rate must be re-increased to improve compute efficiency when training on a new dataset. We study the warmup phase of models pre-trained on the Pile (upstream data, 300B tokens) as we continue to pre-train on SlimPajama (downstream data, 297B tokens), following a linear warmup and cosine decay schedule. We conduct all experiments on the Pythia 410M language model architecture and evaluate performance through validation perplexity. We experiment with different pre-training checkpoints, various maximum learning rates, and various warmup lengths. Our results show that while rewarming models first increases the loss on upstream and downstream data, in the longer run it improves the downstream performance, outperforming models trained from scratch$\unicode{x2013}$even for a large downstream dataset.
△ Less
Submitted 6 September, 2023; v1 submitted 7 August, 2023;
originally announced August 2023.
-
Wuerstchen: An Efficient Architecture for Large-Scale Text-to-Image Diffusion Models
Authors:
Pablo Pernias,
Dominic Rampas,
Mats L. Richter,
Christopher J. Pal,
Marc Aubreville
Abstract:
We introduce Würstchen, a novel architecture for text-to-image synthesis that combines competitive performance with unprecedented cost-effectiveness for large-scale text-to-image diffusion models. A key contribution of our work is to develop a latent diffusion technique in which we learn a detailed but extremely compact semantic image representation used to guide the diffusion process. This highly…
▽ More
We introduce Würstchen, a novel architecture for text-to-image synthesis that combines competitive performance with unprecedented cost-effectiveness for large-scale text-to-image diffusion models. A key contribution of our work is to develop a latent diffusion technique in which we learn a detailed but extremely compact semantic image representation used to guide the diffusion process. This highly compressed representation of an image provides much more detailed guidance compared to latent representations of language and this significantly reduces the computational requirements to achieve state-of-the-art results. Our approach also improves the quality of text-conditioned image generation based on our user preference study. The training requirements of our approach consists of 24,602 A100-GPU hours - compared to Stable Diffusion 2.1's 200,000 GPU hours. Our approach also requires less training data to achieve these results. Furthermore, our compact latent representations allows us to perform inference over twice as fast, slashing the usual costs and carbon footprint of a state-of-the-art (SOTA) diffusion model significantly, without compromising the end performance. In a broader comparison against SOTA models our approach is substantially more efficient and compares favorably in terms of image quality. We believe that this work motivates more emphasis on the prioritization of both performance and computational accessibility.
△ Less
Submitted 29 September, 2023; v1 submitted 1 June, 2023;
originally announced June 2023.
-
Receptive Field Refinement for Convolutional Neural Networks Reliably Improves Predictive Performance
Authors:
Mats L. Richter,
Christopher Pal
Abstract:
Minimal changes to neural architectures (e.g. changing a single hyperparameter in a key layer), can lead to significant gains in predictive performance in Convolutional Neural Networks (CNNs). In this work, we present a new approach to receptive field analysis that can yield these types of theoretical and empirical performance gains across twenty well-known CNN architectures examined in our experi…
▽ More
Minimal changes to neural architectures (e.g. changing a single hyperparameter in a key layer), can lead to significant gains in predictive performance in Convolutional Neural Networks (CNNs). In this work, we present a new approach to receptive field analysis that can yield these types of theoretical and empirical performance gains across twenty well-known CNN architectures examined in our experiments. By further developing and formalizing the analysis of receptive field expansion in convolutional neural networks, we can predict unproductive layers in an automated manner before ever training a model. This allows us to optimize the parameter-efficiency of a given architecture at low cost. Our method is computationally simple and can be done in an automated manner or even manually with minimal effort for most common architectures. We demonstrate the effectiveness of this approach by increasing parameter efficiency across past and current top-performing CNN-architectures. Specifically, our approach is able to improve ImageNet1K performance across a wide range of well-known, state-of-the-art (SOTA) model classes, including: VGG Nets, MobileNetV1, MobileNetV3, NASNet A (mobile), MnasNet, EfficientNet, and ConvNeXt - leading to a new SOTA result for each model class.
△ Less
Submitted 26 November, 2022;
originally announced November 2022.
-
Machine Learning and Computer Vision Techniques in Continuous Beehive Monitoring Applications: A survey
Authors:
Simon Bilik,
Tomas Zemcik,
Lukas Kratochvila,
Dominik Ricanek,
Milos Richter,
Sebastian Zambanini,
Karel Horak
Abstract:
Wide use and availability of the machine learning and computer vision techniques allows development of relatively complex monitoring systems in many domains. Besides the traditional industrial domain, new application appears also in biology and agriculture, where we could speak about the detection of infections, parasites and weeds, but also about automated monitoring and early warning systems. Th…
▽ More
Wide use and availability of the machine learning and computer vision techniques allows development of relatively complex monitoring systems in many domains. Besides the traditional industrial domain, new application appears also in biology and agriculture, where we could speak about the detection of infections, parasites and weeds, but also about automated monitoring and early warning systems. This is also connected with the introduction of the easily accessible hardware and development kits such as Arduino, or RaspberryPi family. In this paper, we survey 50 existing papers focusing on the methods of automated beehive monitoring methods using the computer vision techniques, particularly on the pollen and Varroa mite detection together with the bee traffic monitoring. Such systems could also be used for the monitoring of the honeybee colonies and for the inspection of their health state, which could identify potentially dangerous states before the situation is critical, or to better plan periodic bee colony inspections and therefore save significant costs. Later, we also include analysis of the research trends in this application field and we outline the possible direction of the new explorations. Our paper is aimed also at veterinary and apidology professionals and experts, who might not be familiar with machine learning to introduce them to its possibilities, therefore each family of applications is opened by a brief theoretical introduction and motivation related to its base method. We hope that this paper will inspire other scientists to use machine learning techniques for other applications in beehive monitoring.
△ Less
Submitted 14 September, 2023; v1 submitted 29 July, 2022;
originally announced August 2022.
-
Exploration of Differentiability in a Proton Computed Tomography Simulation Framework
Authors:
Max Aehle,
Johan Alme,
Gergely Gábor Barnaföldi,
Johannes Blühdorn,
Tea Bodova,
Vyacheslav Borshchov,
Anthony van den Brink,
Viljar Eikeland,
Gregory Feofilov,
Christoph Garth,
Nicolas R. Gauger,
Ola Grøttvik,
Håvard Helstrup,
Sergey Igolkin,
Ralf Keidel,
Chinorat Kobdaj,
Tobias Kortus,
Lisa Kusch,
Viktor Leonhardt,
Shruti Mehendale,
Raju Ningappa Mulawade,
Odd Harald Odland,
George O'Neill,
Gábor Papp,
Thomas Peitzmann
, et al. (25 additional authors not shown)
Abstract:
Objective. Algorithmic differentiation (AD) can be a useful technique to numerically optimize design and algorithmic parameters by, and quantify uncertainties in, computer simulations. However, the effectiveness of AD depends on how "well-linearizable" the software is. In this study, we assess how promising derivative information of a typical proton computed tomography (pCT) scan computer simulati…
▽ More
Objective. Algorithmic differentiation (AD) can be a useful technique to numerically optimize design and algorithmic parameters by, and quantify uncertainties in, computer simulations. However, the effectiveness of AD depends on how "well-linearizable" the software is. In this study, we assess how promising derivative information of a typical proton computed tomography (pCT) scan computer simulation is for the aforementioned applications.
Approach. This study is mainly based on numerical experiments, in which we repeatedly evaluate three representative computational steps with perturbed input values. We support our observations with a review of the algorithmic steps and arithmetic operations performed by the software, using debugging techniques.
Main results. The model-based iterative reconstruction (MBIR) subprocedure (at the end of the software pipeline) and the Monte Carlo (MC) simulation (at the beginning) were piecewise differentiable. Jumps in the MBIR function arose from the discrete computation of the set of voxels intersected by a proton path. Jumps in the MC function likely arose from changes in the control flow that affect the amount of consumed random numbers. The tracking algorithm solves an inherently non-differentiable problem.
Significance. The MC and MBIR codes are ready for the integration of AD, and further research on surrogate models for the tracking subprocedure is necessary.
△ Less
Submitted 12 May, 2023; v1 submitted 11 February, 2022;
originally announced February 2022.
-
Should You Go Deeper? Optimizing Convolutional Neural Network Architectures without Training by Receptive Field Analysis
Authors:
Mats L. Richter,
Julius Schöning,
Anna Wiedenroth,
Ulf Krumnack
Abstract:
When optimizing convolutional neural networks (CNN) for a specific image-based task, specialists commonly overshoot the number of convolutional layers in their designs. By implication, these CNNs are unnecessarily resource intensive to train and deploy, with diminishing beneficial effects on the predictive performance.
The features a convolutional layer can process are strictly limited by its re…
▽ More
When optimizing convolutional neural networks (CNN) for a specific image-based task, specialists commonly overshoot the number of convolutional layers in their designs. By implication, these CNNs are unnecessarily resource intensive to train and deploy, with diminishing beneficial effects on the predictive performance.
The features a convolutional layer can process are strictly limited by its receptive field. By layer-wise analyzing the size of the receptive fields, we can reliably predict sequences of layers that will not contribute qualitatively to the test accuracy in the given CNN architecture. Based on this analysis, we propose design strategies based on a so-called border layer. This layer allows to identify unproductive convolutional layers and hence to resolve these inefficiencies, optimize the explainability and the computational performance of CNNs. Since neither the strategies nor the analysis requires training of the actual model, these insights allow for a very efficient design process of CNN architectures, which might be automated in the future.
△ Less
Submitted 5 October, 2021; v1 submitted 23 June, 2021;
originally announced June 2021.
-
Exploring the Properties and Evolution of Neural Network Eigenspaces during Training
Authors:
Mats L. Richter,
Leila Malihi,
Anne-Kathrin Patricia Windler,
Ulf Krumnack
Abstract:
In this work we explore the information processing inside neural networks using logistic regression probes \cite{probes} and the saturation metric \cite{featurespace_saturation}. We show that problem difficulty and neural network capacity affect the predictive performance in an antagonistic manner, opening the possibility of detecting over- and under-parameterization of neural networks for a given…
▽ More
In this work we explore the information processing inside neural networks using logistic regression probes \cite{probes} and the saturation metric \cite{featurespace_saturation}. We show that problem difficulty and neural network capacity affect the predictive performance in an antagonistic manner, opening the possibility of detecting over- and under-parameterization of neural networks for a given task. We further show that the observed effects are independent from previously reported pathological patterns like the ``tail pattern'' described in \cite{featurespace_saturation}. Finally we are able to show that saturation patterns converge early during training, allowing for a quicker cycle time during analysis
△ Less
Submitted 27 October, 2021; v1 submitted 17 June, 2021;
originally announced June 2021.
-
Size Matters
Authors:
Mats L. Richter,
Wolf Byttner,
Ulf Krumnack,
Ludwdig Schallner,
Justin Shenk
Abstract:
Fully convolutional neural networks can process input of arbitrary size by applying a combination of downsampling and pooling. However, we find that fully convolutional image classifiers are not agnostic to the input size but rather show significant differences in performance: presenting the same image at different scales can result in different outcomes. A closer look reveals that there is no sim…
▽ More
Fully convolutional neural networks can process input of arbitrary size by applying a combination of downsampling and pooling. However, we find that fully convolutional image classifiers are not agnostic to the input size but rather show significant differences in performance: presenting the same image at different scales can result in different outcomes. A closer look reveals that there is no simple relationship between input size and model performance (no `bigger is better'), but that each each network has a preferred input size, for which it shows best results. We investigate this phenomenon by applying different methods, including spectral analysis of layer activations and probe classifiers, showing that there are characteristic features depending on the network architecture. From this we find that the size of discriminatory features is critically influencing how the inference process is distributed among the layers.
△ Less
Submitted 9 February, 2021; v1 submitted 2 February, 2021;
originally announced February 2021.
-
Feature Space Saturation during Training
Authors:
Mats L. Richter,
Justin Shenk,
Wolf Byttner,
Anders Arpteg,
Mikael Huss
Abstract:
We propose layer saturation - a simple, online-computable method for analyzing the information processing in neural networks. First, we show that a layer's output can be restricted to the eigenspace of its variance matrix without performance loss. We propose a computationally lightweight method for approximating the variance matrix during training. From the dimension of its lossless eigenspace we…
▽ More
We propose layer saturation - a simple, online-computable method for analyzing the information processing in neural networks. First, we show that a layer's output can be restricted to the eigenspace of its variance matrix without performance loss. We propose a computationally lightweight method for approximating the variance matrix during training. From the dimension of its lossless eigenspace we derive layer saturation - the ratio between the eigenspace dimension and layer width. We show that saturation seems to indicate which layers contribute to network performance. We demonstrate how to alter layer saturation in a neural network by changing network depth, filter sizes and input resolution. Furthermore, we show that well-chosen input resolution increases network performance by distributing the inference process more evenly across the network.
△ Less
Submitted 22 November, 2021; v1 submitted 15 June, 2020;
originally announced June 2020.
-
Spectral Analysis of Latent Representations
Authors:
Justin Shenk,
Mats L. Richter,
Anders Arpteg,
Mikael Huss
Abstract:
We propose a metric, Layer Saturation, defined as the proportion of the number of eigenvalues needed to explain 99% of the variance of the latent representations, for analyzing the learned representations of neural network layers. Saturation is based on spectral analysis and can be computed efficiently, making live analysis of the representations practical during training. We provide an outlook fo…
▽ More
We propose a metric, Layer Saturation, defined as the proportion of the number of eigenvalues needed to explain 99% of the variance of the latent representations, for analyzing the learned representations of neural network layers. Saturation is based on spectral analysis and can be computed efficiently, making live analysis of the representations practical during training. We provide an outlook for future applications of this metric by outlining the behaviour of layer saturation in different neural architectures and problems. We further show that saturation is related to the generalization and predictive performance of neural networks.
△ Less
Submitted 19 July, 2019;
originally announced July 2019.
-
Crisis: Probabilistically Self Organizing Total Order in Unstructured P2P Networks
Authors:
Mirco Richter
Abstract:
A framework for asynchronous, signature free, fully local and probabilistically converging total order algorithms is developed, that may survive in high entropy, unstructured Peer-to-Peer networks with near optimal communication efficiency. Regarding the natural boundaries of the CAP-theorem, Crisis chooses different compromises for consistency and availability, depending on the severity of the at…
▽ More
A framework for asynchronous, signature free, fully local and probabilistically converging total order algorithms is developed, that may survive in high entropy, unstructured Peer-to-Peer networks with near optimal communication efficiency. Regarding the natural boundaries of the CAP-theorem, Crisis chooses different compromises for consistency and availability, depending on the severity of the attack.
The family is parameterized by a few constants and external functions called voting-weight, incentivation \& punishement, difficulty oracle and quorum-selector. These functions are necessary to fine tune the dynamics and very different long term behavior might appear, depending on any actual choice.
△ Less
Submitted 13 July, 2019;
originally announced July 2019.
-
Autonomous Reinforcement of Behavioral Sequences in Neural Dynamics
Authors:
Sohrob Kazerounian,
Matthew Luciw,
Mathis Richter,
Yulia Sandamirskaya
Abstract:
We introduce a dynamic neural algorithm called Dynamic Neural (DN) SARSA(λ) for learning a behavioral sequence from delayed reward. DN-SARSA(λ) combines Dynamic Field Theory models of behavioral sequence representation, classical reinforcement learning, and a computational neuroscience model of working memory, called Item and Order working memory, which serves as an eligibility trace. DN-SARSA(λ)…
▽ More
We introduce a dynamic neural algorithm called Dynamic Neural (DN) SARSA(λ) for learning a behavioral sequence from delayed reward. DN-SARSA(λ) combines Dynamic Field Theory models of behavioral sequence representation, classical reinforcement learning, and a computational neuroscience model of working memory, called Item and Order working memory, which serves as an eligibility trace. DN-SARSA(λ) is implemented on both a simulated and real robot that must learn a specific rewarding sequence of elementary behaviors from exploration. Results show DN-SARSA(λ) performs on the level of the discrete SARSA(λ), validating the feasibility of general reinforcement learning without compromising neural dynamics.
△ Less
Submitted 14 May, 2013; v1 submitted 12 October, 2012;
originally announced October 2012.