
Showing 1–12 of 12 results for author: Siems, J

  1. arXiv:2502.10297

    cs.LG cs.CL cs.FL

    DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products

    Authors: Julien Siems, Timur Carstensen, Arber Zela, Frank Hutter, Massimiliano Pontil, Riccardo Grazzi

    Abstract: Linear Recurrent Neural Networks (linear RNNs) have emerged as competitive alternatives to Transformers for sequence modeling, offering efficient training and linear-time inference. However, existing architectures face a fundamental trade-off between expressivity and efficiency, dictated by the structure of their state-transition matrices. Diagonal matrices, used in models such as Mamba, GLA, or m…

    Submitted 19 June, 2025; v1 submitted 14 February, 2025; originally announced February 2025.

    Comments: v5: Characterization of DeltaProduct's state-tracking ability. Analysis of the hidden state's effective rank. Improved scaling analysis. v6: Added analysis for products of RWKV-7 matrices.

  2. arXiv:2411.12537

    cs.LG cs.CL cs.FL

    Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues

    Authors: Riccardo Grazzi, Julien Siems, Arber Zela, Jörg K. H. Franke, Frank Hutter, Massimiliano Pontil

    Abstract: Linear Recurrent Neural Networks (LRNNs) such as Mamba, RWKV, GLA, mLSTM, and DeltaNet have emerged as efficient alternatives to Transformers for long sequences. However, both Transformers and LRNNs struggle to perform state-tracking, which may impair performance in tasks such as code evaluation. In one forward pass, current architectures are unable to solve even parity, the simplest state-trackin…

    Submitted 18 March, 2025; v1 submitted 19 November, 2024; originally announced November 2024.

    Comments: V2: Corrections to Theorems 1 and 2 and to point 3 of Proposition 1. V3: ICLR camera-ready. V4: ICLR camera-ready; added figures to the theory section; updated the modular arithmetic with brackets results, because the previous results did not contain multiplication.

  3. arXiv:2410.09385

    cs.LG cs.AI

    Mamba4Cast: Efficient Zero-Shot Time Series Forecasting with State Space Models

    Authors: Sathya Kamesh Bhethanabhotla, Omar Swelam, Julien Siems, David Salinas, Frank Hutter

    Abstract: This paper introduces Mamba4Cast, a zero-shot foundation model for time series forecasting. Based on the Mamba architecture and inspired by Prior-data Fitted Networks (PFNs), Mamba4Cast generalizes robustly across diverse time series tasks without the need for dataset-specific fine-tuning. Mamba4Cast's key innovation lies in its ability to achieve strong zero-shot performance on real-world dataset…

    Submitted 12 October, 2024; originally announced October 2024.

  4. arXiv:2410.04560

    cs.LG stat.ML

    GAMformer: In-Context Learning for Generalized Additive Models

    Authors: Andreas Mueller, Julien Siems, Harsha Nori, David Salinas, Arber Zela, Rich Caruana, Frank Hutter

    Abstract: Generalized Additive Models (GAMs) are widely recognized for their ability to create fully interpretable machine learning models for tabular data. Traditionally, training GAMs involves iterative learning algorithms, such as splines, boosted trees, or neural networks, which refine the additive components through repeated error reduction. In this paper, we introduce GAMformer, the first method to le…

    Submitted 6 October, 2024; originally announced October 2024.

    Comments: 20 pages, 12 figures

  5. arXiv:2402.03170

    cs.LG

    Is Mamba Capable of In-Context Learning?

    Authors: Riccardo Grazzi, Julien Siems, Simon Schrodi, Thomas Brox, Frank Hutter

    Abstract: State-of-the-art foundation models such as GPT-4 perform surprisingly well at in-context learning (ICL), a variant of meta-learning concerning the learned ability to solve tasks during a neural network forward pass, exploiting contextual information provided as input to the model. This useful ability emerges as a side product of the foundation model's massive pretraining. While transformer models…

    Submitted 24 April, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  6. arXiv:2305.11475

    cs.LG stat.ML

    Curve Your Enthusiasm: Concurvity Regularization in Differentiable Generalized Additive Models

    Authors: Julien Siems, Konstantin Ditschuneit, Winfried Ripken, Alma Lindborg, Maximilian Schambach, Johannes S. Otterbach, Martin Genzel

    Abstract: Generalized Additive Models (GAMs) have recently experienced a resurgence in popularity due to their interpretability, which arises from expressing the target value as a sum of non-linear transformations of the features. Despite the current enthusiasm for GAMs, their susceptibility to concurvity, i.e., (possibly non-linear) dependencies between the features, has hitherto been largely overlooked.…

    Submitted 25 November, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

  7. arXiv:2303.10382

    cs.LG cs.MA

    Interpretable Reinforcement Learning via Neural Additive Models for Inventory Management

    Authors: Julien Siems, Maximilian Schambach, Sebastian Schulze, Johannes S. Otterbach

    Abstract: The COVID-19 pandemic has highlighted the importance of supply chains and the role of digital management to react to dynamic changes in the environment. In this work, we focus on developing dynamic inventory ordering policies for a multi-echelon, i.e., multi-stage, supply chain. Traditional inventory optimization methods aim to determine a static reordering policy. Thus, these policies are not able…

    Submitted 22 March, 2023; v1 submitted 18 March, 2023; originally announced March 2023.

  8. arXiv:2208.14698

    cs.LG cs.GT cs.MA

    Bayesian Optimization-based Combinatorial Assignment

    Authors: Jakob Weissteiner, Jakob Heiss, Julien Siems, Sven Seuken

    Abstract: We study the combinatorial assignment domain, which includes combinatorial auctions and course allocation. The main challenge in this domain is that the bundle space grows exponentially in the number of items. To address this, several papers have recently proposed machine learning-based preference elicitation algorithms that aim to elicit only the most important information from agents. However, t…

    Submitted 13 March, 2023; v1 submitted 31 August, 2022; originally announced August 2022.

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence Vol 37 (2023)

  9. Monotone-Value Neural Networks: Exploiting Preference Monotonicity in Combinatorial Assignment

    Authors: Jakob Weissteiner, Jakob Heiss, Julien Siems, Sven Seuken

    Abstract: Many important resource allocation problems involve the combinatorial assignment of items, e.g., auctions or course allocation. Because the bundle space grows exponentially in the number of items, preference elicitation is a key challenge in these domains. Recently, researchers have proposed ML-based mechanisms that outperform traditional mechanisms while reducing preference elicitation costs for…

    Submitted 11 March, 2023; v1 submitted 30 September, 2021; originally announced September 2021.

    Journal ref: Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence Main Track (2022). Pages 541-548

  10. arXiv:2008.09777

    cs.LG

    Surrogate NAS Benchmarks: Going Beyond the Limited Search Spaces of Tabular NAS Benchmarks

    Authors: Arber Zela, Julien Siems, Lucas Zimmer, Jovita Lukasik, Margret Keuper, Frank Hutter

    Abstract: The most significant barrier to the advancement of Neural Architecture Search (NAS) is its demand for large computational resources, which hinders scientifically sound empirical evaluations of NAS methods. Tabular NAS benchmarks have alleviated this problem substantially, making it possible to properly evaluate NAS methods in seconds on commodity machines. However, an unintended consequence of tab…

    Submitted 14 April, 2022; v1 submitted 22 August, 2020; originally announced August 2020.

  11. arXiv:2001.10422

    cs.LG cs.CV cs.NE stat.ML

    NAS-Bench-1Shot1: Benchmarking and Dissecting One-shot Neural Architecture Search

    Authors: Arber Zela, Julien Siems, Frank Hutter

    Abstract: One-shot neural architecture search (NAS) has played a crucial role in making NAS methods computationally feasible in practice. Nevertheless, there is still a lack of understanding of exactly how these weight-sharing algorithms work, due to the many factors controlling the dynamics of the process. In order to allow a scientific study of these components, we introduce a general framework for one-sho…

    Submitted 12 April, 2020; v1 submitted 28 January, 2020; originally announced January 2020.

    Comments: In: International Conference on Learning Representations (ICLR 2020); 19 pages, 17 figures

  12. Faster Training of Mask R-CNN by Focusing on Instance Boundaries

    Authors: Roland S. Zimmermann, Julien N. Siems

    Abstract: We present an auxiliary task to Mask R-CNN, an instance segmentation network, which leads to faster training of the mask head. Our addition to Mask R-CNN is a new prediction head, the Edge Agreement Head, which is inspired by the way human annotators perform instance segmentation. Human annotators copy the contour of an object instance and only indirectly the occupied instance area. Hence, the edg…

    Submitted 10 August, 2019; v1 submitted 19 September, 2018; originally announced September 2018.

    Comments: 9 pages, 7 figures, 5 tables

    MSC Class: 68T45