Search | arXiv e-print repository

Transductively Informed Inductive Program Synthesis

Authors: Janis Zenkner, Tobias Sesterhenn, Christian Bartelt

Abstract: Abstraction and reasoning in program synthesis has seen significant progress through both inductive and transductive paradigms. Inductive approaches generate a program or latent function from input-output examples, which can then be applied to new inputs. Transductive approaches directly predict output values for given inputs, effectively serving as the function themselves. Current approaches comb… ▽ More Abstraction and reasoning in program synthesis has seen significant progress through both inductive and transductive paradigms. Inductive approaches generate a program or latent function from input-output examples, which can then be applied to new inputs. Transductive approaches directly predict output values for given inputs, effectively serving as the function themselves. Current approaches combine inductive and transductive models via isolated ensembling, but they do not explicitly model the interaction between both paradigms. In this work, we introduce \acs{tiips}, a novel framework that unifies transductive and inductive strategies by explicitly modeling their interactions through a cooperative mechanism: an inductive model generates programs, while a transductive model constrains, guides, and refines the search to improve synthesis accuracy and generalization. We evaluate \acs{tiips} on two widely studied program synthesis domains: string and list manipulation. Our results show that \acs{tiips} solves more tasks and yields functions that more closely match optimal solutions in syntax and semantics, particularly in out-of-distribution settings, yielding state-of-the-art performance. We believe that explicitly modeling the synergy between inductive and transductive reasoning opens promising avenues for general-purpose program synthesis and broader applications. △ Less

Submitted 20 May, 2025; originally announced May 2025.

arXiv:2503.24365 [pdf, other]

Which LIME should I trust? Concepts, Challenges, and Solutions

Authors: Patrick Knab, Sascha Marton, Udo Schlegel, Christian Bartelt

Abstract: As neural networks become dominant in essential systems, Explainable Artificial Intelligence (XAI) plays a crucial role in fostering trust and detecting potential misbehavior of opaque models. LIME (Local Interpretable Model-agnostic Explanations) is among the most prominent model-agnostic approaches, generating explanations by approximating the behavior of black-box models around specific instanc… ▽ More As neural networks become dominant in essential systems, Explainable Artificial Intelligence (XAI) plays a crucial role in fostering trust and detecting potential misbehavior of opaque models. LIME (Local Interpretable Model-agnostic Explanations) is among the most prominent model-agnostic approaches, generating explanations by approximating the behavior of black-box models around specific instances. Despite its popularity, LIME faces challenges related to fidelity, stability, and applicability to domain-specific problems. Numerous adaptations and enhancements have been proposed to address these issues, but the growing number of developments can be overwhelming, complicating efforts to navigate LIME-related research. To the best of our knowledge, this is the first survey to comprehensively explore and collect LIME's foundational concepts and known limitations. We categorize and compare its various enhancements, offering a structured taxonomy based on intermediate steps and key issues. Our analysis provides a holistic overview of advancements in LIME, guiding future research and helping practitioners identify suitable approaches. Additionally, we provide a continuously updated interactive website (https://patrick-knab.github.io/which-lime-to-trust/), offering a concise and accessible overview of the survey. △ Less

Submitted 31 March, 2025; originally announced March 2025.

Comments: Accepted at the 3rd World Conference on eXplainable Artificial Intelligence (XAI 2025)

arXiv:2503.09159 [pdf, other]

Unreflected Use of Tabular Data Repositories Can Undermine Research Quality

Authors: Andrej Tschalzev, Lennart Purucker, Stefan Lüdtke, Frank Hutter, Christian Bartelt, Heiner Stuckenschmidt

Abstract: Data repositories have accumulated a large number of tabular datasets from various domains. Machine Learning researchers are actively using these datasets to evaluate novel approaches. Consequently, data repositories have an important standing in tabular data research. They not only host datasets but also provide information on how to use them in supervised learning tasks. In this paper, we argue… ▽ More Data repositories have accumulated a large number of tabular datasets from various domains. Machine Learning researchers are actively using these datasets to evaluate novel approaches. Consequently, data repositories have an important standing in tabular data research. They not only host datasets but also provide information on how to use them in supervised learning tasks. In this paper, we argue that, despite great achievements in usability, the unreflected usage of datasets from data repositories may have led to reduced research quality and scientific rigor. We present examples from prominent recent studies that illustrate the problematic use of datasets from OpenML, a large data repository for tabular data. Our illustrations help users of data repositories avoid falling into the traps of (1) using suboptimal model selection strategies, (2) overlooking strong baselines, and (3) inappropriate preprocessing. In response, we discuss possible solutions for how data repositories can prevent the inappropriate use of datasets and become the cornerstones for improved overall quality of empirical research studies. △ Less

Submitted 12 March, 2025; originally announced March 2025.

arXiv:2503.08738 [pdf, other]

Shedding Light in Task Decomposition in Program Synthesis: The Driving Force of the Synthesizer Model

Authors: Janis Zenkner, Tobias Sesterhenn, Christian Bartelt

Abstract: Task decomposition is a fundamental mechanism in program synthesis, enabling complex problems to be broken down into manageable subtasks. ExeDec, a state-of-the-art program synthesis framework, employs this approach by combining a Subgoal Model for decomposition and a Synthesizer Model for program generation to facilitate compositional generalization. In this work, we develop REGISM, an adaptation… ▽ More Task decomposition is a fundamental mechanism in program synthesis, enabling complex problems to be broken down into manageable subtasks. ExeDec, a state-of-the-art program synthesis framework, employs this approach by combining a Subgoal Model for decomposition and a Synthesizer Model for program generation to facilitate compositional generalization. In this work, we develop REGISM, an adaptation of ExeDec that removes decomposition guidance and relies solely on iterative execution-driven synthesis. By comparing these two exemplary approaches-ExeDec, which leverages task decomposition, and REGISM, which does not-we investigate the interplay between task decomposition and program generation. Our findings indicate that ExeDec exhibits significant advantages in length generalization and concept composition tasks, likely due to its explicit decomposition strategies. At the same time, REGISM frequently matches or surpasses ExeDec's performance across various scenarios, with its solutions often aligning more closely with ground truth decompositions. These observations highlight the importance of repeated execution-guided synthesis in driving task-solving performance, even within frameworks that incorporate explicit decomposition strategies. Our analysis suggests that task decomposition approaches like ExeDec hold significant potential for advancing program synthesis, though further work is needed to clarify when and why these strategies are most effective. △ Less

Submitted 20 March, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

Comments: Accepted at ICLR 2025 Workshop Deep Learning for Code

arXiv:2502.21068 [pdf, other]

GUIDE: LLM-Driven GUI Generation Decomposition for Automated Prototyping

Authors: Kristian Kolthoff, Felix Kretzer, Christian Bartelt, Alexander Maedche, Simone Paolo Ponzetto

Abstract: GUI prototyping serves as one of the most valuable techniques for enhancing the elicitation of requirements and facilitating the visualization and refinement of customer needs. While GUI prototyping has a positive impact on the software development process, it simultaneously demands significant effort and resources. The emergence of Large Language Models (LLMs) with their impressive code generatio… ▽ More GUI prototyping serves as one of the most valuable techniques for enhancing the elicitation of requirements and facilitating the visualization and refinement of customer needs. While GUI prototyping has a positive impact on the software development process, it simultaneously demands significant effort and resources. The emergence of Large Language Models (LLMs) with their impressive code generation capabilities offers a promising approach for automating GUI prototyping. Despite their potential, there is a gap between current LLM-based prototyping solutions and traditional user-based GUI prototyping approaches which provide visual representations of the GUI prototypes and direct editing functionality. In contrast, LLMs and related generative approaches merely produce text sequences or non-editable image output, which lacks both mentioned aspects and therefore impede supporting GUI prototyping. Moreover, minor changes requested by the user typically lead to an inefficient regeneration of the entire GUI prototype when using LLMs directly. In this work, we propose GUIDE, a novel LLM-driven GUI generation decomposition approach seamlessly integrated into the popular prototyping framework Figma. Our approach initially decomposes high-level GUI descriptions into fine-granular GUI requirements, which are subsequently translated into Material Design GUI prototypes, enabling higher controllability and more efficient adaption of changes. To efficiently conduct prompting-based generation of Material Design GUI prototypes, we propose a retrieval-augmented generation approach to integrate the component library. Our preliminary evaluation demonstrates the effectiveness of GUIDE in bridging the gap between LLM generation capabilities and traditional GUI prototyping workflows, offering a more effective and controlled user-based approach to LLM-driven GUI prototyping. Video: https://youtu.be/C9RbhMxqpTU △ Less

Submitted 28 February, 2025; originally announced February 2025.

arXiv:2501.08925 [pdf, other]

Disentangling Exploration of Large Language Models by Optimal Exploitation

Authors: Tim Grams, Patrick Betz, Christian Bartelt

Abstract: Exploration is a crucial skill for self-improvement and open-ended problem-solving. However, it remains unclear if large language models can effectively explore the state-space within an unknown environment. This work isolates exploration as the sole objective, tasking the agent with delivering information that enhances future returns. Within this framework, we argue that measuring agent returns i… ▽ More Exploration is a crucial skill for self-improvement and open-ended problem-solving. However, it remains unclear if large language models can effectively explore the state-space within an unknown environment. This work isolates exploration as the sole objective, tasking the agent with delivering information that enhances future returns. Within this framework, we argue that measuring agent returns is not sufficient for a fair evaluation and decompose missing rewards into exploration and exploitation components based on the optimal achievable return. Comprehensive experiments with various models reveal that most struggle to sufficiently explore the state-space and weak exploration is insufficient. We observe a positive correlation between parameter count and exploration performance, with larger models demonstrating superior capabilities. Furthermore, we show that our decomposition provides insights into differences in behaviors driven by prompt engineering, offering a valuable tool for refining performance in exploratory tasks. △ Less

Submitted 3 February, 2025; v1 submitted 15 January, 2025; originally announced January 2025.

arXiv:2501.06346 [pdf, other]

Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages

Authors: Jannik Brinkmann, Chris Wendler, Christian Bartelt, Aaron Mueller

Abstract: Human bilinguals often use similar brain regions to process multiple languages, depending on when they learned their second language and their proficiency. In large language models (LLMs), how are multiple languages learned and encoded? In this work, we explore the extent to which LLMs share representations of morphsyntactic concepts such as grammatical number, gender, and tense across languages.… ▽ More Human bilinguals often use similar brain regions to process multiple languages, depending on when they learned their second language and their proficiency. In large language models (LLMs), how are multiple languages learned and encoded? In this work, we explore the extent to which LLMs share representations of morphsyntactic concepts such as grammatical number, gender, and tense across languages. We train sparse autoencoders on Llama-3-8B and Aya-23-8B, and demonstrate that abstract grammatical concepts are often encoded in feature directions shared across many languages. We use causal interventions to verify the multilingual nature of these representations; specifically, we show that ablating only multilingual features decreases classifier performance to near-chance across languages. We then use these features to precisely modify model behavior in a machine translation task; this demonstrates both the generality and selectivity of these feature's roles in the network. Our findings suggest that even models trained predominantly on English data can develop robust, cross-lingual abstractions of morphosyntactic concepts. △ Less

Submitted 23 May, 2025; v1 submitted 10 January, 2025; originally announced January 2025.

arXiv:2412.11576 [pdf, other]

DCBM: Data-Efficient Visual Concept Bottleneck Models

Authors: Katharina Prasse, Patrick Knab, Sascha Marton, Christian Bartelt, Margret Keuper

Abstract: Concept Bottleneck Models (CBMs) enhance the interpretability of neural networks by basing predictions on human-understandable concepts. However, current CBMs typically rely on concept sets extracted from large language models or extensive image corpora, limiting their effectiveness in data-sparse scenarios. We propose Data-efficient CBMs (DCBMs), which reduce the need for large sample sizes durin… ▽ More Concept Bottleneck Models (CBMs) enhance the interpretability of neural networks by basing predictions on human-understandable concepts. However, current CBMs typically rely on concept sets extracted from large language models or extensive image corpora, limiting their effectiveness in data-sparse scenarios. We propose Data-efficient CBMs (DCBMs), which reduce the need for large sample sizes during concept generation while preserving interpretability. DCBMs define concepts as image regions detected by segmentation or detection foundation models, allowing each image to generate multiple concepts across different granularities. This removes reliance on textual descriptions and large-scale pre-training, making DCBMs applicable for fine-grained classification and out-of-distribution tasks. Attribution analysis using Grad-CAM demonstrates that DCBMs deliver visual concepts that can be localized in test images. By leveraging dataset-specific concepts instead of predefined ones, DCBMs enhance adaptability to new domains. △ Less

Submitted 4 February, 2025; v1 submitted 16 December, 2024; originally announced December 2024.

arXiv:2412.11328 [pdf, other]

Zero-Shot Prompting Approaches for LLM-based Graphical User Interface Generation

Authors: Kristian Kolthoff, Felix Kretzer, Lennart Fiebig, Christian Bartelt, Alexander Maedche, Simone Paolo Ponzetto

Abstract: Graphical user interface (GUI) prototyping represents an essential activity in the development of interactive systems, which are omnipresent today. GUI prototypes facilitate elicitation of requirements and help to test, evaluate, and validate ideas with users and the development team. However, creating GUI prototypes is a time-consuming process and often requires extensive resources. While existin… ▽ More Graphical user interface (GUI) prototyping represents an essential activity in the development of interactive systems, which are omnipresent today. GUI prototypes facilitate elicitation of requirements and help to test, evaluate, and validate ideas with users and the development team. However, creating GUI prototypes is a time-consuming process and often requires extensive resources. While existing research for automatic GUI generation focused largely on resource-intensive training and fine-tuning of LLMs, mainly for low-fidelity GUIs, we investigate the potential and effectiveness of Zero-Shot (ZS) prompting for high-fidelity GUI generation. We propose a Retrieval-Augmented GUI Generation (RAGG) approach, integrated with an LLM-based GUI retrieval re-ranking and filtering mechanism based on a large-scale GUI repository. In addition, we adapt Prompt Decomposition (PDGG) and Self-Critique (SCGG) for GUI generation. To evaluate the effectiveness of the proposed ZS prompting approaches for GUI generation, we extensively evaluated the accuracy and subjective satisfaction of the generated GUI prototypes. Our evaluation, which encompasses over 3,000 GUI annotations from over 100 crowd-workers with UI/UX experience, shows that SCGG, in contrast to PDGG and RAGG, can lead to more effective GUI generation, and provides valuable insights into the defects that are produced by the LLMs in the generated GUI prototypes. △ Less

Submitted 15 December, 2024; originally announced December 2024.

arXiv:2412.05114 [pdf, ps, other]

A*Net and NBFNet Learn Negative Patterns on Knowledge Graphs

Authors: Patrick Betz, Nathanael Stelzner, Christian Meilicke, Heiner Stuckenschmidt, Christian Bartelt

Abstract: In this technical report, we investigate the predictive performance differences of a rule-based approach and the GNN architectures NBFNet and A*Net with respect to knowledge graph completion. For the two most common benchmarks, we find that a substantial fraction of the performance difference can be explained by one unique negative pattern on each dataset that is hidden from the rule-based approac… ▽ More In this technical report, we investigate the predictive performance differences of a rule-based approach and the GNN architectures NBFNet and A*Net with respect to knowledge graph completion. For the two most common benchmarks, we find that a substantial fraction of the performance difference can be explained by one unique negative pattern on each dataset that is hidden from the rule-based approach. Our findings add a unique perspective on the performance difference of different model classes for knowledge graph completion: Models can achieve a predictive performance advantage by penalizing scores of incorrect facts opposed to providing high scores for correct facts. △ Less

Submitted 6 December, 2024; originally announced December 2024.

arXiv:2409.16388 [pdf, other]

doi 10.1145/3691620.3695350

Self-Elicitation of Requirements with Automated GUI Prototyping

Authors: Kristian Kolthoff, Christian Bartelt, Simone Paolo Ponzetto, Kurt Schneider

Abstract: Requirements Elicitation (RE) is a crucial activity especially in the early stages of software development. GUI prototyping has widely been adopted as one of the most effective RE techniques for user-facing software systems. However, GUI prototyping requires (i) the availability of experienced requirements analysts, (ii) typically necessitates conducting multiple joint sessions with customers and… ▽ More Requirements Elicitation (RE) is a crucial activity especially in the early stages of software development. GUI prototyping has widely been adopted as one of the most effective RE techniques for user-facing software systems. However, GUI prototyping requires (i) the availability of experienced requirements analysts, (ii) typically necessitates conducting multiple joint sessions with customers and (iii) creates considerable manual effort. In this work, we propose SERGUI, a novel approach enabling the Self-Elicitation of Requirements (SER) based on an automated GUI prototyping assistant. SERGUI exploits the vast prototyping knowledge embodied in a large-scale GUI repository through Natural Language Requirements (NLR) based GUI retrieval and facilitates fast feedback through GUI prototypes. The GUI retrieval approach is closely integrated with a Large Language Model (LLM) driving the prompting-based recommendation of GUI features for the current GUI prototyping context and thus stimulating the elicitation of additional requirements. We envision SERGUI to be employed in the initial RE phase, creating an initial GUI prototype specification to be used by the analyst as a means for communicating the requirements. To measure the effectiveness of our approach, we conducted a preliminary evaluation. Video presentation of SERGUI at: https://youtu.be/pzAAB9Uht80 △ Less

Submitted 24 September, 2024; originally announced September 2024.

arXiv:2409.01713 [pdf, other]

Interpreting Outliers in Time Series Data through Decoding Autoencoder

Authors: Patrick Knab, Sascha Marton, Christian Bartelt, Robert Fuder

Abstract: Outlier detection is a crucial analytical tool in various fields. In critical systems like manufacturing, malfunctioning outlier detection can be costly and safety-critical. Therefore, there is a significant need for explainable artificial intelligence (XAI) when deploying opaque models in such environments. This study focuses on manufacturing time series data from a German automotive supply indus… ▽ More Outlier detection is a crucial analytical tool in various fields. In critical systems like manufacturing, malfunctioning outlier detection can be costly and safety-critical. Therefore, there is a significant need for explainable artificial intelligence (XAI) when deploying opaque models in such environments. This study focuses on manufacturing time series data from a German automotive supply industry. We utilize autoencoders to compress the entire time series and then apply anomaly detection techniques to its latent features. For outlier interpretation, we (i) adopt widely used XAI techniques to the autoencoder's encoder. Additionally, (ii) we propose AEE, Aggregated Explanatory Ensemble, a novel approach that fuses explanations of multiple XAI techniques into a single, more expressive interpretation. For evaluation of explanations, (iii) we propose a technique to measure the quality of encoder explanations quantitatively. Furthermore, we qualitatively assess the effectiveness of outlier explanations with domain expertise. △ Less

Submitted 3 February, 2025; v1 submitted 3 September, 2024; originally announced September 2024.

Comments: 14 pages, 8 figures, accepted at TempXAI @ ECML-PKDD, published in CEUR Workshop Proceedings, Vol. 3761. https://ceur-ws.org/Vol-3761/paper3.pdf

Journal ref: Workshop on TempXAI, CEUR Workshop Proceedings, Vol. 3761, 2024

arXiv:2408.14224 [pdf, other]

Fact Probability Vector Based Goal Recognition

Authors: Nils Wilken, Lea Cohausz, Christian Bartelt, Heiner Stuckenschmidt

Abstract: We present a new approach to goal recognition that involves comparing observed facts with their expected probabilities. These probabilities depend on a specified goal g and initial state s0. Our method maps these probabilities and observed facts into a real vector space to compute heuristic values for potential goals. These values estimate the likelihood of a given goal being the true objective of… ▽ More We present a new approach to goal recognition that involves comparing observed facts with their expected probabilities. These probabilities depend on a specified goal g and initial state s0. Our method maps these probabilities and observed facts into a real vector space to compute heuristic values for potential goals. These values estimate the likelihood of a given goal being the true objective of the observed agent. As obtaining exact expected probabilities for observed facts in an observation sequence is often practically infeasible, we propose and empirically validate a method for approximating these probabilities. Our empirical results show that the proposed approach offers improved goal recognition precision compared to state-of-the-art techniques while reducing computational complexity. △ Less

Submitted 26 August, 2024; originally announced August 2024.

Comments: Will be presented at ECAI 2024

arXiv:2408.08761 [pdf, other]

Mitigating Information Loss in Tree-Based Reinforcement Learning via Direct Optimization

Authors: Sascha Marton, Tim Grams, Florian Vogt, Stefan Lüdtke, Christian Bartelt, Heiner Stuckenschmidt

Abstract: Reinforcement learning (RL) has seen significant success across various domains, but its adoption is often limited by the black-box nature of neural network policies, making them difficult to interpret. In contrast, symbolic policies allow representing decision-making strategies in a compact and interpretable way. However, learning symbolic policies directly within on-policy methods remains challe… ▽ More Reinforcement learning (RL) has seen significant success across various domains, but its adoption is often limited by the black-box nature of neural network policies, making them difficult to interpret. In contrast, symbolic policies allow representing decision-making strategies in a compact and interpretable way. However, learning symbolic policies directly within on-policy methods remains challenging. In this paper, we introduce SYMPOL, a novel method for SYMbolic tree-based on-POLicy RL. SYMPOL employs a tree-based model integrated with a policy gradient method, enabling the agent to learn and adapt its actions while maintaining a high level of interpretability. We evaluate SYMPOL on a set of benchmark RL tasks, demonstrating its superiority over alternative tree-based RL approaches in terms of performance and interpretability. Unlike existing methods, it enables gradient-based, end-to-end learning of interpretable, axis-aligned decision trees within standard on-policy RL algorithms. Therefore, SYMPOL can become the foundation for a new class of interpretable RL based on decision trees. Our implementation is available under: https://github.com/s-marton/sympol △ Less

Submitted 11 March, 2025; v1 submitted 16 August, 2024; originally announced August 2024.

arXiv:2407.02112 [pdf, other]

A Data-Centric Perspective on Evaluating Machine Learning Models for Tabular Data

Authors: Andrej Tschalzev, Sascha Marton, Stefan Lüdtke, Christian Bartelt, Heiner Stuckenschmidt

Abstract: Tabular data is prevalent in real-world machine learning applications, and new models for supervised learning of tabular data are frequently proposed. Comparative studies assessing the performance of models typically consist of model-centric evaluation setups with overly standardized data preprocessing. This paper demonstrates that such model-centric evaluations are biased, as real-world modeling… ▽ More Tabular data is prevalent in real-world machine learning applications, and new models for supervised learning of tabular data are frequently proposed. Comparative studies assessing the performance of models typically consist of model-centric evaluation setups with overly standardized data preprocessing. This paper demonstrates that such model-centric evaluations are biased, as real-world modeling pipelines often require dataset-specific preprocessing and feature engineering. Therefore, we propose a data-centric evaluation framework. We select 10 relevant datasets from Kaggle competitions and implement expert-level preprocessing pipelines for each dataset. We conduct experiments with different preprocessing pipelines and hyperparameter optimization (HPO) regimes to quantify the impact of model selection, HPO, feature engineering, and test-time adaptation. Our main findings are: 1. After dataset-specific feature engineering, model rankings change considerably, performance differences decrease, and the importance of model selection reduces. 2. Recent models, despite their measurable progress, still significantly benefit from manual feature engineering. This holds true for both tree-based models and neural networks. 3. While tabular data is typically considered static, samples are often collected over time, and adapting to distribution shifts can be important even in supposedly static data. These insights suggest that research efforts should be directed toward a data-centric perspective, acknowledging that tabular data requires feature engineering and often exhibits temporal characteristics. Our framework is available under: https://github.com/atschalz/dc_tabeval. △ Less

Submitted 18 December, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.01115 [pdf, other]

Enabling Mixed Effects Neural Networks for Diverse, Clustered Data Using Monte Carlo Methods

Authors: Andrej Tschalzev, Paul Nitschke, Lukas Kirchdorfer, Stefan Lüdtke, Christian Bartelt, Heiner Stuckenschmidt

Abstract: Neural networks often assume independence among input data samples, disregarding correlations arising from inherent clustering patterns in real-world datasets (e.g., due to different sites or repeated measurements). Recently, mixed effects neural networks (MENNs) which separate cluster-specific 'random effects' from cluster-invariant 'fixed effects' have been proposed to improve generalization and… ▽ More Neural networks often assume independence among input data samples, disregarding correlations arising from inherent clustering patterns in real-world datasets (e.g., due to different sites or repeated measurements). Recently, mixed effects neural networks (MENNs) which separate cluster-specific 'random effects' from cluster-invariant 'fixed effects' have been proposed to improve generalization and interpretability for clustered data. However, existing methods only allow for approximate quantification of cluster effects and are limited to regression and binary targets with only one clustering feature. We present MC-GMENN, a novel approach employing Monte Carlo methods to train Generalized Mixed Effects Neural Networks. We empirically demonstrate that MC-GMENN outperforms existing mixed effects deep learning models in terms of generalization performance, time complexity, and quantification of inter-cluster variance. Additionally, MC-GMENN is applicable to a wide range of datasets, including multi-class classification tasks with multiple high-cardinality categorical features. For these datasets, we show that MC-GMENN outperforms conventional encoding and embedding methods, simultaneously offering a principled methodology for interpreting the effects of clustering patterns. △ Less

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2406.08120 [pdf, other]

Interlinking User Stories and GUI Prototyping: A Semi-Automatic LLM-based Approach

Authors: Kristian Kolthoff, Felix Kretzer, Christian Bartelt, Alexander Maedche, Simone Paolo Ponzetto

Abstract: Interactive systems are omnipresent today and the need to create graphical user interfaces (GUIs) is just as ubiquitous. For the elicitation and validation of requirements, GUI prototyping is a well-known and effective technique, typically employed after gathering initial user requirements represented in natural language (NL) (e.g., in the form of user stories). Unfortunately, GUI prototyping ofte… ▽ More Interactive systems are omnipresent today and the need to create graphical user interfaces (GUIs) is just as ubiquitous. For the elicitation and validation of requirements, GUI prototyping is a well-known and effective technique, typically employed after gathering initial user requirements represented in natural language (NL) (e.g., in the form of user stories). Unfortunately, GUI prototyping often requires extensive resources, resulting in a costly and time-consuming process. Despite various easy-to-use prototyping tools in practice, there is often a lack of adequate resources for developing GUI prototypes based on given user requirements. In this work, we present a novel Large Language Model (LLM)-based approach providing assistance for validating the implementation of functional NL-based requirements in a GUI prototype embedded in a prototyping tool. In particular, our approach aims to detect functional user stories that are not implemented in a GUI prototype and provides recommendations for suitable GUI components directly implementing the requirements. We collected requirements for existing GUIs in the form of user stories and evaluated our proposed validation and recommendation approach with this dataset. The obtained results are promising for user story validation and we demonstrate feasibility for the GUI component recommendations. △ Less

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2405.17514 [pdf, other]

AbstractBeam: Enhancing Bottom-Up Program Synthesis using Library Learning

Authors: Janis Zenkner, Lukas Dierkes, Tobias Sesterhenn, Chrisitan Bartelt

Abstract: LambdaBeam is a state-of-the-art, execution-guided algorithm for program synthesis that utilizes higher-order functions, lambda functions, and iterative loops within a Domain-Specific Language (DSL). LambdaBeam generates each program from scratch but does not take advantage of the frequent recurrence of program blocks or subprograms commonly found in specific domains, such as loops for list traver… ▽ More LambdaBeam is a state-of-the-art, execution-guided algorithm for program synthesis that utilizes higher-order functions, lambda functions, and iterative loops within a Domain-Specific Language (DSL). LambdaBeam generates each program from scratch but does not take advantage of the frequent recurrence of program blocks or subprograms commonly found in specific domains, such as loops for list traversal. To address this limitation, we introduce AbstractBeam: a novel program synthesis framework designed to enhance LambdaBeam by leveraging Library Learning. AbstractBeam identifies and integrates recurring program structures into the DSL, optimizing the synthesis process. Our experimental evaluations demonstrate that AbstractBeam statistically significantly (p < 0.05) outperforms LambdaBeam in the integer list manipulation domain. Beyond solving more tasks, AbstractBeam's program synthesis is also more efficient, requiring less time and fewer candidate programs to generate a solution. Furthermore, our findings indicate that Library Learning effectively enhances program synthesis in domains that are not explicitly designed to showcase its advantages, thereby highlighting the broader applicability of Library Learning. △ Less

Submitted 12 September, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

arXiv:2403.07733 [pdf, other]

Beyond Pixels: Enhancing LIME with Hierarchical Features and Segmentation Foundation Models

Authors: Patrick Knab, Sascha Marton, Christian Bartelt

Abstract: LIME (Local Interpretable Model-agnostic Explanations) is a popular XAI framework for unraveling decision-making processes in vision machine-learning models. The technique utilizes image segmentation methods to identify fixed regions for calculating feature importance scores as explanations. Therefore, poor segmentation can weaken the explanation and reduce the importance of segments, ultimately a… ▽ More LIME (Local Interpretable Model-agnostic Explanations) is a popular XAI framework for unraveling decision-making processes in vision machine-learning models. The technique utilizes image segmentation methods to identify fixed regions for calculating feature importance scores as explanations. Therefore, poor segmentation can weaken the explanation and reduce the importance of segments, ultimately affecting the overall clarity of interpretation. To address these challenges, we introduce the DSEG-LIME (Data-Driven Segmentation LIME) framework, featuring: i) a data-driven segmentation for human-recognized feature generation by foundation model integration, and ii) a user-steered granularity in the hierarchical segmentation procedure through composition. Our findings demonstrate that DSEG outperforms on several XAI metrics on pre-trained ImageNet models and improves the alignment of explanations with human-recognized concepts. The code is available under: https://github. com/patrick-knab/DSEG-LIME △ Less

Submitted 3 February, 2025; v1 submitted 12 March, 2024; originally announced March 2024.

arXiv:2402.11917 [pdf, other]

A Mechanistic Analysis of a Transformer Trained on a Symbolic Multi-Step Reasoning Task

Authors: Jannik Brinkmann, Abhay Sheshadri, Victor Levoso, Paul Swoboda, Christian Bartelt

Abstract: Transformers demonstrate impressive performance on a range of reasoning benchmarks. To evaluate the degree to which these abilities are a result of actual reasoning, existing work has focused on developing sophisticated benchmarks for behavioral studies. However, these studies do not provide insights into the internal mechanisms driving the observed capabilities. To improve our understanding of th… ▽ More Transformers demonstrate impressive performance on a range of reasoning benchmarks. To evaluate the degree to which these abilities are a result of actual reasoning, existing work has focused on developing sophisticated benchmarks for behavioral studies. However, these studies do not provide insights into the internal mechanisms driving the observed capabilities. To improve our understanding of the internal mechanisms of transformers, we present a comprehensive mechanistic analysis of a transformer trained on a synthetic reasoning task. We identify a set of interpretable mechanisms the model uses to solve the task, and validate our findings using correlational and causal evidence. Our results suggest that it implements a depth-bounded recurrent mechanisms that operates in parallel and stores intermediate results in selected token positions. We anticipate that the motifs we identified in our synthetic setting can provide valuable insights into the broader operating principles of transformers and thus provide a basis for understanding more complex models. △ Less

Submitted 29 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

arXiv:2309.17130 [pdf, other]

GRANDE: Gradient-Based Decision Tree Ensembles for Tabular Data

Authors: Sascha Marton, Stefan Lüdtke, Christian Bartelt, Heiner Stuckenschmidt

Abstract: Despite the success of deep learning for text and image data, tree-based ensemble models are still state-of-the-art for machine learning with heterogeneous tabular data. However, there is a significant need for tabular-specific gradient-based methods due to their high flexibility. In this paper, we propose $\text{GRANDE}$, $\text{GRA}$die$\text{N}$t-Based $\text{D}$ecision Tree $\text{E}$nsembles,… ▽ More Despite the success of deep learning for text and image data, tree-based ensemble models are still state-of-the-art for machine learning with heterogeneous tabular data. However, there is a significant need for tabular-specific gradient-based methods due to their high flexibility. In this paper, we propose $\text{GRANDE}$, $\text{GRA}$die$\text{N}$t-Based $\text{D}$ecision Tree $\text{E}$nsembles, a novel approach for learning hard, axis-aligned decision tree ensembles using end-to-end gradient descent. GRANDE is based on a dense representation of tree ensembles, which affords to use backpropagation with a straight-through operator to jointly optimize all model parameters. Our method combines axis-aligned splits, which is a useful inductive bias for tabular data, with the flexibility of gradient-based optimization. Furthermore, we introduce an advanced instance-wise weighting that facilitates learning representations for both, simple and complex relations, within a single model. We conducted an extensive evaluation on a predefined benchmark with 19 classification datasets and demonstrate that our method outperforms existing gradient-boosting and deep learning frameworks on most datasets. The method is available under: https://github.com/s-marton/GRANDE △ Less

Submitted 12 March, 2024; v1 submitted 29 September, 2023; originally announced September 2023.

arXiv:2308.08168 [pdf, other]

Emergent Software Service Platform and its Application in a Smart Mobility Setting

Authors: Nils Wilken, Christoph Knieke, Eric Nyakam, Andreas Rausch, Christian Schindler, Christian Bartelt, Nikolaus Ziebura

Abstract: The development dynamics of digital innovations for industry, business, and society are producing complex system conglomerates that can no longer be designed centrally and hierarchically in classic development processes. Instead, systems are evolving in DevOps processes in which heterogeneous actors act together on an open platform. Influencing and controlling such dynamically and autonomously cha… ▽ More The development dynamics of digital innovations for industry, business, and society are producing complex system conglomerates that can no longer be designed centrally and hierarchically in classic development processes. Instead, systems are evolving in DevOps processes in which heterogeneous actors act together on an open platform. Influencing and controlling such dynamically and autonomously changing system landscapes is currently a major challenge and a fundamental interest of service users and providers, as well as operators of the platform infrastructures. In this paper, we propose an architecture for such an emergent software service platform. A software platform that implements this architecture with the underlying engineering methodology is demonstrated by a smart parking lot scenario. △ Less

Submitted 5 February, 2024; v1 submitted 16 August, 2023; originally announced August 2023.

Comments: This paper was presented on The Fifteenth International Conference on Adaptive and Self-Adaptive Systems and Applications (ADAPTIVE 2023)

arXiv:2308.01948 [pdf, other]

A Multidimensional Analysis of Social Biases in Vision Transformers

Authors: Jannik Brinkmann, Paul Swoboda, Christian Bartelt

Abstract: The embedding spaces of image models have been shown to encode a range of social biases such as racism and sexism. Here, we investigate specific factors that contribute to the emergence of these biases in Vision Transformers (ViT). Therefore, we measure the impact of training data, model architecture, and training objectives on social biases in the learned representations of ViTs. Our findings ind… ▽ More The embedding spaces of image models have been shown to encode a range of social biases such as racism and sexism. Here, we investigate specific factors that contribute to the emergence of these biases in Vision Transformers (ViT). Therefore, we measure the impact of training data, model architecture, and training objectives on social biases in the learned representations of ViTs. Our findings indicate that counterfactual augmentation training using diffusion-based image editing can mitigate biases, but does not eliminate them. Moreover, we find that larger models are less biased than smaller models, and that models trained using discriminative objectives are less biased than those trained using generative objectives. In addition, we observe inconsistencies in the learned social biases. To our surprise, ViTs can exhibit opposite biases when trained on the same data set using different self-supervised objectives. Our findings give insights into the factors that contribute to the emergence of social biases and suggests that we could achieve substantial fairness improvements based on model design choices. △ Less

Submitted 3 August, 2023; originally announced August 2023.

arXiv:2306.15362 [pdf, other]

doi 10.1007/978-3-031-42608-7_19

Planning Landmark Based Goal Recognition Revisited: Does Using Initial State Landmarks Make Sense?

Authors: Nils Wilken, Lea Cohausz, Christian Bartelt, Heiner Stuckenschmidt

Abstract: Goal recognition is an important problem in many application domains (e.g., pervasive computing, intrusion detection, computer games, etc.). In many application scenarios, it is important that goal recognition algorithms can recognize goals of an observed agent as fast as possible. However, many early approaches in the area of Plan Recognition As Planning, require quite large amounts of computatio… ▽ More Goal recognition is an important problem in many application domains (e.g., pervasive computing, intrusion detection, computer games, etc.). In many application scenarios, it is important that goal recognition algorithms can recognize goals of an observed agent as fast as possible. However, many early approaches in the area of Plan Recognition As Planning, require quite large amounts of computation time to calculate a solution. Mainly to address this issue, recently, Pereira et al. developed an approach that is based on planning landmarks and is much more computationally efficient than previous approaches. However, the approach, as proposed by Pereira et al., also uses trivial landmarks (i.e., facts that are part of the initial state and goal description are landmarks by definition). In this paper, we show that it does not provide any benefit to use landmarks that are part of the initial state in a planning landmark based goal recognition approach. The empirical results show that omitting initial state landmarks for goal recognition improves goal recognition performance. △ Less

Submitted 10 November, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

Comments: Full publication: Wilken, N., Cohausz, L., Bartelt, C., Stuckenschmidt, H. (2023). Planning Landmark Based Goal Recognition Revisited: Does Using Initial State Landmarks Make Sense?. In: Seipel, D., Steen, A. (eds) KI 2023: Advances in Artificial Intelligence. KI 2023. Lecture Notes in Computer Science(), vol 14236. Springer, Cham. arXiv admin note: text overlap with arXiv:2301.10571

arXiv:2305.03515 [pdf, other]

GradTree: Learning Axis-Aligned Decision Trees with Gradient Descent

Authors: Sascha Marton, Stefan Lüdtke, Christian Bartelt, Heiner Stuckenschmidt

Abstract: Decision Trees (DTs) are commonly used for many machine learning tasks due to their high degree of interpretability. However, learning a DT from data is a difficult optimization problem, as it is non-convex and non-differentiable. Therefore, common approaches learn DTs using a greedy growth algorithm that minimizes the impurity locally at each internal node. Unfortunately, this greedy procedure ca… ▽ More Decision Trees (DTs) are commonly used for many machine learning tasks due to their high degree of interpretability. However, learning a DT from data is a difficult optimization problem, as it is non-convex and non-differentiable. Therefore, common approaches learn DTs using a greedy growth algorithm that minimizes the impurity locally at each internal node. Unfortunately, this greedy procedure can lead to inaccurate trees. In this paper, we present a novel approach for learning hard, axis-aligned DTs with gradient descent. The proposed method uses backpropagation with a straight-through operator on a dense DT representation, to jointly optimize all tree parameters. Our approach outperforms existing methods on binary classification benchmarks and achieves competitive results for multi-class tasks. The method is available under: https://github.com/s-marton/GradTree △ Less

Submitted 19 August, 2024; v1 submitted 5 May, 2023; originally announced May 2023.

arXiv:2301.10571 [pdf, other]

Leveraging Planning Landmarks for Hybrid Online Goal Recognition

Authors: Nils Wilken, Lea Cohausz, Johannes Schaum, Stefan Lüdtke, Christian Bartelt, Heiner Stuckenschmidt

Abstract: Goal recognition is an important problem in many application domains (e.g., pervasive computing, intrusion detection, computer games, etc.). In many application scenarios it is important that goal recognition algorithms can recognize goals of an observed agent as fast as possible and with minimal domain knowledge. Hence, in this paper, we propose a hybrid method for online goal recognition that co… ▽ More Goal recognition is an important problem in many application domains (e.g., pervasive computing, intrusion detection, computer games, etc.). In many application scenarios it is important that goal recognition algorithms can recognize goals of an observed agent as fast as possible and with minimal domain knowledge. Hence, in this paper, we propose a hybrid method for online goal recognition that combines a symbolic planning landmark based approach and a data-driven goal recognition approach and evaluate it in a real-world cooking scenario. The empirical results show that the proposed method is not only significantly more efficient in terms of computation time than the state-of-the-art but also improves goal recognition performance. Furthermore, we show that the utilized planning landmark based approach, which was so far only evaluated on artificial benchmark domains, achieves also good recognition performance when applied to a real-world cooking scenario. △ Less

Submitted 25 January, 2023; originally announced January 2023.

Comments: 9 pages. Presented at SPARK 2022 (https://icaps22.icaps-conference.org/workshops/SPARK/)

arXiv:2207.08414 [pdf, other]

Outlier Explanation via Sum-Product Networks

Authors: Stefan Lüdtke, Christian Bartelt, Heiner Stuckenschmidt

Abstract: Outlier explanation is the task of identifying a set of features that distinguish a sample from normal data, which is important for downstream (human) decision-making. Existing methods are based on beam search in the space of feature subsets. They quickly becomes computationally expensive, as they require to run an outlier detection algorithm from scratch for each feature subset. To alleviate this… ▽ More Outlier explanation is the task of identifying a set of features that distinguish a sample from normal data, which is important for downstream (human) decision-making. Existing methods are based on beam search in the space of feature subsets. They quickly becomes computationally expensive, as they require to run an outlier detection algorithm from scratch for each feature subset. To alleviate this problem, we propose a novel outlier explanation algorithm based on Sum-Product Networks (SPNs), a class of probabilistic circuits. Our approach leverages the tractability of marginal inference in SPNs to compute outlier scores in feature subsets. By using SPNs, it becomes feasible to perform backwards elimination instead of the usual forward beam search, which is less susceptible to missing relevant features in an explanation, especially when the number of features is large. We empirically show that our approach achieves state-of-the-art results for outlier explanation, outperforming recent search-based as well as deep learning-based explanation methods △ Less

Submitted 18 July, 2022; originally announced July 2022.

arXiv:2206.04891 [pdf, other]

doi 10.1007/s10994-023-06428-4

Explaining Neural Networks without Access to Training Data

Authors: Sascha Marton, Stefan Lüdtke, Christian Bartelt, Andrej Tschalzev, Heiner Stuckenschmidt

Abstract: We consider generating explanations for neural networks in cases where the network's training data is not accessible, for instance due to privacy or safety issues. Recently, $\mathcal{I}$-Nets have been proposed as a sample-free approach to post-hoc, global model interpretability that does not require access to training data. They formulate interpretation as a machine learning task that maps netwo… ▽ More We consider generating explanations for neural networks in cases where the network's training data is not accessible, for instance due to privacy or safety issues. Recently, $\mathcal{I}$-Nets have been proposed as a sample-free approach to post-hoc, global model interpretability that does not require access to training data. They formulate interpretation as a machine learning task that maps network representations (parameters) to a representation of an interpretable function. In this paper, we extend the $\mathcal{I}$-Net framework to the cases of standard and soft decision trees as surrogate models. We propose a suitable decision tree representation and design of the corresponding $\mathcal{I}$-Net output layers. Furthermore, we make $\mathcal{I}$-Nets applicable to real-world tasks by considering more realistic distributions when generating the $\mathcal{I}$-Net's training data. We empirically evaluate our approach against traditional global, post-hoc interpretability approaches and show that it achieves superior results when the training data is not accessible. △ Less

Submitted 10 June, 2022; originally announced June 2022.

Journal ref: Machine Learning (2024)

arXiv:2110.05165 [pdf, other]

Exchangeability-Aware Sum-Product Networks

Authors: Stefan Lüdtke, Christian Bartelt, Heiner Stuckenschmidt

Abstract: Sum-Product Networks (SPNs) are expressive probabilistic models that provide exact, tractable inference. They achieve this efficiency by making use of local independence. On the other hand, mixtures of exchangeable variable models (MEVMs) are a class of tractable probabilistic models that make use of exchangeability of discrete random variables to render inference tractable. Exchangeability, which… ▽ More Sum-Product Networks (SPNs) are expressive probabilistic models that provide exact, tractable inference. They achieve this efficiency by making use of local independence. On the other hand, mixtures of exchangeable variable models (MEVMs) are a class of tractable probabilistic models that make use of exchangeability of discrete random variables to render inference tractable. Exchangeability, which arises naturally in relational domains, has not been considered for efficient representation and inference in SPNs yet. The contribution of this paper is a novel probabilistic model which we call Exchangeability-Aware Sum-Product Networks (XSPNs). It contains both SPNs and MEVMs as special cases, and combines the ability of SPNs to efficiently learn deep probabilistic models with the ability of MEVMs to efficiently handle exchangeable random variables. We introduce a structure learning algorithm for XSPNs and empirically show that they can be more accurate than conventional SPNs when the data contains repeated, interchangeable parts. △ Less

Submitted 28 April, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

Comments: accepted at IJCAI 2022

arXiv:2012.06006 [pdf, other]

xRAI: Explainable Representations through AI

Authors: Christiann Bartelt, Sascha Marton, Heiner Stuckenschmidt

Abstract: We present xRAI an approach for extracting symbolic representations of the mathematical function a neural network was supposed to learn from the trained network. The approach is based on the idea of training a so-called interpretation network that receives the weights and biases of the trained network as input and outputs the numerical representation of the function the network was supposed to lea… ▽ More We present xRAI an approach for extracting symbolic representations of the mathematical function a neural network was supposed to learn from the trained network. The approach is based on the idea of training a so-called interpretation network that receives the weights and biases of the trained network as input and outputs the numerical representation of the function the network was supposed to learn that can be directly translated into a symbolic representation. We show that interpretation nets for different classes of functions can be trained on synthetic data offline using Boolean functions and low-order polynomials as examples. We show that the training is rather efficient and the quality of the results are promising. Our work aims to provide a contribution to the problem of better understanding neural decision making by making the target function explicit △ Less

Submitted 10 December, 2020; originally announced December 2020.

Comments: 8 pages, 6 figures, 3 tables

arXiv:1409.6587 [pdf]

doi 10.1109/ICGSE.2009.52

Orchestration of Global Software Engineering Projects

Authors: Christian Bartelt, Manfred Broy, Christoph Herrmann, Eric Knauss, Marco Kuhrmann, Andreas Rausch, Bernhard Rumpe, Kurt Schneider

Abstract: Global software engineering has become a fact in many companies due to real necessity in practice. In contrast to co-located projects global projects face a number of additional software engineering challenges. Among them quality management has become much more difficult and schedule and budget overruns can be observed more often. Compared to co-located projects global software engineering is even… ▽ More Global software engineering has become a fact in many companies due to real necessity in practice. In contrast to co-located projects global projects face a number of additional software engineering challenges. Among them quality management has become much more difficult and schedule and budget overruns can be observed more often. Compared to co-located projects global software engineering is even more challenging due to the need for integration of different cultures, different languages, and different time zones - across companies, and across countries. The diversity of development locations on several levels seriously endangers an effective and goal-oriented progress of projects. In this position paper we discuss reasons for global development, sketch settings for distribution and views of orchestration of dislocated companies in a global project that can be seen as a "virtual project environment". We also present a collection of questions, which we consider relevant for global software engineering. The questions motivate further discussion to derive a research agenda in global software engineering. △ Less

Submitted 22 September, 2014; originally announced September 2014.

Comments: 6 pages, 5 figures

Journal ref: Proceedings of the Third International Workshop on Tool Support Development and Management in Distributed Software Projects, collocated with the Fourth IEEE International Conference on Global Software Engineering ICGSE 2009

Showing 1–31 of 31 results for author: Bartelt, C