
arXiv:2506.09913

A Note on the Reliability of Goal-Oriented Error Estimates for Galerkin Finite Element Methods with Nonlinear Functionals

Authors: Brian N. Granzow, Stephen D. Bond, D. Thomas Seidl, Bernhard Endtmayer

Abstract: We consider estimating the discretization error in a nonlinear functional $J(u)$ in the setting of an abstract variational problem: find $u \in \mathcal{V}$ such that $B(u,\varphi) = L(\varphi) \; \forall \varphi \in \mathcal{V}$, as approximated by a Galerkin finite element method. Here, $\mathcal{V}$ is a Hilbert space, $B(\cdot,\cdot)$ is a bilinear form, and $L(\cdot)$ is a linear functional. We consider well-known error estimates $η$ of the form $J(u) - J(u_h) \approx η = L(z) - B(u_h, z)$, where $u_h$ denotes a finite element approximation to $u$, and $z$ denotes the solution to an auxiliary adjoint variational problem. We show that there exist nonlinear functionals for which error estimates of this form are not reliable, even in the presence of an exact adjoint solution $z$. An estimate $η$ is said to be reliable if there exists a constant $C \in \mathbb{R}_{>0}$ independent of $u_h$ such that $|J(u) - J(u_h)| \leq C|η|$. We present several example pairs of bilinear forms and nonlinear functionals where reliability of $η$ is not achieved.

Submitted 11 June, 2025; originally announced June 2025.

Comments: 6 pages

Report number: SAND2025-07143O
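
For orientation on the estimate $η$ in the entry above, the worked equation below covers the standard special case in which the estimate is exact. It assumes, unlike the abstract, that $J$ is linear, and it assumes an adjoint problem of the form stated in the comment; the abstract does not spell out its adjoint problem, so that form is an assumption rather than the authors' definition.

```latex
% Assumption: J is linear, and the adjoint solution z \in \mathcal{V} satisfies
%   B(\varphi, z) = J(\varphi) \quad \forall \varphi \in \mathcal{V}.
\begin{align*}
J(u) - J(u_h) &= J(u - u_h)            && \text{(linearity of } J\text{)} \\
              &= B(u - u_h, z)         && \text{(adjoint problem with } \varphi = u - u_h\text{)} \\
              &= B(u, z) - B(u_h, z)   && \text{(bilinearity of } B\text{)} \\
              &= L(z) - B(u_h, z) = \eta && \text{(primal problem with } \varphi = z\text{)}
\end{align*}
```

In this linear case $|J(u) - J(u_h)| = |η|$, so the reliability bound holds with $C = 1$. The first step relies on linearity of $J$, which is exactly what is lost in the nonlinear setting the abstract studies.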

arXiv:2505.23798

My Answer Is NOT 'Fair': Mitigating Social Bias in Vision-Language Models via Fair and Biased Residuals

Authors: Jian Lan, Yifei Fu, Udo Schlegel, Gengyuan Zhang, Tanveer Hannan, Haokun Chen, Thomas Seidl

Abstract: Social bias is a critical issue in large vision-language models (VLMs), where fairness- and ethics-related problems harm certain groups of people in society. It is unknown to what extent VLMs yield social bias in generative responses. In this study, we focus on evaluating and mitigating social bias on both the model's response and probability distribution. To do so, we first evaluate four state-of-the-art VLMs on PAIRS and SocialCounterfactuals datasets with the multiple-choice selection task. Surprisingly, we find that models suffer from generating gender-biased or race-biased responses. We also observe that models are prone to stating their responses are fair, but indeed having mis-calibrated confidence levels towards particular social groups. While investigating why VLMs are unfair in this study, we observe that VLMs' hidden layers exhibit substantial fluctuations in fairness levels. Meanwhile, residuals in each layer show mixed effects on fairness, with some contributing positively while some lead to increased bias. Based on these findings, we propose a post-hoc method for the inference stage to mitigate social bias, which is training-free and model-agnostic. We achieve this by ablating bias-associated residuals while amplifying fairness-associated residuals on model hidden layers during inference. We demonstrate that our post-hoc method outperforms the competing training strategies, helping VLMs have fairer responses and more reliable confidence levels.

Submitted 26 May, 2025; originally announced May 2025.

arXiv:2503.19782

A comparative study of calibration techniques for finite strain elastoplasticity: Numerically-exact sensitivities for FEMU and VFM

Authors: Sanjeev Kumar, D. Thomas Seidl, Brian N. Granzow, Jin Yang, Jan N. Fuhg

Abstract: Accurate identification of material parameters is crucial for predictive modeling in computational mechanics. The two primary approaches in the experimental mechanics' community for calibration from full-field digital image correlation data are known as finite element model updating (FEMU) and the virtual fields method (VFM). In VFM, the objective function is a squared mismatch between internal and external virtual work or power. In FEMU, the objective function quantifies the weighted mismatch between model predictions and corresponding experimentally measured quantities of interest. It is minimized by iteratively updating the parameters of an FE model. While FEMU is seen as more flexible, VFM is commonly used instead because of FEMU's considerably greater computational expense. However, comparisons between the two methods usually involve approximations of gradients or sensitivities with finite difference schemes, thereby making direct assessments difficult. Hence, in this study, we rigorously compare VFM and FEMU in the context of numerically-exact sensitivities obtained through local sensitivity analyses and the application of automatic differentiation software. To this end, both methods are tested on a finite strain elastoplasticity model. We conduct a series of test cases to assess both methods' robustness under practical challenges.

Submitted 25 March, 2025; originally announced March 2025.

Comments: 44 pages, 15 figures

MSC Class: 74C15

arXiv:2503.00268

Input Specific Neural Networks

Authors: Asghar A. Jadoon, D. Thomas Seidl, Reese E. Jones, Jan N. Fuhg

Abstract: The black-box nature of neural networks limits the ability to encode or impose specific structural relationships between inputs and outputs. While various studies have introduced architectures that ensure the network's output adheres to a particular form in relation to certain inputs, the majority of these approaches impose constraints on only a single set of inputs. This paper introduces a novel neural network architecture, termed the Input Specific Neural Network (ISNN), which extends this concept by allowing scalar-valued outputs to be subject to multiple constraints. Specifically, the ISNN can enforce convexity in some inputs, non-decreasing monotonicity combined with convexity with respect to others, and simple non-decreasing monotonicity or arbitrary relationships with additional inputs. The paper presents two distinct ISNN architectures, along with equations for the first and second derivatives of the output with respect to the inputs. These networks are broadly applicable. In this work, we restrict their usage to solving problems in computational mechanics. In particular, we show how they can be effectively applied to fitting data-driven constitutive models. We then embed our trained data-driven constitutive laws into a finite element solver where significant time savings can be achieved by using explicit manual differentiation using the derived equations as opposed to automatic differentiation. We also show how ISNNs can be used to learn structural relationships between inputs and outputs via a binary gating mechanism. Particularly, ISNNs are employed to model an anisotropic free energy potential to get the homogenized macroscopic response in a decoupled multiscale setting, where the network learns whether or not the potential should be modeled as polyconvex, and retains only the relevant layers while using the minimum number of inputs.

Submitted 28 February, 2025; originally announced March 2025.

arXiv:2502.10167

Modeling and Simulating Emerging Memory Technologies: A Tutorial

Authors: Yun-Chih Chen, Tristan Seidl, Nils Hölscher, Christian Hakert, Minh Duy Truong, Jian-Jia Chen, João Paulo C. de Lima, Asif Ali Khan, Jeronimo Castrillon, Ali Nezhadi, Lokesh Siddhu, Hassan Nassar, Mahta Mayahinia, Mehdi Baradaran Tahoori, Jörg Henkel, Nils Wilbert, Stefan Wildermann, Jürgen Teich

Abstract: Non-volatile Memory (NVM) technologies present a promising alternative to traditional volatile memories such as SRAM and DRAM. Due to the limited availability of real NVM devices, simulators play a crucial role in architectural exploration and hardware-software co-design. This tutorial presents a simulation toolchain through four detailed case studies, showcasing its applicability to various domains of system design, including hybrid main-memory and cache, compute-in-memory, and wear-leveling design. These case studies provide the reader with practical insights on customizing the toolchain for their specific research needs. The source code is open-sourced.

Submitted 10 March, 2025; v1 submitted 14 February, 2025; originally announced February 2025.

Comments: DFG Priority Program 2377 - Disruptive Memory Technologies

arXiv:2501.04584

doi 10.1016/j.commatsci.2025.113885

A Direct-adjoint Approach for Material Point Model Calibration with Application to Plasticity

Authors: Ryan Yan, D. Thomas Seidl, Reese E. Jones, Panayiotis Papadopoulos

Abstract: This paper proposes a new approach for the calibration of material parameters in local elastoplastic constitutive models. The calibration is posed as a constrained optimization problem, where the constitutive model evolution equations for a single material point serve as constraints. The objective function quantifies the mismatch between the stress predicted by the model and corresponding experimental measurements. To improve calibration efficiency, a novel direct-adjoint approach is presented to compute the Hessian of the objective function, which enables the use of second-order optimization algorithms. Automatic differentiation is used for gradient and Hessian computations. Two numerical examples are employed to validate the Hessian matrices and to demonstrate that the Newton-Raphson algorithm consistently outperforms gradient-based algorithms such as L-BFGS-B.

Submitted 8 May, 2025; v1 submitted 8 January, 2025; originally announced January 2025.

Report number: SAND2025-00046O

Journal ref: Computational Materials Science, Volume 255, 2025, 113885

arXiv:2411.14901

ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos

Authors: Tanveer Hannan, Md Mohaiminul Islam, Jindong Gu, Thomas Seidl, Gedas Bertasius

Abstract: Large language models (LLMs) excel at retrieving information from lengthy text, but their vision-language counterparts (VLMs) face difficulties with hour-long videos, especially for temporal grounding. Specifically, these VLMs are constrained by frame limitations, often losing essential temporal details needed for accurate event localization in extended video content. We propose ReVisionLLM, a recursive vision-language model designed to locate events in hour-long videos. Inspired by human search strategies, our model initially targets broad segments of interest, progressively revising its focus to pinpoint exact temporal boundaries. Our model can seamlessly handle videos of vastly different lengths, from minutes to hours. We also introduce a hierarchical training strategy that starts with short clips to capture distinct events and progressively extends to longer videos. To our knowledge, ReVisionLLM is the first VLM capable of temporal grounding in hour-long videos, outperforming previous state-of-the-art methods across multiple datasets by a significant margin (+2.6% [email protected] on MAD). The code is available at https://github.com/Tanveer81/ReVisionLLM.

Submitted 22 November, 2024; originally announced November 2024.

arXiv:2411.07310

Advancements in Constitutive Model Calibration: Leveraging the Power of Full-Field DIC Measurements and In-Situ Load Path Selection for Reliable Parameter Inference

Authors: Denielle Ricciardi, D. Tom Seidl, Brian Lester, Amanda Jones, Elizabeth Jones

Abstract: Accurate material characterization and model calibration are essential for computationally-supported engineering decisions. Current characterization and calibration methods (1) use simplified test specimen geometries and global data, (2) cannot guarantee that sufficient characterization data is collected for a specific model of interest, (3) use deterministic methods that provide best-fit parameter values with no uncertainty quantification, and (4) are sequential, inflexible, and time-consuming. This work brings together several recent advancements into an improved workflow called Interlaced Characterization and Calibration that advances the state-of-the-art in constitutive model calibration. The ICC paradigm (1) efficiently uses full-field data to calibrate a high-fidelity material model, (2) aligns the data needed with the data collected with an optimal experimental design protocol, (3) quantifies parameter uncertainty through Bayesian inference, and (4) incorporates these advances into a quasi real-time feedback loop. The ICC framework is demonstrated on the calibration of a material model using simulated full-field data for an aluminum cruciform specimen being deformed bi-axially. The cruciform is actively driven through the myopically optimal load path using Bayesian optimal experimental design, which selects load steps that yield the maximum expected information gain. To aid in numerical stability and preserve computational resources, the full-field data is dimensionally reduced via principal component analysis, and fast surrogate models which approximate the input-output relationships of the expensive finite element model are used. The tools demonstrated here show that high-fidelity constitutive models can be efficiently and reliably calibrated with quantified uncertainty, thus supporting credible decision-making and potentially increasing the agility of solid mechanics modeling.

Submitted 22 April, 2025; v1 submitted 11 November, 2024; originally announced November 2024.

Comments: 53 pages, 37 figures

Report number: SAND2024-15320O

arXiv:2410.09491

Dying Clusters Is All You Need -- Deep Clustering With an Unknown Number of Clusters

Authors: Collin Leiber, Niklas Strauß, Matthias Schubert, Thomas Seidl

Abstract: Finding meaningful groups, i.e., clusters, in high-dimensional data such as images or texts without labeled data at hand is an important challenge in data mining. In recent years, deep clustering methods have achieved remarkable results in these tasks. However, most of these methods require the user to specify the number of clusters in advance. This is a major limitation since the number of clusters is typically unknown if labeled data is unavailable. Thus, an area of research has emerged that addresses this problem. Most of these approaches estimate the number of clusters separated from the clustering process. This results in a strong dependency of the clustering result on the quality of the initial embedding. Other approaches are tailored to specific clustering processes, making them hard to adapt to other scenarios. In this paper, we propose UNSEEN, a general framework that, starting from a given upper bound, is able to estimate the number of clusters. To the best of our knowledge, it is the first method that can be easily combined with various deep clustering algorithms. We demonstrate the applicability of our approach by combining UNSEEN with the popular deep clustering algorithms DCN, DEC, and DKM and verify its effectiveness through an extensive experimental evaluation on several image and tabular datasets. Moreover, we perform numerous ablations to analyze our approach and show the importance of its components. The code is available at: https://github.com/collinleiber/UNSEEN

Submitted 12 October, 2024; originally announced October 2024.

Comments: Accepted at the Sixth ICDM Workshop on Deep Learning and Clustering

arXiv:2409.18735

Autoregressive Policy Optimization for Constrained Allocation Tasks

Authors: David Winkel, Niklas Strauß, Maximilian Bernhard, Zongyue Li, Thomas Seidl, Matthias Schubert

Abstract: Allocation tasks represent a class of problems where a limited amount of resources must be allocated to a set of entities at each time step. Prominent examples of this task include portfolio optimization or distributing computational workloads across servers. Allocation tasks are typically bound by linear constraints describing practical requirements that have to be strictly fulfilled at all times. In portfolio optimization, for example, investors may be obligated to allocate less than 30% of the funds into a certain industrial sector in any investment period. Such constraints restrict the action space of allowed allocations in intricate ways, which makes learning a policy that avoids constraint violations difficult. In this paper, we propose a new method for constrained allocation tasks based on an autoregressive process to sequentially sample allocations for each entity. In addition, we introduce a novel de-biasing mechanism to counter the initial bias caused by sequential sampling. We demonstrate the superior performance of our approach compared to a variety of Constrained Reinforcement Learning (CRL) methods on three distinct constrained allocation tasks: portfolio optimization, computational workload distribution, and a synthetic allocation benchmark. Our code is available at: https://github.com/niklasdbs/paspo

Submitted 27 September, 2024; originally announced September 2024.

Comments: Accepted at NeurIPS 2024
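
The idea of sampling allocations one entity at a time, mentioned in the entry above, can be illustrated with a toy sketch. This is not the authors' method (their repository is linked in the abstract): it assumes only per-entity upper bounds, uses a hypothetical helper `sample_allocation` with a hypothetical `caps` argument, draws each share uniformly rather than from a learned policy, and contains no de-biasing step.

```python
import random


def sample_allocation(caps, rng):
    """Sample shares with 0 <= share_i <= caps[i] that sum to 1, one entity at a time.

    Hypothetical helper for illustration only; requires sum(caps) >= 1.
    """
    if sum(caps) < 1.0:
        raise ValueError("caps must sum to at least 1 for a feasible allocation")
    remaining = 1.0  # budget that has not been assigned yet
    shares = []
    for i, cap in enumerate(caps):
        # Capacity still available to the entities that come after this one.
        later_capacity = sum(caps[i + 1:])
        # Lower bound: later entities must be able to absorb what is left over.
        low = max(0.0, remaining - later_capacity)
        # Upper bound: respect this entity's cap and the remaining budget.
        high = min(cap, remaining)
        share = rng.uniform(low, high)
        shares.append(share)
        remaining -= share
    return shares


rng = random.Random(0)
caps = [0.3, 0.5, 0.6]
shares = sample_allocation(caps, rng)
print(shares, sum(shares))
```

Because each interval depends on the shares already drawn, the ordering of the entities skews the resulting distribution even with uniform draws; an ordering effect of this kind is plausibly what the abstract means by the initial bias caused by sequential sampling.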

arXiv:2409.09061

Eliminating Timing Anomalies in Scheduling Periodic Segmented Self-Suspending Tasks with Release Jitter

Authors: Ching-Chi Lin, Mario Günzel, Junjie Shi, Tristan Taylan Seidl, Kuan-Hsun Chen, Jian-Jia Chen

Abstract: Ensuring timing guarantees for every individual task is critical in real-time systems. Even for periodic tasks, providing timing guarantees for tasks with segmented self-suspending behavior is challenging due to timing anomalies, i.e., the reduction of execution or suspension time of some jobs increases the response time of another job. The release jitter of tasks can add further complexity to the situation, affecting the predictability and timing guarantees of real-time systems. The existing worst-case response time analyses for sporadic self-suspending tasks are only over-approximations and lead to overly pessimistic results. In this work, we address timing anomalies without compromising the worst-case response time (WCRT) analysis when scheduling periodic segmented self-suspending tasks with release jitter. We propose two treatments: segment release time enforcement and segment priority modification, and prove their effectiveness in eliminating timing anomalies. Our evaluation demonstrates that the proposed treatments achieve higher acceptance ratios in terms of schedulability compared to state-of-the-art scheduling algorithms. Additionally, we implement the segment-level fixed-priority scheduling mechanism on RTEMS and verify the validity of our segment priority modification treatment. This work expands our previous conference publication at the 29th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2023), which considers only periodic segmented self-suspending tasks without release jitter.

Submitted 2 September, 2024; originally announced September 2024.

Comments: This is an extension from a previous conference publication at the 29th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2023)

arXiv:2406.17322

ALPBench: A Benchmark for Active Learning Pipelines on Tabular Data

Authors: Valentin Margraf, Marcel Wever, Sandra Gilhuber, Gabriel Marques Tavares, Thomas Seidl, Eyke Hüllermeier

Abstract: In settings where only a budgeted amount of labeled data can be afforded, active learning seeks to devise query strategies for selecting the most informative data points to be labeled, aiming to enhance learning algorithms' efficiency and performance. Numerous such query strategies have been proposed and compared in the active learning literature. However, the community still lacks standardized benchmarks for comparing the performance of different query strategies. This particularly holds for the combination of query strategies with different learning algorithms into active learning pipelines and examining the impact of the learning algorithm choice. To close this gap, we propose ALPBench, which facilitates the specification, execution, and performance monitoring of active learning pipelines. It has built-in measures to ensure evaluations are done reproducibly, saving exact dataset splits and hyperparameter settings of used algorithms. In total, ALPBench consists of 86 real-world tabular classification datasets and 5 active learning settings, yielding 430 active learning problems. To demonstrate its usefulness and broad compatibility with various learning algorithms and query strategies, we conduct an exemplary study evaluating 9 query strategies paired with 8 learning algorithms in 2 different settings. We provide ALPBench here: https://github.com/ValentinMargraf/ActiveLearningPipelines.

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2404.10683

doi 10.3233/FAIA230573

Simplex Decomposition for Portfolio Allocation Constraints in Reinforcement Learning

Authors: David Winkel, Niklas Strauß, Matthias Schubert, Thomas Seidl

Abstract: Portfolio optimization tasks describe sequential decision problems in which the investor's wealth is distributed across a set of assets. Allocation constraints are used to enforce minimal or maximal investments into particular subsets of assets to control for objectives such as limiting the portfolio's exposure to a certain sector due to environmental concerns. Although methods for constrained Reinforcement Learning (CRL) can optimize policies while considering allocation constraints, it can be observed that these general methods yield suboptimal results. In this paper, we propose a novel approach to handle allocation constraints based on a decomposition of the constraint action space into a set of unconstrained allocation problems. In particular, we examine this approach for the case of two constraints. For example, an investor may wish to invest at least a certain percentage of the portfolio into green technologies while limiting the investment in the fossil energy sector. We show that the action space of the task is equivalent to the decomposed action space, and introduce a new reinforcement learning (RL) approach CAOSD, which is built on top of the decomposition. The experimental evaluation on real-world Nasdaq-100 data demonstrates that our approach consistently outperforms state-of-the-art CRL benchmarks for portfolio optimization.

Submitted 16 April, 2024; originally announced April 2024.

Journal ref: ECAI 2023 - 26th European Conference on Artificial Intelligence, September 30 - October 4, 2023, Krakow, Poland

arXiv:2312.06729

RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos

Authors: Tanveer Hannan, Md Mohaiminul Islam, Thomas Seidl, Gedas Bertasius

Abstract: Locating specific moments within long videos (20-120 minutes) presents a significant challenge, akin to finding a needle in a haystack. Adapting existing short video (5-30 seconds) grounding methods to this problem yields poor performance. Since most real life videos, such as those on YouTube and AR/VR, are lengthy, addressing this issue is crucial. Existing methods typically operate in two stages: clip retrieval and grounding. However, this disjoint process limits the retrieval module's fine-grained event understanding, crucial for specific moment detection. We propose RGNet which deeply integrates clip retrieval and grounding into a single network capable of processing long videos into multiple granular levels, e.g., clips and frames. Its core component is a novel transformer encoder, RG-Encoder, that unifies the two stages through shared features and mutual optimization. The encoder incorporates a sparse attention mechanism and an attention loss to model both granularity jointly. Moreover, we introduce a contrastive clip sampling technique to mimic the long video paradigm closely during training. RGNet surpasses prior methods, showcasing state-of-the-art performance on long video temporal grounding (LVTG) datasets MAD and Ego4D.

Submitted 13 July, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: The code is released at https://github.com/Tanveer81/RGNet

arXiv:2308.10702

Bayesian Optimal Experimental Design for Constitutive Model Calibration

Authors: Denielle Ricciardi, Tom Seidl, Brian Lester, Amanda Jones, Elizabeth Jones

Abstract: Computational simulation is increasingly relied upon for high-consequence engineering decisions, and a foundational element to solid mechanics simulations, such as finite element analysis (FEA), is a credible constitutive or material model. Calibration of these complex models is an essential step; however, the selection, calibration and validation of material models is often a discrete, multi-stage process that is decoupled from material characterization activities, which means the data collected does not always align with the data that is needed. To address this issue, an integrated workflow for delivering an enhanced characterization and calibration procedure (Interlaced Characterization and Calibration (ICC)) is introduced. This framework leverages Bayesian optimal experimental design (BOED) to select the optimal load path for a cruciform specimen in order to collect the most informative data for model calibration. The critical first piece of algorithm development is to demonstrate the active experimental design for a fast model with simulated data. For this demonstration, a material point simulator that models a plane stress elastoplastic material subject to bi-axial loading was chosen. The ICC framework is demonstrated on two exemplar problems in which BOED is used to determine which load step to take, e.g., in which direction to increment the strain, at each iteration of the characterization and calibration cycle. Calibration results from data obtained by adaptively selecting the load path within the ICC algorithm are compared to results from data generated under two naive static load paths that were chosen a priori based on human intuition. In these exemplar problems, data generated in an adaptive setting resulted in calibrated model parameters with reduced measures of uncertainty compared to the static settings.

Submitted 26 October, 2023; v1 submitted 21 August, 2023; originally announced August 2023.

Comments: 39 pages, 13 figures

arXiv:2308.08224 [pdf, other]

How To Overcome Confirmation Bias in Semi-Supervised Image Classification By Active Learning

Authors: Sandra Gilhuber, Rasmus Hvingelby, Mang Ling Ada Fok, Thomas Seidl

Abstract: Do we need active learning? The rise of strong deep semi-supervised methods raises doubt about the usability of active learning in limited labeled data settings. This is caused by results showing that combining semi-supervised learning (SSL) methods with a random selection for labeling can outperform existing active learning (AL) techniques. However, these results are obtained from experiments on well-established benchmark datasets that can overestimate the external validity. Moreover, the literature lacks sufficient research on the performance of active semi-supervised learning methods in realistic data scenarios, leaving a notable gap in our understanding. Therefore, we present three data challenges common in real-world applications: between-class imbalance, within-class imbalance, and between-class similarity. These challenges can hurt SSL performance due to confirmation bias. We conduct experiments with SSL and AL on simulated data challenges and find that random sampling does not mitigate confirmation bias and, in some cases, leads to worse performance than supervised learning. In contrast, we demonstrate that AL can overcome confirmation bias in SSL in these realistic settings. Our results provide insights into the potential of combining active and semi-supervised learning in the presence of common real-world challenges, which is a promising direction for robust methods when learning with limited labeled data in real-world applications.

Submitted 16 August, 2023; originally announced August 2023.

Comments: Accepted @ ECML PKDD 2023. This is the author's version of the work. The definitive Version of Record will be published in the Proceedings of ECML PKDD 2023

arXiv:2308.00146 [pdf, other]

doi 10.1007/978-3-031-43412-9_5

DiffusAL: Coupling Active Learning with Graph Diffusion for Label-Efficient Node Classification

Authors: Sandra Gilhuber, Julian Busch, Daniel Rotthues, Christian M. M. Frey, Thomas Seidl

Abstract: Node classification is one of the core tasks on attributed graphs, but successful graph learning solutions require sufficiently labeled data. To keep annotation costs low, active graph learning focuses on selecting the most qualitative subset of nodes that maximizes label efficiency. However, deciding which heuristic is best suited for an unlabeled graph to increase label efficiency is a persistent challenge. Existing solutions either neglect aligning the learned model and the sampling method or focus only on limited selection aspects. They are thus sometimes worse than, or only as good as, random sampling. In this work, we introduce a novel active graph learning approach called DiffusAL, showing significant robustness in diverse settings. Toward better transferability between different graph structures, we combine three independent scoring functions to identify the most informative node samples for labeling in a parameter-free way: i) Model Uncertainty, ii) Diversity Component, and iii) Node Importance computed via graph diffusion heuristics. Most of our calculations for acquisition and training can be pre-processed, making DiffusAL more efficient compared to approaches combining diverse selection criteria and similarly fast as simpler heuristics. Our experiments on various benchmark datasets show that, unlike previous methods, our approach significantly outperforms random selection in 100% of all datasets and labeling budgets tested.

Submitted 31 July, 2023; originally announced August 2023.

Comments: Accepted @ ECML PKDD 2023. This is the author's version of the work. The definitive Version of Record will be published in the Proceedings of ECML PKDD 2023

Journal ref: ECML PKDD 2023: Machine Learning and Knowledge Discovery in Databases: Research Track pp 75-91

arXiv:2305.17096 [pdf, other]

GRAtt-VIS: Gated Residual Attention for Auto Rectifying Video Instance Segmentation

Authors: Tanveer Hannan, Rajat Koner, Maximilian Bernhard, Suprosanna Shit, Bjoern Menze, Volker Tresp, Matthias Schubert, Thomas Seidl

Abstract: Recent trends in Video Instance Segmentation (VIS) have seen a growing reliance on online methods to model complex and lengthy video sequences. However, the degradation of representation and noise accumulation of the online methods, especially during occlusion and abrupt changes, pose substantial challenges. Transformer-based query propagation provides promising directions at the cost of quadratic memory attention. However, they are susceptible to the degradation of instance features due to the above-mentioned challenges and suffer from cascading effects. The detection and rectification of such errors remain largely underexplored. To this end, we introduce GRAtt-VIS, Gated Residual Attention for Video Instance Segmentation. Firstly, we leverage a Gumbel-Softmax-based gate to detect possible errors in the current frame. Next, based on the gate activation, we rectify degraded features from its past representation. Such a residual configuration alleviates the need for dedicated memory and provides a continuous stream of relevant instance features. Secondly, we propose a novel inter-instance interaction using gate activation as a mask for self-attention. This masking strategy dynamically restricts the unrepresentative instance queries in the self-attention and preserves vital information for long-term tracking. We refer to this novel combination of Gated Residual Connection and Masked Self-Attention as the GRAtt block, which can easily be integrated into the existing propagation-based framework. Further, GRAtt blocks significantly reduce the attention overhead and simplify dynamic temporal modeling. GRAtt-VIS achieves state-of-the-art performance on YouTube-VIS and the highly challenging OVIS dataset, significantly improving over previous methods. Code is available at https://github.com/Tanveer81/GRAttVIS.

Submitted 26 May, 2023; originally announced May 2023.

Comments: 14 pages, 5 tables, 9 figures

arXiv:2305.15285 [pdf, other]

doi 10.1016/j.cma.2023.116364

Linearization Errors in Discrete Goal-Oriented Error Estimation

Authors: Brian N. Granzow, D. Thomas Seidl, Stephen D. Bond

Abstract: This paper is concerned with goal-oriented a posteriori error estimation for nonlinear functionals in the context of nonlinear variational problems solved with continuous Galerkin finite element discretizations. A two-level, or discrete, adjoint-based approach for error estimation is considered. The traditional method to derive an error estimate in this context requires linearizing both the nonlinear variational form and the nonlinear functional of interest which introduces linearization errors into the error estimate. In this paper, we investigate these linearization errors. In particular, we develop a novel discrete goal-oriented error estimate that accounts for traditionally neglected nonlinear terms at the expense of greater computational cost. We demonstrate how this error estimate can be used to drive mesh adaptivity. We show that accounting for linearization errors in the error estimate can improve its effectivity for several nonlinear model problems and quantities of interest. We also demonstrate that an adaptive strategy based on the newly proposed estimate can lead to more accurate approximations of the nonlinear functional with fewer degrees of freedom when compared to uniform refinement and traditional adjoint-based approaches.

Submitted 19 July, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Report number: SAND2023-03991O

arXiv:2208.10547 [pdf, other]

InstanceFormer: An Online Video Instance Segmentation Framework

Authors: Rajat Koner, Tanveer Hannan, Suprosanna Shit, Sahand Sharifzadeh, Matthias Schubert, Thomas Seidl, Volker Tresp

Abstract: Recent transformer-based offline video instance segmentation (VIS) approaches achieve encouraging results and significantly outperform online approaches. However, their reliance on the whole video and the immense computational complexity caused by full spatio-temporal attention limit them in real-life applications such as processing lengthy videos. In this paper, we propose a single-stage transformer-based efficient online VIS framework named InstanceFormer, which is especially suitable for long and challenging videos. We propose three novel components to model short-term and long-term dependency and temporal coherence. First, we propagate the representation, location, and semantic information of prior instances to model short-term changes. Second, we propose a novel memory cross-attention in the decoder, which allows the network to look into earlier instances within a certain temporal window. Finally, we employ a temporal contrastive loss to impose coherence in the representation of an instance across all frames. Memory attention and temporal coherence are particularly beneficial to long-range dependency modeling, including challenging scenarios like occlusion. The proposed InstanceFormer outperforms previous online benchmark methods by a large margin across multiple datasets. Most importantly, InstanceFormer surpasses offline approaches for challenging and long datasets such as YouTube-VIS-2021 and OVIS. Code is available at https://github.com/rajatkoner08/InstanceFormer.

Submitted 22 August, 2022; originally announced August 2022.

Report number: InstanceFormer:08-22

Journal ref: Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-2023)

arXiv:2205.09803 [pdf, other]

Towards a Holistic View on Argument Quality Prediction

Authors: Michael Fromm, Max Berrendorf, Johanna Reiml, Isabelle Mayerhofer, Siddharth Bhargava, Evgeniy Faerman, Thomas Seidl

Abstract: Argumentation is one of society's foundational pillars, and, sparked by advances in NLP and the vast availability of text data, automated mining of arguments receives increasing attention. A decisive property of arguments is their strength or quality. While there are works on the automated estimation of argument strength, their scope is narrow: they focus on isolated datasets and neglect the interactions with related argument mining tasks, such as argument identification, evidence detection, or emotional appeal. In this work, we close this gap by approaching argument quality estimation from multiple different angles: grounded on rich results from thorough empirical evaluations, we assess the generalization capabilities of argument quality estimation across diverse domains, the interplay with related argument mining tasks, and the impact of emotions on perceived argument strength. We find that generalization depends on a sufficient representation of different domains in the training part. In zero-shot transfer and multi-task experiments, we reveal that argument quality is among the more challenging tasks but can improve others. Finally, we show that emotions play a smaller role in argument quality than is often assumed.

Submitted 19 May, 2022; originally announced May 2022.

arXiv:2109.11319 [pdf, other]

Active Learning for Argument Strength Estimation

Authors: Nataliia Kees, Michael Fromm, Evgeniy Faerman, Thomas Seidl

Abstract: High-quality arguments are an essential part of decision-making. Automatically predicting the quality of an argument is a complex task that recently got much attention in argument mining. However, the annotation effort for this task is exceptionally high. Therefore, we test uncertainty-based active learning (AL) methods on two popular argument-strength data sets to estimate whether sample-efficient learning can be enabled. Our extensive empirical evaluation shows that uncertainty-based acquisition functions cannot surpass the accuracy reached with random acquisition on these data sets.

Submitted 23 September, 2021; originally announced September 2021.

arXiv:2108.04962 [pdf, other]

Adaptive Multi-Resolution Attention with Linear Complexity

Authors: Yao Zhang, Yunpu Ma, Thomas Seidl, Volker Tresp

Abstract: Transformers have improved the state-of-the-art across numerous tasks in sequence modeling. Besides the quadratic computational and memory complexity w.r.t. the sequence length, the self-attention mechanism only processes information at the same scale, i.e., all attention heads are in the same resolution, resulting in the limited power of the Transformer. To remedy this, we propose a novel and efficient structure named Adaptive Multi-Resolution Attention (AdaMRA for short), which scales linearly to sequence length in terms of time and space. Specifically, we leverage a multi-resolution multi-head attention mechanism, enabling attention heads to capture long-range contextual information in a coarse-to-fine fashion. Moreover, to capture the potential relations between query representation and clues of different attention granularities, we leave the decision of which resolution of attention to use to the query, which further improves the model's capacity compared to the vanilla Transformer. In an effort to reduce complexity, we adopt kernel attention without degrading the performance. Extensive experiments on several benchmarks demonstrate the effectiveness and efficiency of our model by achieving a state-of-the-art performance-efficiency-memory trade-off. To facilitate AdaMRA utilization by the scientific community, the code implementation will be made publicly available.

Submitted 10 August, 2021; originally announced August 2021.

Comments: 11 pages

arXiv:2103.03939 [pdf, other]

doi 10.1145/3468791.3468814

NF-GNN: Network Flow Graph Neural Networks for Malware Detection and Classification

Authors: Julian Busch, Anton Kocheturov, Volker Tresp, Thomas Seidl

Abstract: Malicious software (malware) poses an increasing threat to the security of communication systems as the number of interconnected mobile devices increases exponentially. While some existing malware detection and classification approaches successfully leverage network traffic data, they treat network flows between pairs of endpoints independently and thus fail to leverage rich communication patterns present in the complete network. Our approach first extracts flow graphs and subsequently classifies them using a novel edge feature-based graph neural network model. We present three variants of our base model, which support malware detection and classification in supervised and unsupervised settings. We evaluate our approach on flow graphs that we extract from a recently published dataset for mobile malware detection that addresses several issues with previously available datasets. Experiments on four different prediction tasks consistently demonstrate the advantages of our approach and show that our graph neural network model can boost detection performance by a significant margin.

Submitted 4 June, 2021; v1 submitted 5 March, 2021; originally announced March 2021.

Journal ref: 33rd International Conference on Scientific and Statistical Database Management (SSDBM 2021)

arXiv:2012.07743 [pdf, other]

doi 10.5281/zenodo.4314390

Argument Mining Driven Analysis of Peer-Reviews

Authors: Michael Fromm, Evgeniy Faerman, Max Berrendorf, Siddharth Bhargava, Ruoxia Qi, Yao Zhang, Lukas Dennert, Sophia Selle, Yang Mao, Thomas Seidl

Abstract: Peer reviewing is a central process in modern research and essential for ensuring high quality and reliability of published work. At the same time, it is a time-consuming process and increasing interest in emerging fields often results in a high review workload, especially for senior researchers in this area. How to cope with this problem is an open question and it is vividly discussed across all major conferences. In this work, we propose an Argument Mining based approach for the assistance of editors, meta-reviewers, and reviewers. We demonstrate that the decision process in the field of scientific publications is driven by arguments and automatic argument identification is helpful in various use-cases. One of our findings is that arguments used in the peer-review process differ from arguments in other domains making the transfer of pre-trained models difficult. Therefore, we provide the community with a new peer-review dataset from different computer science conferences with annotated arguments. In our extensive empirical evaluation, we show that Argument Mining can be used to efficiently extract the most relevant parts from reviews, which are paramount for the publication decision. The process remains interpretable since the extracted arguments can be highlighted in a review without detaching them from their context.

Submitted 10 December, 2020; originally announced December 2020.

arXiv:2011.02177 [pdf, ps, other]

Diversity Aware Relevance Learning for Argument Search

Authors: Michael Fromm, Max Berrendorf, Sandra Obermeier, Thomas Seidl, Evgeniy Faerman

Abstract: In this work, we focus on the problem of retrieving relevant arguments for a query claim covering diverse aspects. State-of-the-art methods rely on explicit mappings between claims and premises, and thus are unable to utilize large available collections of premises without laborious and costly manual annotation. Their diversity approach relies on removing duplicates via clustering which does not directly ensure that the selected premises cover all aspects. This work introduces a new multi-step approach for the argument retrieval problem. Rather than relying on ground-truth assignments, our approach employs a machine learning model to capture semantic relationships between arguments. Beyond that, it aims to cover diverse facets of the query, instead of trying to identify duplicates explicitly. Our empirical evaluation demonstrates that our approach leads to a significant improvement in the argument retrieval task even though it requires less data.

Submitted 17 March, 2021; v1 submitted 4 November, 2020; originally announced November 2020.

arXiv:2010.12316 [pdf, other]

Matching the Clinical Reality: Accurate OCT-Based Diagnosis From Few Labels

Authors: Valentyn Melnychuk, Evgeniy Faerman, Ilja Manakov, Thomas Seidl

Abstract: Unlabeled data is often abundant in the clinic, making machine learning methods based on semi-supervised learning a good match for this setting. Despite this, they are currently receiving relatively little attention in medical image analysis literature. Instead, most practitioners and researchers focus on supervised or transfer learning approaches. The recently proposed MixMatch and FixMatch algorithms have demonstrated promising results in extracting useful representations while requiring very few labels. Motivated by these recent successes, we apply MixMatch and FixMatch in an ophthalmological diagnostic setting and investigate how they fare against standard transfer learning. We find that both algorithms outperform the transfer learning baseline on all fractions of labelled data. Furthermore, our experiments show that exponential moving average (EMA) of model parameters, which is a component of both algorithms, is not needed for our classification problem, as disabling it leaves the outcome unchanged. Our code is available online: https://github.com/Valentyn1997/oct-diagn-semi-supervised

Submitted 23 October, 2020; originally announced October 2020.

Comments: KDAH-CIKM-2020

arXiv:2010.03649 [pdf, other]

doi 10.1002/nme.6843

Calibration of Elastoplastic Constitutive Model Parameters from Full-field Data with Automatic Differentiation-based Sensitivities

Authors: Daniel Thomas Seidl, Brian Neal Granzow

Abstract: We present a framework for calibration of parameters in elastoplastic constitutive models that is based on the use of automatic differentiation (AD). The model calibration problem is posed as a partial differential equation-constrained optimization problem where a finite element (FE) model of the coupled equilibrium equation and constitutive model evolution equations serves as the constraint. The objective function quantifies the mismatch between the displacement predicted by the FE model and full-field digital image correlation data, and the optimization problem is solved using gradient-based optimization algorithms. Forward and adjoint sensitivities are used to compute the gradient at considerably less cost than its calculation from finite difference approximations. Through the use of AD, we need only to write the constraints in terms of AD objects, where all of the derivatives required for the forward and inverse problems are obtained by appropriately seeding and evaluating these quantities. We present three numerical examples that verify the correctness of the gradient, demonstrate the AD approach's parallel computation capabilities via application to a large-scale FE model, and highlight the formulation's ease of extensibility to other classes of constitutive models.

Submitted 25 October, 2021; v1 submitted 7 October, 2020; originally announced October 2020.

Report number: SAND2021-13432 J

Journal ref: Int J Numer Methods Eng. 2021; 1-32

arXiv:2009.12875 [pdf, other]

Learning Self-Expression Metrics for Scalable and Inductive Subspace Clustering

Authors: Julian Busch, Evgeniy Faerman, Matthias Schubert, Thomas Seidl

Abstract: Subspace clustering has established itself as a state-of-the-art approach to clustering high-dimensional data. In particular, methods relying on the self-expressiveness property have recently proved especially successful. However, they suffer from two major shortcomings: First, a quadratic-size coefficient matrix is learned directly, preventing these methods from scaling beyond small datasets. Secondly, the trained models are transductive and thus cannot be used to cluster out-of-sample data unseen during training. Instead of learning self-expression coefficients directly, we propose a novel metric learning approach to learn instead a subspace affinity function using a siamese neural network architecture. Consequently, our model benefits from a constant number of parameters and a constant-size memory footprint, allowing it to scale to considerably larger datasets. In addition, we can formally show that our model is still able to exactly recover subspace clusters given an independence assumption. The siamese architecture in combination with a novel geometric classifier further makes our model inductive, allowing it to cluster out-of-sample data. Additionally, non-linear clusters can be detected by simply adding an auto-encoder module to the architecture. The whole model can then be trained end-to-end in a self-supervised manner. This work in progress reports promising preliminary results on the MNIST dataset. In the spirit of reproducible research, we make all code publicly available. In future work we plan to investigate several extensions of our model and to expand experimental evaluation.

Submitted 17 December, 2020; v1 submitted 27 September, 2020; originally announced September 2020.

Journal ref: NeurIPS 2020 Workshop: Self-Supervised Learning - Theory and Practice

arXiv:2003.02228 [pdf, other]

doi 10.3233/FAIA200199

PushNet: Efficient and Adaptive Neural Message Passing

Authors: Julian Busch, Jiaxing Pi, Thomas Seidl

Abstract: Message passing neural networks have recently evolved into a state-of-the-art approach to representation learning on graphs. Existing methods perform synchronous message passing along all edges in multiple subsequent rounds and consequently suffer from various shortcomings: Propagation schemes are inflexible since they are restricted to $k$-hop neighborhoods and insensitive to actual demands of information propagation. Further, long-range dependencies cannot be modeled adequately and learned representations are based on correlations of fixed locality. These issues prevent existing methods from reaching their full potential in terms of prediction performance. Instead, we consider a novel asynchronous message passing approach where information is pushed only along the most relevant edges until convergence. Our proposed algorithm can equivalently be formulated as a single synchronous message passing iteration using a suitable neighborhood function, thus sharing the advantages of existing methods while addressing their central issues. The resulting neural network utilizes a node-adaptive receptive field derived from meaningful sparse node neighborhoods. In addition, by learning and combining node representations over differently sized neighborhoods, our model is able to capture correlations on multiple scales. We further propose variants of our base model with different inductive bias. Empirical results are provided for semi-supervised node classification on five real-world datasets following a rigorous evaluation protocol. We find that our models outperform competitors on all datasets in terms of accuracy with statistical significance. In some cases, our models additionally provide faster runtime.

Submitted 17 December, 2020; v1 submitted 4 March, 2020; originally announced March 2020.

Journal ref: 24th European Conference on Artificial Intelligence (ECAI 2020)

arXiv:1911.08342 [pdf, ps, other]

doi 10.1007/978-3-030-45442-5_1

Knowledge Graph Entity Alignment with Graph Convolutional Networks: Lessons Learned

Authors: Max Berrendorf, Evgeniy Faerman, Valentyn Melnychuk, Volker Tresp, Thomas Seidl

Abstract: In this work, we focus on the problem of entity alignment in Knowledge Graphs (KG) and we report on our experiences when applying a Graph Convolutional Network (GCN) based model for this task. Variants of GCN are used in multiple state-of-the-art approaches and therefore it is important to understand the specifics and limitations of GCN-based models. Despite serious efforts, we were not able to fully reproduce the results from the original paper and, after a thorough audit of the code provided by the authors, we concluded that their implementation is different from the architecture described in the paper. In addition, several tricks are required to make the model work and some of them are not very intuitive. We provide an extensive ablation study to quantify the effects these tricks and changes of architecture have on final performance. Furthermore, we examine current evaluation approaches and systematize available benchmark datasets. We believe that people interested in KG matching might profit from our work, as well as novices entering the field.

Submitted 23 January, 2020; v1 submitted 19 November, 2019; originally announced November 2019.

arXiv:1906.00923 [pdf, other]

doi 10.1145/3350546.3352506

TACAM: Topic And Context Aware Argument Mining

Authors: Michael Fromm, Evgeniy Faerman, Thomas Seidl

Abstract: In this work we address the problem of argument search. The purpose of argument search is the distillation of pro and contra arguments for requested topics from large text corpora. In previous works, the usual approach is to use a standard search engine to extract text parts which are relevant to the given topic and subsequently use an argument recognition algorithm to select arguments from them. The main challenge in the argument recognition task, which is also known as argument mining, is that often sentences containing arguments are structurally similar to purely informative sentences without any stance about the topic. In fact, they only differ semantically. Most approaches use topic or search term information only for the first search step and therefore assume that arguments can be classified independently of a topic. We argue that topic information is crucial for argument mining, since the topic defines the semantic context of an argument. Precisely, we propose different models for the classification of arguments, which take information about a topic of an argument into account. Moreover, to enrich the context of a topic and to let models understand the context of the potential argument better, we integrate information from different external sources such as Knowledge Graphs or pre-trained NLP models. Our evaluation shows that considering topic information, especially in connection with external information, provides a significant performance boost for the argument mining task.

Submitted 26 August, 2019; v1 submitted 26 May, 2019; originally announced June 2019.

arXiv:1407.3850 [pdf, other]

KDD-SC: Subspace Clustering Extensions for Knowledge Discovery Frameworks

Authors: Stephan Günnemann, Hardy Kremer, Matthias Hannen, Thomas Seidl

Abstract: Analyzing high dimensional data is a challenging task. For these data it is known that traditional clustering algorithms fail to detect meaningful patterns. As a solution, subspace clustering techniques have been introduced. They analyze arbitrary subspace projections of the data to detect clustering structures. In this paper, we present our subspace clustering extension for KDD frameworks, termed KDD-SC. In contrast to existing subspace clustering toolkits, our solution neither is a standalone product nor is it tightly coupled to a specific KDD framework. Our extension is realized by a common codebase and easy-to-use plugins for three of the most popular KDD frameworks, namely KNIME, RapidMiner, and WEKA. KDD-SC extends these frameworks such that they offer a wide range of different subspace clustering functionalities. It provides a multitude of algorithms, data generators, evaluation measures, and visualization techniques specifically designed for subspace clustering. These functionalities integrate seamlessly with the frameworks' existing features such that they can be flexibly combined. KDD-SC is publicly available on our website.

Submitted 14 July, 2014; originally announced July 2014.

Comments: 8 pages, 8 figures

Showing 1–33 of 33 results for author: Seidl, T