-
The Automation Advantage in AI Red Teaming
Authors:
Rob Mulla,
Ads Dawson,
Vincent Abruzzon,
Brian Greunke,
Nick Landers,
Brad Palm,
Will Pearce
Abstract:
This paper analyzes Large Language Model (LLM) security vulnerabilities based on data from Crucible, encompassing 214,271 attack attempts by 1,674 users across 30 LLM challenges. Our findings reveal automated approaches significantly outperform manual techniques (69.5% vs 47.6% success rate), despite only 5.2% of users employing automation. We demonstrate that automated approaches excel in systematic exploration and pattern matching challenges, while manual approaches retain speed advantages in certain creative reasoning scenarios, often solving problems 5x faster when successful. Challenge categories requiring systematic exploration are most effectively targeted through automation, while intuitive challenges sometimes favor manual techniques for time-to-solve metrics. These results illuminate how algorithmic testing is transforming AI red-teaming practices, with implications for both offensive security research and defensive measures. Our analysis suggests optimal security testing combines human creativity for strategy development with programmatic execution for thorough exploration.
Submitted 28 April, 2025; v1 submitted 28 April, 2025;
originally announced April 2025.
-
Digital Ecosystem for FAIR Time Series Data Management in Environmental System Science
Authors:
J. Bumberger,
M. Abbrent,
N. Brinckmann,
J. Hemmen,
R. Kunkel,
C. Lorenz,
P. Lünenschloß,
B. Palm,
T. Schnicke,
C. Schulz,
H. van der Schaaf,
D. Schäfer
Abstract:
Addressing the challenges posed by climate change, biodiversity loss, and environmental pollution requires comprehensive monitoring and effective data management strategies that are applicable across various scales in environmental system science. This paper introduces a versatile and transferable digital ecosystem for managing time series data, designed to adhere to the FAIR principles (Findable, Accessible, Interoperable, and Reusable). The system is highly adaptable, cloud-ready, and suitable for deployment in a wide range of settings, from small-scale projects to large-scale monitoring initiatives. The ecosystem comprises three core components: the Sensor Management System (SMS) for detailed metadata registration and management; time$.$IO, a platform for efficient time series data storage, transfer, and real-time visualization; and the System for Automated Quality Control (SaQC), which ensures data integrity through real-time analysis and quality assurance. The modular architecture, combined with standardized protocols and interfaces, ensures that the ecosystem can be easily transferred and deployed across different environments and institutions. This approach enhances data accessibility for a broad spectrum of stakeholders, including researchers, policymakers, and the public, while fostering collaboration and advancing scientific research in environmental monitoring.
Submitted 17 September, 2024; v1 submitted 5 September, 2024;
originally announced September 2024.
-
Improved Point Estimation for the Rayleigh Regression Model
Authors:
B. G. Palm,
F. M. Bayer,
R. J. Cintra
Abstract:
The Rayleigh regression model was recently proposed for modeling amplitude values of synthetic aperture radar (SAR) image pixels. However, inferences from such a model are based on the maximum likelihood estimators, which can be biased for small signal lengths. The Rayleigh regression model for SAR images often takes into account small pixel windows, which may lead to inaccurate results. In this letter, we introduce bias-adjusted estimators tailored for the Rayleigh regression model based on: (i) Cox and Snell's method; (ii) Firth's scheme; and (iii) the parametric bootstrap method. We present numerical experiments considering synthetic and actual SAR data sets. The bias-adjusted estimators yield nearly unbiased estimates and accurate modeling results.
Submitted 6 August, 2022;
originally announced August 2022.
-
Robust Rayleigh Regression Method for SAR Image Processing in Presence of Outliers
Authors:
B. G. Palm,
F. M. Bayer,
R. Machado,
M. I. Pettersson,
V. T. Vu,
R. J. Cintra
Abstract:
The presence of outliers (anomalous values) in synthetic aperture radar (SAR) data and the misspecification in statistical image models may result in inaccurate inferences. To avoid such issues, the Rayleigh regression model based on a robust estimation process is proposed as a more realistic approach to model this type of data. This paper aims at obtaining Rayleigh regression model parameter estimators robust to the presence of outliers. The proposed approach considered the weighted maximum likelihood method and was submitted to numerical experiments using simulated and measured SAR images. Monte Carlo simulations were employed for the numerical assessment of the proposed robust estimator performance in finite signal lengths, their sensitivity to outliers, and the breakdown point. For instance, the non-robust estimators show a relative bias value $65$-fold larger than the results provided by the robust approach in corrupted signals. In terms of sensitivity analysis and breakdown point, the robust scheme resulted in a reduction of about $96\%$ and $10\%$, respectively, in the mean absolute value of both measures, in comparison to the non-robust estimators. Moreover, two SAR data sets were used to compare the ground type and anomaly detection results of the proposed robust scheme with competing methods in the literature.
Submitted 29 July, 2022;
originally announced August 2022.
-
Prediction Intervals in the Beta Autoregressive Moving Average Model
Authors:
B. G. Palm,
F. M. Bayer,
R. J. Cintra
Abstract:
In this paper, we propose five prediction intervals for the beta autoregressive moving average model. This model is suitable for modeling and forecasting variables that assume values in the interval $(0,1)$. Two of the proposed prediction intervals are based on approximations considering the normal distribution and the quantile function of the beta distribution. We also consider bootstrap-based prediction intervals, namely: (i) bootstrap prediction errors (BPE) interval; (ii) bias-corrected and accelerated (BCa) prediction interval; and (iii) percentile prediction interval based on the quantiles of the bootstrap-predicted values for two different bootstrapping schemes. The proposed prediction intervals were evaluated using Monte Carlo simulations. The BCa prediction interval offered the best performance among the evaluated intervals, showing lower coverage rate distortion and small average length. We applied our methodology for predicting the water level of the Cantareira water supply system in São Paulo, Brazil.
Submitted 23 July, 2022;
originally announced July 2022.
-
Severe Damage Recovery in Evolving Soft Robots through Differentiable Programming
Authors:
Kazuya Horibe,
Kathryn Walker,
Rasmus Berg Palm,
Shyam Sudhakaran,
Sebastian Risi
Abstract:
Biological systems are very robust to morphological damage, but artificial systems (robots) are currently not. In this paper we present a system based on neural cellular automata, in which locomoting robots are evolved and then given the ability to regenerate their morphology from damage through gradient-based training. Our approach thus combines the benefits of evolution to discover a wide range of different robot morphologies, with the efficiency of supervised training for robustness through differentiable update rules. The resulting neural cellular automata are able to grow virtual robots capable of regaining more than 80% of their functionality, even after severe types of morphological damage.
Submitted 14 June, 2022;
originally announced June 2022.
-
Autoregressive Model for Multi-Pass SAR Change Detection Based on Image Stacks
Authors:
B. G. Palm,
D. I. Alves,
V. T. Vu,
M. I. Pettersson,
F. M. Bayer,
R. J. Cintra,
R. Machado,
P. Dammert,
H. Hellsten
Abstract:
Change detection is an important synthetic aperture radar (SAR) application, usually used to detect changes in ground scene measurements at different moments in time. Traditionally, a change detection algorithm (CDA) is mainly designed for two SAR images retrieved at different instants. However, more images can be used to improve the algorithm's performance, which emerges as a research topic in SAR change detection. Image stack information can be treated as a data series over time and can be modeled by autoregressive (AR) models. Thus, we present some initial findings on SAR change detection based on image stacks considering AR models. Applying an AR model for each pixel position in the image stack, we obtained an estimated image of the ground scene which can be used as a reference image for CDA. The experimental results reveal that the ground scene estimates by the AR models are accurate and can be used for change detection applications.
Submitted 5 June, 2022;
originally announced June 2022.
-
Mario Plays on a Manifold: Generating Functional Content in Latent Space through Differential Geometry
Authors:
Miguel González-Duque,
Rasmus Berg Palm,
Søren Hauberg,
Sebastian Risi
Abstract:
Deep generative models can automatically create content of diverse types. However, there are no guarantees that such content will satisfy the criteria necessary to present it to end-users and be functional, e.g. the generated levels could be unsolvable or incoherent. In this paper we study this problem from a geometric perspective, and provide a method for reliable interpolation and random walks in the latent spaces of Categorical VAEs based on Riemannian geometry. We test our method with "Super Mario Bros" and "The Legend of Zelda" levels, and against simpler baselines inspired by current practice. Results show that the geometry we propose is better able to interpolate and sample, reliably staying closer to parts of the latent space that decode to playable content.
Submitted 31 May, 2022;
originally announced June 2022.
-
Physical Neural Cellular Automata for 2D Shape Classification
Authors:
Kathryn Walker,
Rasmus Berg Palm,
Rodrigo Moreno Garcia,
Andres Faina,
Kasper Stoy,
Sebastian Risi
Abstract:
Materials with the ability to self-classify their own shape have the potential to advance a wide range of engineering applications and industries. Biological systems possess the ability not only to self-reconfigure but also to self-classify themselves to determine a general shape and function. Previous work into modular robotics systems has only enabled self-recognition and self-reconfiguration into a specific target shape, missing the inherent robustness present in nature to self-classify. In this paper we therefore take advantage of recent advances in deep learning and neural cellular automata, and present a simple modular 2D robotic system that can infer its own class of shape through the local communication of its components. Furthermore, we show that our system can be successfully transferred to hardware which thus opens opportunities for future self-classifying machines. Code available at https://github.com/kattwalker/projectcube. Video available at https://youtu.be/0TCOkE4keyc.
Submitted 31 July, 2022; v1 submitted 14 March, 2022;
originally announced March 2022.
-
Variational Neural Cellular Automata
Authors:
Rasmus Berg Palm,
Miguel González-Duque,
Shyam Sudhakaran,
Sebastian Risi
Abstract:
In nature, the process of cellular growth and differentiation has led to an amazing diversity of organisms -- algae, starfish, giant sequoia, tardigrades, and orcas are all created by the same generative process. Inspired by the incredible diversity of this biological generative process, we propose a generative model, the Variational Neural Cellular Automata (VNCA), which is loosely inspired by the biological processes of cellular growth and differentiation. Unlike previous related works, the VNCA is a proper probabilistic generative model, and we evaluate it according to best practices. We find that the VNCA learns to reconstruct samples well and that despite its relatively few parameters and simple local-only communication, the VNCA can learn to generate a large variety of output from information encoded in a common vector format. While there is a significant gap to the current state-of-the-art in terms of generative modeling performance, we show that the VNCA can learn a purely self-organizing generative process of data. Additionally, we show that the VNCA can learn a distribution of stable attractors that can recover from significant damage.
Submitted 2 February, 2022; v1 submitted 28 January, 2022;
originally announced January 2022.
-
Fast Game Content Adaptation Through Bayesian-based Player Modelling
Authors:
Miguel González-Duque,
Rasmus Berg Palm,
Sebastian Risi
Abstract:
In games, as well as many user-facing systems, adapting content to users' preferences and experience is an important challenge. This paper explores a novel method to realize this goal in the context of dynamic difficulty adjustment (DDA). Here the aim is to constantly adapt the content of a game to the skill level of the player, keeping them engaged by avoiding states that are either too difficult or too easy. Current systems for DDA rely on expensive data mining, or on hand-crafted rules designed for particular domains, and usually adapt to keep players in the flow, leaving no room for the designer to present content that is purposefully easy or difficult. This paper presents Fast Bayesian Content Adaption (FBCA), a system for DDA that is agnostic to the domain and that can target particular difficulties. We deploy this framework in two different domains: the puzzle game Sudoku, and a simple Roguelike game. By modifying the acquisition function's optimization, we are reliably able to present content with a bespoke difficulty for players with different skill levels in less than five iterations for Sudoku and fifteen iterations for the simple Roguelike. Our method significantly outperforms simpler DDA heuristics with the added benefit of maintaining a model of the user. These results point towards a promising alternative for content adaption in a variety of different domains.
Submitted 29 June, 2021; v1 submitted 18 May, 2021;
originally announced May 2021.
-
EvoCraft: A New Challenge for Open-Endedness
Authors:
Djordje Grbic,
Rasmus Berg Palm,
Elias Najarro,
Claire Glanois,
Sebastian Risi
Abstract:
This paper introduces EvoCraft, a framework for Minecraft designed to study open-ended algorithms. We introduce an API that provides an open-source Python interface for communicating with Minecraft to place and track blocks. In contrast to previous work in Minecraft that focused on learning to play the game, the grand challenge we pose here is to automatically search for increasingly complex artifacts in an open-ended fashion. Compared to other environments used to study open-endedness, Minecraft allows the construction of almost any kind of structure, including actuated machines with circuits and mechanical components. We present initial baseline results in evolving simple Minecraft creations through both interactive and automated evolution. While evolution succeeds when tasked to grow a structure towards a specific target, it is unable to find a solution when rewarded for creating a simple machine that moves. Thus, EvoCraft offers a challenging new environment for automated search methods (such as evolution) to find complex artifacts that we hope will spur the development of more open-ended algorithms. A Python implementation of the EvoCraft framework is available at: https://github.com/real-itu/Evocraft-py.
Submitted 8 December, 2020;
originally announced December 2020.
-
Evolutionary Planning in Latent Space
Authors:
Thor V. A. N. Olesen,
Dennis T. T. Nguyen,
Rasmus Berg Palm,
Sebastian Risi
Abstract:
Planning is a powerful approach to reinforcement learning with several desirable properties. However, it requires a model of the world, which is not readily available in many real-life problems. In this paper, we propose to learn a world model that enables Evolutionary Planning in Latent Space (EPLS). We use a Variational Auto Encoder (VAE) to learn a compressed latent representation of individual observations and extend a Mixture Density Recurrent Neural Network (MDRNN) to learn a stochastic, multi-modal forward model of the world that can be used for planning. We use Random Mutation Hill Climbing (RMHC) to find a sequence of actions that maximizes expected reward in this learned model of the world. We demonstrate how to build a model of the world by bootstrapping it with rollouts from a random policy and iteratively refining it with rollouts from an increasingly accurate planning policy using the learned world model. After a few iterations of this refinement, our planning agents are better than standard model-free reinforcement learning approaches, demonstrating the viability of our approach.
Submitted 23 November, 2020;
originally announced November 2020.
-
Testing the Genomic Bottleneck Hypothesis in Hebbian Meta-Learning
Authors:
Rasmus Berg Palm,
Elias Najarro,
Sebastian Risi
Abstract:
Hebbian meta-learning has recently shown promise to solve hard reinforcement learning problems, allowing agents to adapt to some degree to changes in the environment. However, because each synapse in these approaches can learn a very specific learning rule, the ability to generalize to very different situations is likely reduced. We hypothesize that limiting the number of Hebbian learning rules through a "genomic bottleneck" can act as a regularizer leading to better generalization across changes to the environment. We test this hypothesis by decoupling the number of Hebbian learning rules from the number of synapses and systematically varying the number of Hebbian learning rules. The results in this paper suggest that simultaneously learning the Hebbian learning rules and their assignment to synapses is a difficult optimization problem, leading to poor performance in the environments tested. However, parallel research to ours finds that it is indeed possible to reduce the number of learning rules by clustering similar rules together. How to best implement a "genomic bottleneck" algorithm is thus an important research direction that warrants further investigation.
Submitted 23 June, 2021; v1 submitted 13 November, 2020;
originally announced November 2020.
-
Finding Game Levels with the Right Difficulty in a Few Trials through Intelligent Trial-and-Error
Authors:
Miguel González-Duque,
Rasmus Berg Palm,
David Ha,
Sebastian Risi
Abstract:
Methods for dynamic difficulty adjustment allow games to be tailored to particular players to maximize their engagement. However, current methods often only modify a limited set of game features such as the difficulty of the opponents, or the availability of resources. Other approaches, such as experience-driven Procedural Content Generation (PCG), can generate complete levels with desired properties such as levels that are neither too hard nor too easy, but require many iterations. This paper presents a method that can generate and search for complete levels with a specific target difficulty in only a few trials. This advance is enabled through an Intelligent Trial-and-Error algorithm, originally developed to allow robots to adapt quickly. Our algorithm first creates a large variety of different levels that vary across predefined dimensions such as leniency or map coverage. The performance of an AI playing agent on these maps gives a proxy for how difficult the level would be for another AI agent (e.g. one that employs Monte Carlo Tree Search instead of Greedy Tree Search); using this information, a Bayesian Optimization procedure is deployed, updating the difficulty of the prior map to reflect the ability of the agent. The approach can reliably find levels with a specific target difficulty for a variety of planning agents in only a few trials, while maintaining an understanding of their skill landscape.
Submitted 25 June, 2020; v1 submitted 15 May, 2020;
originally announced May 2020.
-
Attend, Copy, Parse -- End-to-end information extraction from documents
Authors:
Rasmus Berg Palm,
Florian Laws,
Ole Winther
Abstract:
Document information extraction tasks performed by humans create data consisting of a PDF or document image input, and extracted string outputs. This end-to-end data is naturally consumed and produced when performing the task because it is valuable in and of itself. It is naturally available, at no additional cost. Unfortunately, state-of-the-art word classification methods for information extraction cannot use this data, instead requiring word-level labels which are expensive to create and consequently not available for many real life tasks. In this paper we propose the Attend, Copy, Parse architecture, a deep neural network model that can be trained directly on end-to-end data, bypassing the need for word-level labels. We evaluate the proposed architecture on a large diverse set of invoices, and outperform a state-of-the-art production system based on word classification. We believe our proposed architecture can be used on many real life information extraction tasks where word classification cannot be used due to a lack of the required word-level labels.
Submitted 23 April, 2021; v1 submitted 18 December, 2018;
originally announced December 2018.
-
Recurrent Relational Networks
Authors:
Rasmus Berg Palm,
Ulrich Paquet,
Ole Winther
Abstract:
This paper is concerned with learning to solve tasks that require a chain of interdependent steps of relational inference, like answering complex questions about the relationships between objects, or solving puzzles where the smaller elements of a solution mutually constrain each other. We introduce the recurrent relational network, a general purpose module that operates on a graph representation of objects. As a generalization of Santoro et al. [2017]'s relational network, it can augment any neural network model with the capacity to do many-step relational reasoning. We achieve state of the art results on the bAbI textual question-answering dataset with the recurrent relational network, consistently solving 20/20 tasks. As bAbI is not particularly challenging from a relational reasoning point of view, we introduce Pretty-CLEVR, a new diagnostic dataset for relational reasoning. In the Pretty-CLEVR set-up, we can vary the question to control for the number of relational reasoning steps that are required to obtain the answer. Using Pretty-CLEVR, we probe the limitations of multi-layer perceptrons, relational and recurrent relational networks. Finally, we show how recurrent relational networks can learn to solve Sudoku puzzles from supervised training data, a challenging task requiring upwards of 64 steps of relational reasoning. We achieve state-of-the-art results amongst comparable methods by solving 96.6% of the hardest Sudoku puzzles.
Submitted 29 November, 2018; v1 submitted 21 November, 2017;
originally announced November 2017.
-
CloudScan - A configuration-free invoice analysis system using recurrent neural networks
Authors:
Rasmus Berg Palm,
Ole Winther,
Florian Laws
Abstract:
We present CloudScan, an invoice analysis system that requires zero configuration or upfront annotation. In contrast to previous work, CloudScan does not rely on templates of invoice layout; instead, it learns a single global model of invoices that naturally generalizes to unseen invoice layouts. The model is trained using data automatically extracted from end-user provided feedback. This automatic training data extraction removes the requirement for users to annotate the data precisely. We describe a recurrent neural network model that can capture long range context and compare it to a baseline logistic regression model corresponding to the current CloudScan production system. We train and evaluate the system on 8 important fields using a dataset of 326,471 invoices. The recurrent neural network and baseline model achieve 0.891 and 0.887 average F1 scores respectively on seen invoice layouts. For the harder task of unseen invoice layouts, the recurrent neural network model outperforms the baseline with 0.840 average F1 compared to 0.788.
Submitted 24 August, 2017;
originally announced August 2017.
-
End-to-End Information Extraction without Token-Level Supervision
Authors:
Rasmus Berg Palm,
Dirk Hovy,
Florian Laws,
Ole Winther
Abstract:
Most state-of-the-art information extraction approaches rely on token-level labels to find the areas of interest in text. Unfortunately, these labels are time-consuming and costly to create, and consequently, not available for many real-life IE tasks. To make matters worse, token-level labels are usually not the desired output, but just an intermediary step. End-to-end (E2E) models, which take raw text as input and produce the desired output directly, need not depend on token-level labels. We propose an E2E model based on pointer networks, which can be trained directly on pairs of raw input and output text. We evaluate our model on the ATIS data set, MIT restaurant corpus and the MIT movie corpus and compare to neural baselines that do use token-level labels. We achieve competitive results, within a few percentage points of the baselines, showing the feasibility of E2E information extraction without the need for token-level labels. This opens up new possibilities, as for many tasks currently addressed by human extractors, raw input and output data are available, but not token-level labels.
Submitted 16 July, 2017;
originally announced July 2017.