Search | arXiv e-print repository

Learning Topology Actions for Power Grid Control: A Graph-Based Soft-Label Imitation Learning Approach

Authors: Mohamed Hassouna, Clara Holzhüter, Malte Lehna, Matthijs de Jong, Jan Viebahn, Bernhard Sick, Christoph Scholz

Abstract: The rising proportion of renewable energy in the electricity mix introduces significant operational challenges for power grid operators. Effective power grid management demands adaptive decision-making strategies capable of handling dynamic conditions. With the increase in complexity, more and more Deep Learning (DL) approaches have been proposed to find suitable grid topologies for congestion man… ▽ More The rising proportion of renewable energy in the electricity mix introduces significant operational challenges for power grid operators. Effective power grid management demands adaptive decision-making strategies capable of handling dynamic conditions. With the increase in complexity, more and more Deep Learning (DL) approaches have been proposed to find suitable grid topologies for congestion management. In this work, we contribute to this research by introducing a novel Imitation Learning (IL) approach that leverages soft labels derived from simulated topological action outcomes, thereby capturing multiple viable actions per state. Unlike traditional IL methods that rely on hard labels to enforce a single optimal action, our method constructs soft labels that capture the effectiveness of actions that prove suitable in resolving grid congestion. To further enhance decision-making, we integrate Graph Neural Networks (GNNs) to encode the structural properties of power grids, ensuring that the topology-aware representations contribute to better agent performance. Our approach significantly outperforms its hard-label counterparts as well as state-of-the-art Deep Reinforcement Learning (DRL) baseline agents. Most notably, it achieves a 17% better performance compared to the greedy expert agent from which the imitation targets were derived. △ Less

Submitted 18 June, 2025; v1 submitted 19 March, 2025; originally announced March 2025.

Comments: Accepted at European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML) - Applied Data Science Track

arXiv:2501.07186 [pdf, other]

Generalizable Graph Neural Networks for Robust Power Grid Topology Control

Authors: Matthijs de Jong, Jan Viebahn, Yuliya Shapovalova

Abstract: The energy transition necessitates new congestion management methods. One such method is controlling the grid topology with machine learning (ML). This approach has gained popularity following the Learning to Run a Power Network (L2RPN) competitions. Graph neural networks (GNNs) are a class of ML models that reflect graph structure in their computation, which makes them suitable for power grid mod… ▽ More The energy transition necessitates new congestion management methods. One such method is controlling the grid topology with machine learning (ML). This approach has gained popularity following the Learning to Run a Power Network (L2RPN) competitions. Graph neural networks (GNNs) are a class of ML models that reflect graph structure in their computation, which makes them suitable for power grid modeling. Various GNN approaches for topology control have thus been proposed. We propose the first GNN model for grid topology control that uses only GNN layers. Additionally, we identify the busbar information asymmetry problem that the popular homogeneous graph representation suffers from, and propose a heterogeneous graph representation to resolve it. We train both homogeneous and heterogeneous GNNs and fully connected neural networks (FCNN) baselines on an imitation learning task. We evaluate the models according to their classification accuracy and grid operation ability. We find that the heterogeneous GNNs perform best on in-distribution networks, followed by the FCNNs, and lastly, the homogeneous GNNs. We also find that both GNN types generalize better to out-of-distribution networks than FCNNs. △ Less

Submitted 18 February, 2025; v1 submitted 13 January, 2025; originally announced January 2025.

arXiv:2407.19865 [pdf, other]

Imitation Learning for Intra-Day Power Grid Operation through Topology Actions

Authors: Matthijs de Jong, Jan Viebahn, Yuliya Shapovalova

Abstract: Power grid operation is becoming increasingly complex due to the increase in generation of renewable energy. The recent series of Learning To Run a Power Network (L2RPN) competitions have encouraged the use of artificial agents to assist human dispatchers in operating power grids. In this paper we study the performance of imitation learning for day-ahead power grid operation through topology actio… ▽ More Power grid operation is becoming increasingly complex due to the increase in generation of renewable energy. The recent series of Learning To Run a Power Network (L2RPN) competitions have encouraged the use of artificial agents to assist human dispatchers in operating power grids. In this paper we study the performance of imitation learning for day-ahead power grid operation through topology actions. In particular, we consider two rule-based expert agents: a greedy agent and a N-1 agent. While the latter is more computationally expensive since it takes N-1 safety considerations into account, it exhibits a much higher operational performance. We train a fully-connected neural network (FCNN) on expert state-action pairs and evaluate it in two ways. First, we find that classification accuracy is limited despite extensive hyperparameter tuning, due to class imbalance and class overlap. Second, as a power system agent, the FCNN performs only slightly worse than expert agents. Furthermore, hybrid agents, which incorporate minimal additional simulations, match expert agents' performance with significantly lower computational cost. Consequently, imitation learning shows promise for developing fast, high-performing power grid agents, motivating its further exploration in future L2RPN studies. △ Less

Submitted 18 August, 2024; v1 submitted 29 July, 2024; originally announced July 2024.

Comments: To be presented at the Machine Learning for Sustainable Power Systems 2024 workshop and to be published in the corresponding Springer Communications in Computer and Information Science proceedings

arXiv:2407.07896 [pdf, other]

Pentagonal Photonic Crystal Mirrors: Scalable Lightsails with Enhanced Acceleration via Neural Topology Optimization

Authors: L. Norder, S. Yin, M. J. de Jong, F. Stallone, H. Aydogmus, P. M. Sberna, M. A. Bessa, R. A. Norte

Abstract: The Starshot Breakthrough Initiative aims to send one-gram microchip probes to Alpha Centauri within 20 years, using gram-scale lightsails propelled by laser-based radiation pressure, reaching velocities nearing a fifth of light speed. This mission requires lightsail materials that challenge the fundamentals of nanotechnology, requiring innovations in optics, material science and structural engine… ▽ More The Starshot Breakthrough Initiative aims to send one-gram microchip probes to Alpha Centauri within 20 years, using gram-scale lightsails propelled by laser-based radiation pressure, reaching velocities nearing a fifth of light speed. This mission requires lightsail materials that challenge the fundamentals of nanotechnology, requiring innovations in optics, material science and structural engineering. Unlike the microchip payload, which must be minimized in every dimension, such lightsails need meter-scale dimensions with nanoscale thickness and billions of nanoscale holes to enhance reflectivity and reduce mass. Our study employs neural topology optimization, revealing a novel pentagonal lattice-based photonic crystal (PhC) reflector. The optimized designs shorten acceleration times, therefore lowering launch costs significantly. Crucially, these designs also enable lightsail material fabrication with orders-of-magnitude reduction in costs. We have fabricated a 60 x 60 mm$^2$, 200nm thick, single-layer reflector perforated with over a billion nanoscale features; the highest aspect-ratio nanophotonic element to date. We achieve this with nearly 9,000 times cost reduction per m$^2$. Starshot lightsails will have several stringent requirements but will ultimately be driven by costs to build at scale. Here we highlight challenges and possible solutions in developing lightsail materials - showcasing the potential of scaling nanophotonics for cost-effective next-generation space exploration. △ Less

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2308.14903 [pdf, other]

MEMORY-VQ: Compression for Tractable Internet-Scale Memory

Authors: Yury Zemlyanskiy, Michiel de Jong, Luke Vilnis, Santiago Ontañón, William W. Cohen, Sumit Sanghai, Joshua Ainslie

Abstract: Retrieval augmentation is a powerful but expensive method to make language models more knowledgeable about the world. Memory-based methods like LUMEN pre-compute token representations for retrieved passages to drastically speed up inference. However, memory also leads to much greater storage requirements from storing pre-computed representations. We propose MEMORY-VQ, a new method to reduce stor… ▽ More Retrieval augmentation is a powerful but expensive method to make language models more knowledgeable about the world. Memory-based methods like LUMEN pre-compute token representations for retrieved passages to drastically speed up inference. However, memory also leads to much greater storage requirements from storing pre-computed representations. We propose MEMORY-VQ, a new method to reduce storage requirements of memory-augmented models without sacrificing performance. Our method uses a vector quantization variational autoencoder (VQ-VAE) to compress token representations. We apply MEMORY-VQ to the LUMEN model to obtain LUMEN-VQ, a memory model that achieves a 16x compression rate with comparable performance on the KILT benchmark. LUMEN-VQ enables practical retrieval augmentation even for extremely large retrieval corpora. △ Less

Submitted 28 August, 2023; originally announced August 2023.

arXiv:2306.10231 [pdf, other]

GLIMMER: generalized late-interaction memory reranker

Authors: Michiel de Jong, Yury Zemlyanskiy, Nicholas FitzGerald, Sumit Sanghai, William W. Cohen, Joshua Ainslie

Abstract: Memory-augmentation is a powerful approach for efficiently incorporating external information into language models, but leads to reduced performance relative to retrieving text. Recent work introduced LUMEN, a memory-retrieval hybrid that partially pre-computes memory and updates memory representations on the fly with a smaller live encoder. We propose GLIMMER, which improves on this approach th… ▽ More Memory-augmentation is a powerful approach for efficiently incorporating external information into language models, but leads to reduced performance relative to retrieving text. Recent work introduced LUMEN, a memory-retrieval hybrid that partially pre-computes memory and updates memory representations on the fly with a smaller live encoder. We propose GLIMMER, which improves on this approach through 1) exploiting free access to the powerful memory representations by applying a shallow reranker on top of memory to drastically improve retrieval quality at low cost, and 2) incorporating multi-task training to learn a general and higher quality memory and live encoder. GLIMMER achieves strong gains in performance at faster speeds compared to LUMEN and FiD on the KILT benchmark of knowledge-intensive tasks. △ Less

Submitted 16 June, 2023; originally announced June 2023.

arXiv:2305.13245 [pdf, other]

GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints

Authors: Joshua Ainslie, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebrón, Sumit Sanghai

Abstract: Multi-query attention (MQA), which only uses a single key-value head, drastically speeds up decoder inference. However, MQA can lead to quality degradation, and moreover it may not be desirable to train a separate model just for faster inference. We (1) propose a recipe for uptraining existing multi-head language model checkpoints into models with MQA using 5% of original pre-training compute, and… ▽ More Multi-query attention (MQA), which only uses a single key-value head, drastically speeds up decoder inference. However, MQA can lead to quality degradation, and moreover it may not be desirable to train a separate model just for faster inference. We (1) propose a recipe for uptraining existing multi-head language model checkpoints into models with MQA using 5% of original pre-training compute, and (2) introduce grouped-query attention (GQA), a generalization of multi-query attention which uses an intermediate (more than one, less than number of query heads) number of key-value heads. We show that uptrained GQA achieves quality close to multi-head attention with comparable speed to MQA. △ Less

Submitted 23 December, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

Comments: Accepted at EMNLP 2023. Added to related work

arXiv:2303.09752 [pdf, other]

CoLT5: Faster Long-Range Transformers with Conditional Computation

Authors: Joshua Ainslie, Tao Lei, Michiel de Jong, Santiago Ontañón, Siddhartha Brahma, Yury Zemlyanskiy, David Uthus, Mandy Guo, James Lee-Thorp, Yi Tay, Yun-Hsuan Sung, Sumit Sanghai

Abstract: Many natural language processing tasks benefit from long inputs, but processing long documents with Transformers is expensive -- not only due to quadratic attention complexity but also from applying feedforward and projection layers to every token. However, not all tokens are equally important, especially for longer documents. We propose CoLT5, a long-input Transformer model that builds on this in… ▽ More Many natural language processing tasks benefit from long inputs, but processing long documents with Transformers is expensive -- not only due to quadratic attention complexity but also from applying feedforward and projection layers to every token. However, not all tokens are equally important, especially for longer documents. We propose CoLT5, a long-input Transformer model that builds on this intuition by employing conditional computation, devoting more resources to important tokens in both feedforward and attention layers. We show that CoLT5 achieves stronger performance than LongT5 with much faster training and inference, achieving SOTA on the long-input SCROLLS benchmark. Moreover, CoLT5 can effectively and tractably make use of extremely long inputs, showing strong gains up to 64k input length. △ Less

Submitted 23 October, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

Comments: Accepted at EMNLP 2023

arXiv:2301.10448 [pdf, other]

Pre-computed memory or on-the-fly encoding? A hybrid approach to retrieval augmentation makes the most of your compute

Authors: Michiel de Jong, Yury Zemlyanskiy, Nicholas FitzGerald, Joshua Ainslie, Sumit Sanghai, Fei Sha, William Cohen

Abstract: Retrieval-augmented language models such as Fusion-in-Decoder are powerful, setting the state of the art on a variety of knowledge-intensive tasks. However, they are also expensive, due to the need to encode a large number of retrieved passages. Some work avoids this cost by pre-encoding a text corpus into a memory and retrieving dense representations directly. However, pre-encoding memory incurs… ▽ More Retrieval-augmented language models such as Fusion-in-Decoder are powerful, setting the state of the art on a variety of knowledge-intensive tasks. However, they are also expensive, due to the need to encode a large number of retrieved passages. Some work avoids this cost by pre-encoding a text corpus into a memory and retrieving dense representations directly. However, pre-encoding memory incurs a severe quality penalty as the memory representations are not conditioned on the current input. We propose LUMEN, a hybrid between these two extremes, pre-computing the majority of the retrieval representation and completing the encoding on the fly using a live encoder that is conditioned on the question and fine-tuned for the task. We show that LUMEN significantly outperforms pure memory on multiple question-answering tasks while being much cheaper than FiD, and outperforms both for any given compute budget. Moreover, the advantage of LUMEN over FiD increases with model size. △ Less

Submitted 2 June, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

Comments: ICML 2023

arXiv:2212.08153 [pdf, other]

FiDO: Fusion-in-Decoder optimized for stronger performance and faster inference

Authors: Michiel de Jong, Yury Zemlyanskiy, Joshua Ainslie, Nicholas FitzGerald, Sumit Sanghai, Fei Sha, William Cohen

Abstract: Fusion-in-Decoder (FiD) is a powerful retrieval-augmented language model that sets the state-of-the-art on many knowledge-intensive NLP tasks. However, the architecture used for FiD was chosen by making minimal modifications to a standard T5 model, which our analysis shows to be highly suboptimal for a retrieval-augmented model. In particular, FiD allocates the bulk of FLOPs to the encoder, while… ▽ More Fusion-in-Decoder (FiD) is a powerful retrieval-augmented language model that sets the state-of-the-art on many knowledge-intensive NLP tasks. However, the architecture used for FiD was chosen by making minimal modifications to a standard T5 model, which our analysis shows to be highly suboptimal for a retrieval-augmented model. In particular, FiD allocates the bulk of FLOPs to the encoder, while the majority of inference time results from memory bandwidth constraints in the decoder. We propose two simple changes to the FiD architecture to alleviate memory bandwidth constraints, and speed up inference by 7x. This allows us to use a much larger decoder at modest cost. We denote FiD with the above modifications as FiDO, and show that it strongly improves performance over existing FiD models for a wide range of inference budgets. For example, FiDO-Large-XXL performs faster inference than FiD-Base and achieves better performance than FiD-Large. △ Less

Submitted 2 June, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

Comments: ACL Findings 2023

arXiv:2209.14899 [pdf, other]

Generate-and-Retrieve: use your predictions to improve retrieval for semantic parsing

Authors: Yury Zemlyanskiy, Michiel de Jong, Joshua Ainslie, Panupong Pasupat, Peter Shaw, Linlu Qiu, Sumit Sanghai, Fei Sha

Abstract: A common recent approach to semantic parsing augments sequence-to-sequence models by retrieving and appending a set of training samples, called exemplars. The effectiveness of this recipe is limited by the ability to retrieve informative exemplars that help produce the correct parse, which is especially challenging in low-resource settings. Existing retrieval is commonly based on similarity of que… ▽ More A common recent approach to semantic parsing augments sequence-to-sequence models by retrieving and appending a set of training samples, called exemplars. The effectiveness of this recipe is limited by the ability to retrieve informative exemplars that help produce the correct parse, which is especially challenging in low-resource settings. Existing retrieval is commonly based on similarity of query and exemplar inputs. We propose GandR, a retrieval procedure that retrieves exemplars for which outputs are also similar. GandRfirst generates a preliminary prediction with input-based retrieval. Then, it retrieves exemplars with outputs similar to the preliminary prediction which are used to generate a final prediction. GandR sets the state of the art on multiple low-resource semantic parsing tasks. △ Less

Submitted 29 September, 2022; originally announced September 2022.

Comments: To appear in the proceedings of COLING 2022

arXiv:2207.00630 [pdf, other]

QA Is the New KR: Question-Answer Pairs as Knowledge Bases

Authors: Wenhu Chen, William W. Cohen, Michiel De Jong, Nitish Gupta, Alessandro Presta, Pat Verga, John Wieting

Abstract: In this position paper, we propose a new approach to generating a type of knowledge base (KB) from text, based on question generation and entity linking. We argue that the proposed type of KB has many of the key advantages of a traditional symbolic KB: in particular, it consists of small modular components, which can be combined compositionally to answer complex queries, including relational queri… ▽ More In this position paper, we propose a new approach to generating a type of knowledge base (KB) from text, based on question generation and entity linking. We argue that the proposed type of KB has many of the key advantages of a traditional symbolic KB: in particular, it consists of small modular components, which can be combined compositionally to answer complex queries, including relational queries and queries involving "multi-hop" inferences. However, unlike a traditional KB, this information store is well-aligned with common user information needs. △ Less

Submitted 1 July, 2022; originally announced July 2022.

arXiv:2204.04581 [pdf, other]

Augmenting Pre-trained Language Models with QA-Memory for Open-Domain Question Answering

Authors: Wenhu Chen, Pat Verga, Michiel de Jong, John Wieting, William Cohen

Abstract: Retrieval augmented language models have recently become the standard for knowledge intensive tasks. Rather than relying purely on latent semantics within the parameters of large neural models, these methods enlist a semi-parametric memory to encode an index of knowledge for the model to retrieve over. Most prior work has employed text passages as the unit of knowledge, which has high coverage at… ▽ More Retrieval augmented language models have recently become the standard for knowledge intensive tasks. Rather than relying purely on latent semantics within the parameters of large neural models, these methods enlist a semi-parametric memory to encode an index of knowledge for the model to retrieve over. Most prior work has employed text passages as the unit of knowledge, which has high coverage at the cost of interpretability, controllability, and efficiency. The opposite properties arise in other methods which have instead relied on knowledge base (KB) facts. At the same time, more recent work has demonstrated the effectiveness of storing and retrieving from an index of Q-A pairs derived from text \citep{lewis2021paq}. This approach yields a high coverage knowledge representation that maintains KB-like properties due to its representations being more atomic units of information. In this work we push this line of research further by proposing a question-answer augmented encoder-decoder model and accompanying pretraining strategy. This yields an end-to-end system that not only outperforms prior QA retrieval methods on single-hop QA tasks but also enables compositional reasoning, as demonstrated by strong performance on two multi-hop QA datasets. Together, these methods improve the ability to interpret and control the model while narrowing the performance gap with passage retrieval systems. △ Less

Submitted 23 January, 2023; v1 submitted 9 April, 2022; originally announced April 2022.

Comments: Accepted by EACL 2023

arXiv:2110.06176 [pdf, other]

Mention Memory: incorporating textual knowledge into Transformers through entity mention attention

Authors: Michiel de Jong, Yury Zemlyanskiy, Nicholas FitzGerald, Fei Sha, William Cohen

Abstract: Natural language understanding tasks such as open-domain question answering often require retrieving and assimilating factual information from multiple sources. We propose to address this problem by integrating a semi-parametric representation of a large text corpus into a Transformer model as a source of factual knowledge. Specifically, our method represents knowledge with `mention memory', a tab… ▽ More Natural language understanding tasks such as open-domain question answering often require retrieving and assimilating factual information from multiple sources. We propose to address this problem by integrating a semi-parametric representation of a large text corpus into a Transformer model as a source of factual knowledge. Specifically, our method represents knowledge with `mention memory', a table of dense vector representations of every entity mention in a corpus. The proposed model - TOME - is a Transformer that accesses the information through internal memory layers in which each entity mention in the input passage attends to the mention memory. This approach enables synthesis of and reasoning over many disparate sources of information within a single Transformer model. In experiments using a memory of 150 million Wikipedia mentions, TOME achieves strong performance on several open-domain knowledge-intensive tasks, including the claim verification benchmarks HoVer and FEVER and several entity-based QA benchmarks. We also show that the model learns to attend to informative mentions without any direct supervision. Finally we demonstrate that the model can generalize to new unseen entities by updating the memory without retraining. △ Less

Submitted 19 April, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

arXiv:2108.04809 [pdf, other]

doi 10.1002/adma.202106248

Spiderweb nanomechanical resonators via Bayesian optimization: inspired by nature and guided by machine learning

Authors: Dongil Shin, Andrea Cupertino, Matthijs H. J. de Jong, Peter G. Steeneken, Miguel A. Bessa, Richard A. Norte

Abstract: From ultra-sensitive detectors of fundamental forces to quantum networks and sensors, mechanical resonators are enabling next-generation technologies to operate in room temperature environments. Currently, silicon nitride nanoresonators stand as a leading microchip platform in these advances by allowing for mechanical resonators whose motion is remarkably isolated from ambient thermal noise. Howev… ▽ More From ultra-sensitive detectors of fundamental forces to quantum networks and sensors, mechanical resonators are enabling next-generation technologies to operate in room temperature environments. Currently, silicon nitride nanoresonators stand as a leading microchip platform in these advances by allowing for mechanical resonators whose motion is remarkably isolated from ambient thermal noise. However, to date, human intuition has remained the driving force behind design processes. Here, inspired by nature and guided by machine learning, a spiderweb nanomechanical resonator is developed that exhibits vibration modes which are isolated from ambient thermal environments via a novel "torsional soft-clamping" mechanism discovered by the data-driven optimization algorithm. This bio-inspired resonator is then fabricated; experimentally confirming a new paradigm in mechanics with quality factors above 1 billion in room temperature environments. In contrast to other state-of-the-art resonators, this milestone is achieved with a compact design which does not require sub-micron lithographic features or complex phononic bandgaps, making it significantly easier and cheaper to manufacture at large scales. Here we demonstrate the ability of machine learning to work in tandem with human intuition to augment creative possibilities and uncover new strategies in computing and nanotechnology. △ Less

Submitted 13 December, 2021; v1 submitted 10 August, 2021; originally announced August 2021.

Journal ref: Shin, D., Cupertino, A., de, M. H. J., Steeneken, P. G., Bessa, M. A., Norte, R. A., Spiderweb Nanomechanical Resonators via Bayesian Optimization: Inspired by Nature and Guided by Machine Learning. Adv. Mater. 2021, 2106248

arXiv:2107.01979 [pdf, other]

Machine Learning for Fraud Detection in E-Commerce: A Research Agenda

Authors: Niek Tax, Kees Jan de Vries, Mathijs de Jong, Nikoleta Dosoula, Bram van den Akker, Jon Smith, Olivier Thuong, Lucas Bernardi

Abstract: Fraud detection and prevention play an important part in ensuring the sustained operation of any e-commerce business. Machine learning (ML) often plays an important role in these anti-fraud operations, but the organizational context in which these ML models operate cannot be ignored. In this paper, we take an organization-centric view on the topic of fraud detection by formulating an operational m… ▽ More Fraud detection and prevention play an important part in ensuring the sustained operation of any e-commerce business. Machine learning (ML) often plays an important role in these anti-fraud operations, but the organizational context in which these ML models operate cannot be ignored. In this paper, we take an organization-centric view on the topic of fraud detection by formulating an operational model of the anti-fraud departments in e-commerce organizations. We derive 6 research topics and 12 practical challenges for fraud detection from this operational model. We summarize the state of the literature for each research topic, discuss potential solutions to the practical challenges, and identify 22 open research challenges. △ Less

Submitted 5 July, 2021; originally announced July 2021.

Comments: Accepted and to appear in the proceedings of the KDD 2021 co-located workshop: the 2nd International Workshop on Deployable Machine Learning for Security Defense (MLHat)

arXiv:2106.01607 [pdf, other]

Grounding Complex Navigational Instructions Using Scene Graphs

Authors: Michiel de Jong, Satyapriya Krishna, Anuva Agarwal

Abstract: Training a reinforcement learning agent to carry out natural language instructions is limited by the available supervision, i.e. knowing when the instruction has been carried out. We adapt the CLEVR visual question answering dataset to generate complex natural language navigation instructions and accompanying scene graphs, yielding an environment-agnostic supervised dataset. To demonstrate the use… ▽ More Training a reinforcement learning agent to carry out natural language instructions is limited by the available supervision, i.e. knowing when the instruction has been carried out. We adapt the CLEVR visual question answering dataset to generate complex natural language navigation instructions and accompanying scene graphs, yielding an environment-agnostic supervised dataset. To demonstrate the use of this data set, we map the scenes to the VizDoom environment and use the architecture in \citet{gatedattention} to train an agent to carry out these more complex language instructions. △ Less

Submitted 3 June, 2021; originally announced June 2021.

Comments: arXiv admin note: text overlap with arXiv:1706.07230 by other authors

arXiv:2105.04241 [pdf, other]

ReadTwice: Reading Very Large Documents with Memories

Authors: Yury Zemlyanskiy, Joshua Ainslie, Michiel de Jong, Philip Pham, Ilya Eckstein, Fei Sha

Abstract: Knowledge-intensive tasks such as question answering often require assimilating information from different sections of large inputs such as books or article collections. We propose ReadTwice, a simple and effective technique that combines several strengths of prior approaches to model long-range dependencies with Transformers. The main idea is to read text in small segments, in parallel, summarizi… ▽ More Knowledge-intensive tasks such as question answering often require assimilating information from different sections of large inputs such as books or article collections. We propose ReadTwice, a simple and effective technique that combines several strengths of prior approaches to model long-range dependencies with Transformers. The main idea is to read text in small segments, in parallel, summarizing each segment into a memory table to be used in a second read of the text. We show that the method outperforms models of comparable size on several question answering (QA) datasets and sets a new state of the art on the challenging NarrativeQA task, with questions about entire books. Source code and pre-trained checkpoints for ReadTwice can be found at https://goo.gle/research-readtwice. △ Less

Submitted 11 May, 2021; v1 submitted 10 May, 2021; originally announced May 2021.

Comments: To appear in the proceedings of NAACL 2021

arXiv:1907.02597 [pdf, ps, other]

Multi-dimensional interpolations in C++

Authors: Maarten de Jong

Abstract: A C++ software design is presented that can be used to interpolate data in any number of dimensions. The design is based on a combination of templates of functional collections of elements and so-called type lists. The design allows for different search methodologies and interpolation techniques in each dimension. It is also possible to expand and reduce the number of dimensions, to interpolate co… ▽ More A C++ software design is presented that can be used to interpolate data in any number of dimensions. The design is based on a combination of templates of functional collections of elements and so-called type lists. The design allows for different search methodologies and interpolation techniques in each dimension. It is also possible to expand and reduce the number of dimensions, to interpolate composite data types and to produce on-the-fly additional values such as derivatives of the interpolating function. △ Less

Submitted 3 July, 2019; originally announced July 2019.

arXiv:1906.06805 [pdf, other]

Neural Theorem Provers Do Not Learn Rules Without Exploration

Authors: Michiel de Jong, Fei Sha

Abstract: Neural symbolic processing aims to combine the generalization of logical learning approaches and the performance of neural networks. The Neural Theorem Proving (NTP) model by Rocktaschel et al (2017) learns embeddings for concepts and performs logical unification. While NTP is promising and effective in predicting facts accurately, we have little knowledge how well it can extract true relationship… ▽ More Neural symbolic processing aims to combine the generalization of logical learning approaches and the performance of neural networks. The Neural Theorem Proving (NTP) model by Rocktaschel et al (2017) learns embeddings for concepts and performs logical unification. While NTP is promising and effective in predicting facts accurately, we have little knowledge how well it can extract true relationship among data. To this end, we create synthetic logical datasets with injected relationships, which can be generated on-the-fly, to test neural-based relation learning algorithms including NTP. We show that it has difficulty recovering relationships in all but the simplest settings. Critical analysis and diagnostic experiments suggest that the optimization algorithm suffers from poor local minima due to its greedy winner-takes-all strategy in identifying the most informative structure (proof path) to pursue. We alter the NTP algorithm to increase exploration, which sharply improves performance. We argue and demonstate that it is insightful to benchmark with synthetic data with ground-truth relationships, for both evaluating models and revealing algorithmic issues. △ Less

Submitted 16 June, 2019; originally announced June 2019.

arXiv:1812.02253 [pdf, other]

Weighted Global Normalization for Multiple Choice Reading Comprehension over Long Documents

Authors: Aditi Chaudhary, Bhargavi Paranjape, Michiel de Jong

Abstract: Motivated by recent evidence pointing out the fragility of high-performing span prediction models, we direct our attention to multiple choice reading comprehension. In particular, this work introduces a novel method for improving answer selection on long documents through weighted global normalization of predictions over portions of the documents. We show that applying our method to a span predict… ▽ More Motivated by recent evidence pointing out the fragility of high-performing span prediction models, we direct our attention to multiple choice reading comprehension. In particular, this work introduces a novel method for improving answer selection on long documents through weighted global normalization of predictions over portions of the documents. We show that applying our method to a span prediction model adapted for answer selection helps model performance on long summaries from NarrativeQA, a challenging reading comprehension dataset with an answer selection task, and we strongly improve on the task baseline performance by +36.2 Mean Reciprocal Rank. △ Less

Submitted 25 November, 2021; v1 submitted 5 December, 2018; originally announced December 2018.

arXiv:1805.07885 [pdf]

doi 10.3390/en11051277

The Governance of Risks in Ridesharing: A Revelatory Case from Singapore

Authors: Yanwei Li, Araz Taeihagh, Martin de Jong

Abstract: Recently we have witnessed the worldwide adoption of many different types of innovative technologies, such as crowdsourcing, ridesharing, open and big data, aiming at delivering public services more efficiently and effectively. Among them, ridesharing has received substantial attention from decision-makers around the world. Because of the multitude of currently understood or potentially unknown ri… ▽ More Recently we have witnessed the worldwide adoption of many different types of innovative technologies, such as crowdsourcing, ridesharing, open and big data, aiming at delivering public services more efficiently and effectively. Among them, ridesharing has received substantial attention from decision-makers around the world. Because of the multitude of currently understood or potentially unknown risks associated with ridesharing (unemployment, insurance, information privacy, and environmental risk), governments in different countries apply different strategies to address such risks. Some governments prohibit the adoption of ridesharing altogether, while other governments promote it. In this article, we address the question of how risks involved in ridesharing are governed over time. We present an in-depth single case study on Singapore and examine how the Singaporean government has addressed risks in ridesharing over time. The Singaporean government has a strong ambition to become an innovation hub, and many innovative technologies have been adopted and promoted to that end. At the same time, decision-makers in Singapore are reputed for their proactive style of social governance. The example of Singapore can be regarded as a revelatory case study, helping us further to explore governance practices in other countries. Keywords: risk; ridesharing; transport; governance; innovative technologies; case study; Singapore △ Less

Submitted 21 May, 2018; originally announced May 2018.

Journal ref: Energies 11, no. 5: 1277 (2018)

arXiv:1708.06293 [pdf, ps, other]

Neville's algorithm revisited

Authors: M. de Jong

Abstract: Neville's algorithm is known to provide an efficient and numerically stable solution for polynomial interpolations. In this paper, an extension of this algorithm is presented which includes the derivatives of the interpolating polynomial. Neville's algorithm is known to provide an efficient and numerically stable solution for polynomial interpolations. In this paper, an extension of this algorithm is presented which includes the derivatives of the interpolating polynomial. △ Less

Submitted 16 August, 2017; originally announced August 2017.

Comments: 3 pages

Showing 1–23 of 23 results for author: de Jong, M