-
Learning Topology Actions for Power Grid Control: A Graph-Based Soft-Label Imitation Learning Approach
Authors:
Mohamed Hassouna,
Clara Holzhüter,
Malte Lehna,
Matthijs de Jong,
Jan Viebahn,
Bernhard Sick,
Christoph Scholz
Abstract:
The rising proportion of renewable energy in the electricity mix introduces significant operational challenges for power grid operators. Effective power grid management demands adaptive decision-making strategies capable of handling dynamic conditions. With the increase in complexity, more and more Deep Learning (DL) approaches have been proposed to find suitable grid topologies for congestion management. In this work, we contribute to this research by introducing a novel Imitation Learning (IL) approach that leverages soft labels derived from simulated topological action outcomes, thereby capturing multiple viable actions per state. Unlike traditional IL methods that rely on hard labels to enforce a single optimal action, our method constructs soft labels that capture the effectiveness of actions that prove suitable in resolving grid congestion. To further enhance decision-making, we integrate Graph Neural Networks (GNNs) to encode the structural properties of power grids, ensuring that the topology-aware representations contribute to better agent performance. Our approach significantly outperforms its hard-label counterparts as well as state-of-the-art Deep Reinforcement Learning (DRL) baseline agents. Most notably, it achieves a 17% better performance compared to the greedy expert agent from which the imitation targets were derived.
Submitted 18 June, 2025; v1 submitted 19 March, 2025;
originally announced March 2025.
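The soft-label idea in the abstract above can be illustrated with a minimal sketch. The abstract does not give the authors' labelling formula, so everything here is an assumption made for illustration: the function names `soft_labels` and `cross_entropy`, the use of the simulated maximum line loading as the action outcome, the congestion threshold of 1.0, and the margin-based weighting.

```python
import math

def soft_labels(max_loadings, threshold=1.0):
    """Turn simulated action outcomes into a soft label distribution.

    max_loadings[i] is the (hypothetical) maximum line loading simulated
    after applying action i. Actions that keep the loading below the
    threshold share the probability mass in proportion to their margin.
    This weighting is an illustrative assumption, not the paper's formula.
    """
    margins = [max(0.0, threshold - rho) for rho in max_loadings]
    total = sum(margins)
    if total == 0.0:
        # No action resolves congestion: fall back to a hard label
        # on the action with the lowest simulated loading.
        best = min(range(len(max_loadings)), key=max_loadings.__getitem__)
        return [1.0 if i == best else 0.0 for i in range(len(max_loadings))]
    return [m / total for m in margins]

def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a soft target and a predicted distribution."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))

# Four candidate actions; only the second and third relieve congestion.
loadings = [1.20, 0.90, 0.80, 1.05]
labels = soft_labels(loadings)
loss = cross_entropy(labels, [0.1, 0.3, 0.5, 0.1])
```

A hard-label scheme would put all mass on the third action; the soft target keeps the second action as a viable alternative, which is the contrast the abstract draws.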
-
Generalizable Graph Neural Networks for Robust Power Grid Topology Control
Authors:
Matthijs de Jong,
Jan Viebahn,
Yuliya Shapovalova
Abstract:
The energy transition necessitates new congestion management methods. One such method is controlling the grid topology with machine learning (ML). This approach has gained popularity following the Learning to Run a Power Network (L2RPN) competitions. Graph neural networks (GNNs) are a class of ML models that reflect graph structure in their computation, which makes them suitable for power grid modeling. Various GNN approaches for topology control have thus been proposed. We propose the first GNN model for grid topology control that uses only GNN layers. Additionally, we identify the busbar information asymmetry problem that the popular homogeneous graph representation suffers from, and propose a heterogeneous graph representation to resolve it. We train both homogeneous and heterogeneous GNNs and fully connected neural network (FCNN) baselines on an imitation learning task. We evaluate the models according to their classification accuracy and grid operation ability. We find that the heterogeneous GNNs perform best on in-distribution networks, followed by the FCNNs, and lastly, the homogeneous GNNs. We also find that both GNN types generalize better to out-of-distribution networks than FCNNs.
Submitted 18 February, 2025; v1 submitted 13 January, 2025;
originally announced January 2025.
-
Imitation Learning for Intra-Day Power Grid Operation through Topology Actions
Authors:
Matthijs de Jong,
Jan Viebahn,
Yuliya Shapovalova
Abstract:
Power grid operation is becoming increasingly complex due to the increase in generation of renewable energy. The recent series of Learning To Run a Power Network (L2RPN) competitions has encouraged the use of artificial agents to assist human dispatchers in operating power grids. In this paper we study the performance of imitation learning for day-ahead power grid operation through topology actions. In particular, we consider two rule-based expert agents: a greedy agent and an N-1 agent. While the latter is more computationally expensive since it takes N-1 safety considerations into account, it exhibits a much higher operational performance. We train a fully-connected neural network (FCNN) on expert state-action pairs and evaluate it in two ways. First, we find that classification accuracy is limited despite extensive hyperparameter tuning, due to class imbalance and class overlap. Second, as a power system agent, the FCNN performs only slightly worse than expert agents. Furthermore, hybrid agents, which incorporate minimal additional simulations, match expert agents' performance with significantly lower computational cost. Consequently, imitation learning shows promise for developing fast, high-performing power grid agents, motivating its further exploration in future L2RPN studies.
Submitted 18 August, 2024; v1 submitted 29 July, 2024;
originally announced July 2024.
-
Pentagonal Photonic Crystal Mirrors: Scalable Lightsails with Enhanced Acceleration via Neural Topology Optimization
Authors:
L. Norder,
S. Yin,
M. J. de Jong,
F. Stallone,
H. Aydogmus,
P. M. Sberna,
M. A. Bessa,
R. A. Norte
Abstract:
The Starshot Breakthrough Initiative aims to send one-gram microchip probes to Alpha Centauri within 20 years, using gram-scale lightsails propelled by laser-based radiation pressure, reaching velocities nearing a fifth of light speed. This mission requires lightsail materials that challenge the fundamentals of nanotechnology, requiring innovations in optics, material science and structural engineering. Unlike the microchip payload, which must be minimized in every dimension, such lightsails need meter-scale dimensions with nanoscale thickness and billions of nanoscale holes to enhance reflectivity and reduce mass. Our study employs neural topology optimization, revealing a novel pentagonal lattice-based photonic crystal (PhC) reflector. The optimized designs shorten acceleration times, therefore lowering launch costs significantly. Crucially, these designs also enable lightsail material fabrication with orders-of-magnitude reduction in costs. We have fabricated a 60 x 60 mm$^2$, 200nm thick, single-layer reflector perforated with over a billion nanoscale features; the highest aspect-ratio nanophotonic element to date. We achieve this with nearly 9,000 times cost reduction per m$^2$. Starshot lightsails will have several stringent requirements but will ultimately be driven by costs to build at scale. Here we highlight challenges and possible solutions in developing lightsail materials - showcasing the potential of scaling nanophotonics for cost-effective next-generation space exploration.
Submitted 10 July, 2024;
originally announced July 2024.
-
MEMORY-VQ: Compression for Tractable Internet-Scale Memory
Authors:
Yury Zemlyanskiy,
Michiel de Jong,
Luke Vilnis,
Santiago Ontañón,
William W. Cohen,
Sumit Sanghai,
Joshua Ainslie
Abstract:
Retrieval augmentation is a powerful but expensive method to make language models more knowledgeable about the world. Memory-based methods like LUMEN pre-compute token representations for retrieved passages to drastically speed up inference. However, memory also leads to much greater storage requirements from storing pre-computed representations.
We propose MEMORY-VQ, a new method to reduce storage requirements of memory-augmented models without sacrificing performance. Our method uses a vector quantization variational autoencoder (VQ-VAE) to compress token representations. We apply MEMORY-VQ to the LUMEN model to obtain LUMEN-VQ, a memory model that achieves a 16x compression rate with comparable performance on the KILT benchmark. LUMEN-VQ enables practical retrieval augmentation even for extremely large retrieval corpora.
Submitted 28 August, 2023;
originally announced August 2023.
-
GLIMMER: generalized late-interaction memory reranker
Authors:
Michiel de Jong,
Yury Zemlyanskiy,
Nicholas FitzGerald,
Sumit Sanghai,
William W. Cohen,
Joshua Ainslie
Abstract:
Memory-augmentation is a powerful approach for efficiently incorporating external information into language models, but leads to reduced performance relative to retrieving text. Recent work introduced LUMEN, a memory-retrieval hybrid that partially pre-computes memory and updates memory representations on the fly with a smaller live encoder.
We propose GLIMMER, which improves on this approach through 1) exploiting free access to the powerful memory representations by applying a shallow reranker on top of memory to drastically improve retrieval quality at low cost, and 2) incorporating multi-task training to learn a general and higher quality memory and live encoder. GLIMMER achieves strong gains in performance at faster speeds compared to LUMEN and FiD on the KILT benchmark of knowledge-intensive tasks.
Submitted 16 June, 2023;
originally announced June 2023.
-
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Authors:
Joshua Ainslie,
James Lee-Thorp,
Michiel de Jong,
Yury Zemlyanskiy,
Federico Lebrón,
Sumit Sanghai
Abstract:
Multi-query attention (MQA), which only uses a single key-value head, drastically speeds up decoder inference. However, MQA can lead to quality degradation, and moreover it may not be desirable to train a separate model just for faster inference. We (1) propose a recipe for uptraining existing multi-head language model checkpoints into models with MQA using 5% of original pre-training compute, and (2) introduce grouped-query attention (GQA), a generalization of multi-query attention which uses an intermediate (more than one, less than number of query heads) number of key-value heads. We show that uptrained GQA achieves quality close to multi-head attention with comparable speed to MQA.
Submitted 23 December, 2023; v1 submitted 22 May, 2023;
originally announced May 2023.
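The relationship between multi-head, multi-query, and grouped-query attention described in the abstract above comes down to how many key-value heads the query heads share. The sketch below shows only that head-to-group mapping. The function names are hypothetical, and the contiguous, equal-sized grouping is an assumption made for illustration; the abstract does not specify how groups are formed.

```python
def kv_head_for_query_head(query_head, num_query_heads, num_kv_heads):
    """Return the index of the key-value head used by a given query head.

    Assumes query heads are split into contiguous, equal-sized groups,
    one group per key-value head (an illustrative choice).
    """
    if num_query_heads % num_kv_heads != 0:
        raise ValueError("num_query_heads must be divisible by num_kv_heads")
    group_size = num_query_heads // num_kv_heads
    return query_head // group_size

def head_mapping(num_query_heads, num_kv_heads):
    """List the key-value head index for every query head."""
    return [
        kv_head_for_query_head(q, num_query_heads, num_kv_heads)
        for q in range(num_query_heads)
    ]

mqa = head_mapping(8, 1)  # multi-query: all query heads share one KV head
gqa = head_mapping(8, 2)  # grouped-query: two groups of four query heads
mha = head_mapping(8, 8)  # multi-head: one KV head per query head
```

With one key-value head the mapping reduces to multi-query attention, and with as many key-value heads as query heads it reduces to standard multi-head attention; grouped-query attention covers the settings in between.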
-
CoLT5: Faster Long-Range Transformers with Conditional Computation
Authors:
Joshua Ainslie,
Tao Lei,
Michiel de Jong,
Santiago Ontañón,
Siddhartha Brahma,
Yury Zemlyanskiy,
David Uthus,
Mandy Guo,
James Lee-Thorp,
Yi Tay,
Yun-Hsuan Sung,
Sumit Sanghai
Abstract:
Many natural language processing tasks benefit from long inputs, but processing long documents with Transformers is expensive -- not only due to quadratic attention complexity but also from applying feedforward and projection layers to every token. However, not all tokens are equally important, especially for longer documents. We propose CoLT5, a long-input Transformer model that builds on this intuition by employing conditional computation, devoting more resources to important tokens in both feedforward and attention layers. We show that CoLT5 achieves stronger performance than LongT5 with much faster training and inference, achieving SOTA on the long-input SCROLLS benchmark. Moreover, CoLT5 can effectively and tractably make use of extremely long inputs, showing strong gains up to 64k input length.
Submitted 23 October, 2023; v1 submitted 16 March, 2023;
originally announced March 2023.
-
Pre-computed memory or on-the-fly encoding? A hybrid approach to retrieval augmentation makes the most of your compute
Authors:
Michiel de Jong,
Yury Zemlyanskiy,
Nicholas FitzGerald,
Joshua Ainslie,
Sumit Sanghai,
Fei Sha,
William Cohen
Abstract:
Retrieval-augmented language models such as Fusion-in-Decoder are powerful, setting the state of the art on a variety of knowledge-intensive tasks. However, they are also expensive, due to the need to encode a large number of retrieved passages. Some work avoids this cost by pre-encoding a text corpus into a memory and retrieving dense representations directly. However, pre-encoding memory incurs a severe quality penalty as the memory representations are not conditioned on the current input. We propose LUMEN, a hybrid between these two extremes, pre-computing the majority of the retrieval representation and completing the encoding on the fly using a live encoder that is conditioned on the question and fine-tuned for the task. We show that LUMEN significantly outperforms pure memory on multiple question-answering tasks while being much cheaper than FiD, and outperforms both for any given compute budget. Moreover, the advantage of LUMEN over FiD increases with model size.
Submitted 2 June, 2023; v1 submitted 25 January, 2023;
originally announced January 2023.
-
FiDO: Fusion-in-Decoder optimized for stronger performance and faster inference
Authors:
Michiel de Jong,
Yury Zemlyanskiy,
Joshua Ainslie,
Nicholas FitzGerald,
Sumit Sanghai,
Fei Sha,
William Cohen
Abstract:
Fusion-in-Decoder (FiD) is a powerful retrieval-augmented language model that sets the state-of-the-art on many knowledge-intensive NLP tasks. However, the architecture used for FiD was chosen by making minimal modifications to a standard T5 model, which our analysis shows to be highly suboptimal for a retrieval-augmented model. In particular, FiD allocates the bulk of FLOPs to the encoder, while the majority of inference time results from memory bandwidth constraints in the decoder. We propose two simple changes to the FiD architecture to alleviate memory bandwidth constraints, and speed up inference by 7x. This allows us to use a much larger decoder at modest cost. We denote FiD with the above modifications as FiDO, and show that it strongly improves performance over existing FiD models for a wide range of inference budgets. For example, FiDO-Large-XXL performs faster inference than FiD-Base and achieves better performance than FiD-Large.
Submitted 2 June, 2023; v1 submitted 15 December, 2022;
originally announced December 2022.
-
Generate-and-Retrieve: use your predictions to improve retrieval for semantic parsing
Authors:
Yury Zemlyanskiy,
Michiel de Jong,
Joshua Ainslie,
Panupong Pasupat,
Peter Shaw,
Linlu Qiu,
Sumit Sanghai,
Fei Sha
Abstract:
A common recent approach to semantic parsing augments sequence-to-sequence models by retrieving and appending a set of training samples, called exemplars. The effectiveness of this recipe is limited by the ability to retrieve informative exemplars that help produce the correct parse, which is especially challenging in low-resource settings. Existing retrieval is commonly based on similarity of query and exemplar inputs. We propose GandR, a retrieval procedure that retrieves exemplars for which outputs are also similar. GandR first generates a preliminary prediction with input-based retrieval. Then, it retrieves exemplars with outputs similar to the preliminary prediction, which are used to generate a final prediction. GandR sets the state of the art on multiple low-resource semantic parsing tasks.
Submitted 29 September, 2022;
originally announced September 2022.
-
QA Is the New KR: Question-Answer Pairs as Knowledge Bases
Authors:
Wenhu Chen,
William W. Cohen,
Michiel De Jong,
Nitish Gupta,
Alessandro Presta,
Pat Verga,
John Wieting
Abstract:
In this position paper, we propose a new approach to generating a type of knowledge base (KB) from text, based on question generation and entity linking. We argue that the proposed type of KB has many of the key advantages of a traditional symbolic KB: in particular, it consists of small modular components, which can be combined compositionally to answer complex queries, including relational queries and queries involving "multi-hop" inferences. However, unlike a traditional KB, this information store is well-aligned with common user information needs.
Submitted 1 July, 2022;
originally announced July 2022.
-
Augmenting Pre-trained Language Models with QA-Memory for Open-Domain Question Answering
Authors:
Wenhu Chen,
Pat Verga,
Michiel de Jong,
John Wieting,
William Cohen
Abstract:
Retrieval augmented language models have recently become the standard for knowledge intensive tasks. Rather than relying purely on latent semantics within the parameters of large neural models, these methods enlist a semi-parametric memory to encode an index of knowledge for the model to retrieve over. Most prior work has employed text passages as the unit of knowledge, which has high coverage at the cost of interpretability, controllability, and efficiency. The opposite properties arise in other methods which have instead relied on knowledge base (KB) facts. At the same time, more recent work has demonstrated the effectiveness of storing and retrieving from an index of Q-A pairs derived from text \citep{lewis2021paq}. This approach yields a high coverage knowledge representation that maintains KB-like properties due to its representations being more atomic units of information. In this work we push this line of research further by proposing a question-answer augmented encoder-decoder model and accompanying pretraining strategy. This yields an end-to-end system that not only outperforms prior QA retrieval methods on single-hop QA tasks but also enables compositional reasoning, as demonstrated by strong performance on two multi-hop QA datasets. Together, these methods improve the ability to interpret and control the model while narrowing the performance gap with passage retrieval systems.
Submitted 23 January, 2023; v1 submitted 9 April, 2022;
originally announced April 2022.
-
Mention Memory: incorporating textual knowledge into Transformers through entity mention attention
Authors:
Michiel de Jong,
Yury Zemlyanskiy,
Nicholas FitzGerald,
Fei Sha,
William Cohen
Abstract:
Natural language understanding tasks such as open-domain question answering often require retrieving and assimilating factual information from multiple sources. We propose to address this problem by integrating a semi-parametric representation of a large text corpus into a Transformer model as a source of factual knowledge. Specifically, our method represents knowledge with `mention memory', a table of dense vector representations of every entity mention in a corpus. The proposed model - TOME - is a Transformer that accesses the information through internal memory layers in which each entity mention in the input passage attends to the mention memory. This approach enables synthesis of and reasoning over many disparate sources of information within a single Transformer model. In experiments using a memory of 150 million Wikipedia mentions, TOME achieves strong performance on several open-domain knowledge-intensive tasks, including the claim verification benchmarks HoVer and FEVER and several entity-based QA benchmarks. We also show that the model learns to attend to informative mentions without any direct supervision. Finally we demonstrate that the model can generalize to new unseen entities by updating the memory without retraining.
Submitted 19 April, 2022; v1 submitted 12 October, 2021;
originally announced October 2021.
-
Spiderweb nanomechanical resonators via Bayesian optimization: inspired by nature and guided by machine learning
Authors:
Dongil Shin,
Andrea Cupertino,
Matthijs H. J. de Jong,
Peter G. Steeneken,
Miguel A. Bessa,
Richard A. Norte
Abstract:
From ultra-sensitive detectors of fundamental forces to quantum networks and sensors, mechanical resonators are enabling next-generation technologies to operate in room temperature environments. Currently, silicon nitride nanoresonators stand as a leading microchip platform in these advances by allowing for mechanical resonators whose motion is remarkably isolated from ambient thermal noise. However, to date, human intuition has remained the driving force behind design processes. Here, inspired by nature and guided by machine learning, a spiderweb nanomechanical resonator is developed that exhibits vibration modes which are isolated from ambient thermal environments via a novel "torsional soft-clamping" mechanism discovered by the data-driven optimization algorithm. This bio-inspired resonator is then fabricated; experimentally confirming a new paradigm in mechanics with quality factors above 1 billion in room temperature environments. In contrast to other state-of-the-art resonators, this milestone is achieved with a compact design which does not require sub-micron lithographic features or complex phononic bandgaps, making it significantly easier and cheaper to manufacture at large scales. Here we demonstrate the ability of machine learning to work in tandem with human intuition to augment creative possibilities and uncover new strategies in computing and nanotechnology.
Submitted 13 December, 2021; v1 submitted 10 August, 2021;
originally announced August 2021.
-
Machine Learning for Fraud Detection in E-Commerce: A Research Agenda
Authors:
Niek Tax,
Kees Jan de Vries,
Mathijs de Jong,
Nikoleta Dosoula,
Bram van den Akker,
Jon Smith,
Olivier Thuong,
Lucas Bernardi
Abstract:
Fraud detection and prevention play an important part in ensuring the sustained operation of any e-commerce business. Machine learning (ML) often plays an important role in these anti-fraud operations, but the organizational context in which these ML models operate cannot be ignored. In this paper, we take an organization-centric view on the topic of fraud detection by formulating an operational model of the anti-fraud departments in e-commerce organizations. We derive 6 research topics and 12 practical challenges for fraud detection from this operational model. We summarize the state of the literature for each research topic, discuss potential solutions to the practical challenges, and identify 22 open research challenges.
Submitted 5 July, 2021;
originally announced July 2021.
-
Grounding Complex Navigational Instructions Using Scene Graphs
Authors:
Michiel de Jong,
Satyapriya Krishna,
Anuva Agarwal
Abstract:
Training a reinforcement learning agent to carry out natural language instructions is limited by the available supervision, i.e. knowing when the instruction has been carried out. We adapt the CLEVR visual question answering dataset to generate complex natural language navigation instructions and accompanying scene graphs, yielding an environment-agnostic supervised dataset. To demonstrate the use of this data set, we map the scenes to the VizDoom environment and use the architecture in \citet{gatedattention} to train an agent to carry out these more complex language instructions.
Submitted 3 June, 2021;
originally announced June 2021.
-
ReadTwice: Reading Very Large Documents with Memories
Authors:
Yury Zemlyanskiy,
Joshua Ainslie,
Michiel de Jong,
Philip Pham,
Ilya Eckstein,
Fei Sha
Abstract:
Knowledge-intensive tasks such as question answering often require assimilating information from different sections of large inputs such as books or article collections. We propose ReadTwice, a simple and effective technique that combines several strengths of prior approaches to model long-range dependencies with Transformers. The main idea is to read text in small segments, in parallel, summarizing each segment into a memory table to be used in a second read of the text. We show that the method outperforms models of comparable size on several question answering (QA) datasets and sets a new state of the art on the challenging NarrativeQA task, with questions about entire books. Source code and pre-trained checkpoints for ReadTwice can be found at https://goo.gle/research-readtwice.
Submitted 11 May, 2021; v1 submitted 10 May, 2021;
originally announced May 2021.
-
Multi-dimensional interpolations in C++
Authors:
Maarten de Jong
Abstract:
A C++ software design is presented that can be used to interpolate data in any number of dimensions. The design is based on a combination of templates of functional collections of elements and so-called type lists. The design allows for different search methodologies and interpolation techniques in each dimension. It is also possible to expand and reduce the number of dimensions, to interpolate composite data types and to produce on-the-fly additional values such as derivatives of the interpolating function.
Submitted 3 July, 2019;
originally announced July 2019.
-
Neural Theorem Provers Do Not Learn Rules Without Exploration
Authors:
Michiel de Jong,
Fei Sha
Abstract:
Neural symbolic processing aims to combine the generalization of logical learning approaches and the performance of neural networks. The Neural Theorem Proving (NTP) model by Rocktäschel et al. (2017) learns embeddings for concepts and performs logical unification. While NTP is promising and effective in predicting facts accurately, we have little knowledge of how well it can extract true relationships among data. To this end, we create synthetic logical datasets with injected relationships, which can be generated on-the-fly, to test neural-based relation learning algorithms including NTP. We show that it has difficulty recovering relationships in all but the simplest settings. Critical analysis and diagnostic experiments suggest that the optimization algorithm suffers from poor local minima due to its greedy winner-takes-all strategy in identifying the most informative structure (proof path) to pursue. We alter the NTP algorithm to increase exploration, which sharply improves performance. We argue and demonstrate that it is insightful to benchmark with synthetic data with ground-truth relationships, both for evaluating models and for revealing algorithmic issues.
Submitted 16 June, 2019;
originally announced June 2019.
-
Weighted Global Normalization for Multiple Choice Reading Comprehension over Long Documents
Authors:
Aditi Chaudhary,
Bhargavi Paranjape,
Michiel de Jong
Abstract:
Motivated by recent evidence pointing out the fragility of high-performing span prediction models, we direct our attention to multiple choice reading comprehension. In particular, this work introduces a novel method for improving answer selection on long documents through weighted global normalization of predictions over portions of the documents. We show that applying our method to a span prediction model adapted for answer selection helps model performance on long summaries from NarrativeQA, a challenging reading comprehension dataset with an answer selection task, and we strongly improve on the task baseline performance by +36.2 Mean Reciprocal Rank.
Submitted 25 November, 2021; v1 submitted 5 December, 2018;
originally announced December 2018.
-
The Governance of Risks in Ridesharing: A Revelatory Case from Singapore
Authors:
Yanwei Li,
Araz Taeihagh,
Martin de Jong
Abstract:
Recently we have witnessed the worldwide adoption of many different types of innovative technologies, such as crowdsourcing, ridesharing, open and big data, aiming at delivering public services more efficiently and effectively. Among them, ridesharing has received substantial attention from decision-makers around the world. Because of the multitude of currently understood or potentially unknown risks associated with ridesharing (unemployment, insurance, information privacy, and environmental risk), governments in different countries apply different strategies to address such risks. Some governments prohibit the adoption of ridesharing altogether, while other governments promote it. In this article, we address the question of how risks involved in ridesharing are governed over time. We present an in-depth single case study on Singapore and examine how the Singaporean government has addressed risks in ridesharing over time. The Singaporean government has a strong ambition to become an innovation hub, and many innovative technologies have been adopted and promoted to that end. At the same time, decision-makers in Singapore are reputed for their proactive style of social governance. The example of Singapore can be regarded as a revelatory case study, helping us further to explore governance practices in other countries. Keywords: risk; ridesharing; transport; governance; innovative technologies; case study; Singapore
Submitted 21 May, 2018;
originally announced May 2018.
-
Neville's algorithm revisited
Authors:
M. de Jong
Abstract:
Neville's algorithm is known to provide an efficient and numerically stable solution for polynomial interpolations. In this paper, an extension of this algorithm is presented which includes the derivatives of the interpolating polynomial.
Submitted 16 August, 2017;
originally announced August 2017.
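For readers unfamiliar with the technique named in the entry above, here is a minimal Python sketch of the textbook Neville recurrence, extended with a first derivative obtained by differentiating that recurrence with respect to the evaluation point. It is an illustration under stated assumptions, not the formulation from the paper: the function name `neville_with_derivative` and its signature are hypothetical, and the paper's own extension may differ in form and in how many derivatives it covers.

```python
from typing import Sequence, Tuple


def neville_with_derivative(
    xs: Sequence[float], ys: Sequence[float], x: float
) -> Tuple[float, float]:
    """Evaluate the interpolating polynomial through (xs[i], ys[i]) and its
    first derivative at x, using Neville's recurrence.

    Hypothetical helper for illustration; assumes the xs are pairwise distinct.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")

    n = len(xs)
    p = [float(y) for y in ys]  # p[i]: value of the polynomial through points i..i+k
    d = [0.0] * n               # d[i]: derivative of that polynomial at x

    for k in range(1, n):            # k = number of extra points in each sub-polynomial
        for i in range(n - k):
            xi, xj = xs[i], xs[i + k]
            denom = xj - xi
            # Derivative of the recurrence (product rule); uses the old p values,
            # so it must be computed before p[i] is overwritten.
            d[i] = (p[i + 1] - p[i]
                    + (x - xi) * d[i + 1]
                    - (x - xj) * d[i]) / denom
            # Standard Neville recurrence for the value.
            p[i] = ((x - xi) * p[i + 1] - (x - xj) * p[i]) / denom

    return p[0], d[0]


if __name__ == "__main__":
    # Three samples of f(x) = x**2; a quadratic interpolant reproduces f exactly.
    value, slope = neville_with_derivative([0.0, 1.0, 2.0], [0.0, 1.0, 4.0], 3.0)
    print(value, slope)
```

The in-place update keeps memory at O(n) for n points; each level overwrites entry `i` only after both the value and the derivative for that entry have been read.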