Search | arXiv e-print repository

Data Scaling Laws for Radiology Foundation Models

Authors: Maximilian Ilse, Harshita Sharma, Anton Schwaighofer, Sam Bond-Taylor, Fernando Pérez-García, Olesya Melnichenko, Anne-Marie G. Sykes, Kelly K. Horst, Ashish Khandelwal, Maxwell Reynolds, Maria T. Wetscherek, Noel C. F. Codella, Javier Alvarez-Valle, Korfiatis Panagiotis, Valentina Salvatelli

Abstract: Foundation vision encoders such as CLIP and DINOv2, trained on web-scale data, exhibit strong transfer performance across tasks and datasets. However, medical imaging foundation models remain constrained by smaller datasets, limiting our understanding of how data scale and pretraining paradigms affect performance in this setting. In this work, we systematically study continual pretraining of two v… ▽ More Foundation vision encoders such as CLIP and DINOv2, trained on web-scale data, exhibit strong transfer performance across tasks and datasets. However, medical imaging foundation models remain constrained by smaller datasets, limiting our understanding of how data scale and pretraining paradigms affect performance in this setting. In this work, we systematically study continual pretraining of two vision encoders, MedImageInsight (MI2) and RAD-DINO representing the two major encoder paradigms CLIP and DINOv2, on up to 3.5M chest x-rays from a single institution, holding compute and evaluation protocols constant. We evaluate on classification (radiology findings, lines and tubes), segmentation (lines and tubes), and radiology report generation. While prior work has primarily focused on tasks related to radiology findings, we include lines and tubes tasks to counterbalance this bias and evaluate a model's ability to extract features that preserve continuity along elongated structures. Our experiments show that MI2 scales more effectively for finding-related tasks, while RAD-DINO is stronger on tube-related tasks. Surprisingly, continually pretraining MI2 with both reports and structured labels using UniCL improves performance, underscoring the value of structured supervision at scale. We further show that for some tasks, as few as 30k in-domain samples are sufficient to surpass open-weights foundation models. These results highlight the utility of center-specific continual pretraining, enabling medical institutions to derive significant performance gains by utilizing in-domain data. △ Less

Submitted 16 September, 2025; originally announced September 2025.

arXiv:2507.06261 [pdf, ps, other]

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu , et al. (3284 additional authors not shown)

Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal unde… ▽ More In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding and it is now able to process up to 3 hours of video content. Its unique combination of long context, multimodal and reasoning capabilities can be combined to unlock new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements and Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability vs cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving. △ Less

Submitted 22 July, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

Comments: 72 pages, 17 figures

arXiv:2505.03214 [pdf, other]

DocSpiral: A Platform for Integrated Assistive Document Annotation through Human-in-the-Spiral

Authors: Qiang Sun, Sirui Li, Tingting Bi, Du Huynh, Mark Reynolds, Yuanyi Luo, Wei Liu

Abstract: Acquiring structured data from domain-specific, image-based documents such as scanned reports is crucial for many downstream tasks but remains challenging due to document variability. Many of these documents exist as images rather than as machine-readable text, which requires human annotation to train automated extraction systems. We present DocSpiral, the first Human-in-the-Spiral assistive docum… ▽ More Acquiring structured data from domain-specific, image-based documents such as scanned reports is crucial for many downstream tasks but remains challenging due to document variability. Many of these documents exist as images rather than as machine-readable text, which requires human annotation to train automated extraction systems. We present DocSpiral, the first Human-in-the-Spiral assistive document annotation platform, designed to address the challenge of extracting structured information from domain-specific, image-based document collections. Our spiral design establishes an iterative cycle in which human annotations train models that progressively require less manual intervention. DocSpiral integrates document format normalization, comprehensive annotation interfaces, evaluation metrics dashboard, and API endpoints for the development of AI / ML models into a unified workflow. Experiments demonstrate that our framework reduces annotation time by at least 41\% while showing consistent performance gains across three iterations during model training. By making this annotation platform freely accessible, we aim to lower barriers to AI/ML models development in document processing, facilitating the adoption of large language models in image-based, document-intensive fields such as geoscience and healthcare. The system is freely available at: https://app.ai4wa.com. The demonstration video is available: https://app.ai4wa.com/docs/docspiral/demo. △ Less

Submitted 6 May, 2025; originally announced May 2025.

arXiv:2501.04343 [pdf, other]

TimelineKGQA: A Comprehensive Question-Answer Pair Generator for Temporal Knowledge Graphs

Authors: Qiang Sun, Sirui Li, Du Huynh, Mark Reynolds, Wei Liu

Abstract: Question answering over temporal knowledge graphs (TKGs) is crucial for understanding evolving facts and relationships, yet its development is hindered by limited datasets and difficulties in generating custom QA pairs. We propose a novel categorization framework based on timeline-context relationships, along with \textbf{TimelineKGQA}, a universal temporal QA generator applicable to any TKGs. The… ▽ More Question answering over temporal knowledge graphs (TKGs) is crucial for understanding evolving facts and relationships, yet its development is hindered by limited datasets and difficulties in generating custom QA pairs. We propose a novel categorization framework based on timeline-context relationships, along with \textbf{TimelineKGQA}, a universal temporal QA generator applicable to any TKGs. The code is available at: \url{https://github.com/PascalSun/TimelineKGQA} as an open source Python package. △ Less

Submitted 8 January, 2025; originally announced January 2025.

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1112 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February… ▽ More In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content. △ Less

Submitted 16 December, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1326 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr… ▽ More This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI. △ Less

Submitted 9 May, 2025; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2302.10390 [pdf, other]

DrasCLR: A Self-supervised Framework of Learning Disease-related and Anatomy-specific Representation for 3D Medical Images

Authors: Ke Yu, Li Sun, Junxiang Chen, Max Reynolds, Tigmanshu Chaudhary, Kayhan Batmanghelich

Abstract: Large-scale volumetric medical images with annotation are rare, costly, and time prohibitive to acquire. Self-supervised learning (SSL) offers a promising pre-training and feature extraction solution for many downstream tasks, as it only uses unlabeled data. Recently, SSL methods based on instance discrimination have gained popularity in the medical imaging domain. However, SSL pre-trained encoder… ▽ More Large-scale volumetric medical images with annotation are rare, costly, and time prohibitive to acquire. Self-supervised learning (SSL) offers a promising pre-training and feature extraction solution for many downstream tasks, as it only uses unlabeled data. Recently, SSL methods based on instance discrimination have gained popularity in the medical imaging domain. However, SSL pre-trained encoders may use many clues in the image to discriminate an instance that are not necessarily disease-related. Moreover, pathological patterns are often subtle and heterogeneous, requiring the ability of the desired method to represent anatomy-specific features that are sensitive to abnormal changes in different body parts. In this work, we present a novel SSL framework, named DrasCLR, for 3D medical imaging to overcome these challenges. We propose two domain-specific contrastive learning strategies: one aims to capture subtle disease patterns inside a local anatomical region, and the other aims to represent severe disease patterns that span larger regions. We formulate the encoder using conditional hyper-parameterized network, in which the parameters are dependant on the anatomical location, to extract anatomically sensitive features. Extensive experiments on large-scale computer tomography (CT) datasets of lung images show that our method improves the performance of many downstream prediction and segmentation tasks. The patient-level representation improves the performance of the patient survival prediction task. We show how our method can detect emphysema subtypes via dense prediction. We demonstrate that fine-tuning the pre-trained model can significantly reduce annotation efforts without sacrificing emphysema detection accuracy. Our ablation study highlights the importance of incorporating anatomical context into the SSL framework. △ Less

Submitted 15 March, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

Comments: Added some recent references

arXiv:2208.13464 [pdf, other]

Price of MEV: Towards a Game Theoretical Approach to MEV

Authors: Bruno Mazorra, Michael Reynolds, Vanesa Daza

Abstract: Maximal (also miner) extractable value, or MEV, usually refers to the value that privileged players can extract by strategically ordering, censoring, and placing transactions in a blockchain. Each blockchain network, which we refer to as a domain, has its own consensus, ordering, and block-creation mechanisms, which gives rise to different optimal strategies to extract MEV. The strategic behaviour… ▽ More Maximal (also miner) extractable value, or MEV, usually refers to the value that privileged players can extract by strategically ordering, censoring, and placing transactions in a blockchain. Each blockchain network, which we refer to as a domain, has its own consensus, ordering, and block-creation mechanisms, which gives rise to different optimal strategies to extract MEV. The strategic behaviour of rational players, known as searchers, lead to MEV games that have different impacts and externalities in each domain. Several ordering mechanisms, which determine the inclusion and position of transactions in a block, have been considered to construct alternative games to organise MEV extraction, and minimize negative externalities; examples include sealed bid auctions, first input first output, and private priority gas auctions. However, to date, no sufficiently formal and abstract definition of MEV games have been made. In this paper, we take a step toward the formalization of MEV games and compare different ordering mechanisms and their externalities. In particular, we attempt to formalize games that arise from common knowledge MEV opportunities, such as arbitrage and sandwich attacks. In defining these games, we utilise a theoretical framework that provides groundwork for several important roles and concepts, such as the searcher, sequencer, domain, and bundle. We also introduce the price of MEV as the price of anarchy of MEV games, a measure that provides formal comparison between different ordering mechanisms. △ Less

Submitted 29 August, 2022; originally announced August 2022.

arXiv:2206.13737 [pdf, other]

Adversarial Consistency for Single Domain Generalization in Medical Image Segmentation

Authors: Yanwu Xu, Shaoan Xie, Maxwell Reynolds, Matthew Ragoza, Mingming Gong, Kayhan Batmanghelich

Abstract: An organ segmentation method that can generalize to unseen contrasts and scanner settings can significantly reduce the need for retraining of deep learning models. Domain Generalization (DG) aims to achieve this goal. However, most DG methods for segmentation require training data from multiple domains during training. We propose a novel adversarial domain generalization method for organ segmentat… ▽ More An organ segmentation method that can generalize to unseen contrasts and scanner settings can significantly reduce the need for retraining of deep learning models. Domain Generalization (DG) aims to achieve this goal. However, most DG methods for segmentation require training data from multiple domains during training. We propose a novel adversarial domain generalization method for organ segmentation trained on data from a \emph{single} domain. We synthesize the new domains via learning an adversarial domain synthesizer (ADS) and presume that the synthetic domains cover a large enough area of plausible distributions so that unseen domains can be interpolated from synthetic domains. We propose a mutual information regularizer to enforce the semantic consistency between images from the synthetic domains, which can be estimated by patch-level contrastive learning. We evaluate our method for various organ segmentation for unseen modalities, scanning protocols, and scanner sites. △ Less

Submitted 29 June, 2022; v1 submitted 27 June, 2022; originally announced June 2022.

Comments: MICCAI2022 accpted

arXiv:2204.14198 [pdf, other]

Flamingo: a Visual Language Model for Few-Shot Learning

Authors: Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, Roman Ring, Eliza Rutherford, Serkan Cabi, Tengda Han, Zhitao Gong, Sina Samangooei, Marianne Monteiro, Jacob Menick, Sebastian Borgeaud, Andrew Brock, Aida Nematzadeh, Sahand Sharifzadeh, Mikolaj Binkowski, Ricardo Barreira, Oriol Vinyals , et al. (2 additional authors not shown)

Abstract: Building models that can be rapidly adapted to novel tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research. We introduce Flamingo, a family of Visual Language Models (VLM) with this ability. We propose key architectural innovations to: (i) bridge powerful pretrained vision-only and language-only models, (ii) handle sequences of arbitrarily i… ▽ More Building models that can be rapidly adapted to novel tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research. We introduce Flamingo, a family of Visual Language Models (VLM) with this ability. We propose key architectural innovations to: (i) bridge powerful pretrained vision-only and language-only models, (ii) handle sequences of arbitrarily interleaved visual and textual data, and (iii) seamlessly ingest images or videos as inputs. Thanks to their flexibility, Flamingo models can be trained on large-scale multimodal web corpora containing arbitrarily interleaved text and images, which is key to endow them with in-context few-shot learning capabilities. We perform a thorough evaluation of our models, exploring and measuring their ability to rapidly adapt to a variety of image and video tasks. These include open-ended tasks such as visual question-answering, where the model is prompted with a question which it has to answer; captioning tasks, which evaluate the ability to describe a scene or an event; and close-ended tasks such as multiple-choice visual question-answering. For tasks lying anywhere on this spectrum, a single Flamingo model can achieve a new state of the art with few-shot learning, simply by prompting the model with task-specific examples. On numerous benchmarks, Flamingo outperforms models fine-tuned on thousands of times more task-specific data. △ Less

Submitted 15 November, 2022; v1 submitted 29 April, 2022; originally announced April 2022.

Comments: 54 pages. In Proceedings of Neural Information Processing Systems (NeurIPS) 2022

arXiv:2204.00997 [pdf, other]

Bi-fidelity Modeling of Uncertain and Partially Unknown Systems using DeepONets

Authors: Subhayan De, Matthew Reynolds, Malik Hassanaly, Ryan N. King, Alireza Doostan

Abstract: Recent advances in modeling large-scale complex physical systems have shifted research focuses towards data-driven techniques. However, generating datasets by simulating complex systems can require significant computational resources. Similarly, acquiring experimental datasets can prove difficult as well. For these systems, often computationally inexpensive, but in general inaccurate, models, know… ▽ More Recent advances in modeling large-scale complex physical systems have shifted research focuses towards data-driven techniques. However, generating datasets by simulating complex systems can require significant computational resources. Similarly, acquiring experimental datasets can prove difficult as well. For these systems, often computationally inexpensive, but in general inaccurate, models, known as the low-fidelity models, are available. In this paper, we propose a bi-fidelity modeling approach for complex physical systems, where we model the discrepancy between the true system's response and low-fidelity response in the presence of a small training dataset from the true system's response using a deep operator network (DeepONet), a neural network architecture suitable for approximating nonlinear operators. We apply the approach to model systems that have parametric uncertainty and are partially unknown. Three numerical examples are used to show the efficacy of the proposed approach to model uncertain and partially unknown complex physical systems. △ Less

Submitted 18 August, 2022; v1 submitted 3 April, 2022; originally announced April 2022.

Comments: 20 pages, 15 figures

arXiv:2102.02926 [pdf, other]

Alchemy: A benchmark and analysis toolkit for meta-reinforcement learning agents

Authors: Jane X. Wang, Michael King, Nicolas Porcel, Zeb Kurth-Nelson, Tina Zhu, Charlie Deck, Peter Choy, Mary Cassin, Malcolm Reynolds, Francis Song, Gavin Buttimore, David P. Reichert, Neil Rabinowitz, Loic Matthey, Demis Hassabis, Alexander Lerchner, Matthew Botvinick

Abstract: There has been rapidly growing interest in meta-learning as a method for increasing the flexibility and sample efficiency of reinforcement learning. One problem in this area of research, however, has been a scarcity of adequate benchmark tasks. In general, the structure underlying past benchmarks has either been too simple to be inherently interesting, or too ill-defined to support principled anal… ▽ More There has been rapidly growing interest in meta-learning as a method for increasing the flexibility and sample efficiency of reinforcement learning. One problem in this area of research, however, has been a scarcity of adequate benchmark tasks. In general, the structure underlying past benchmarks has either been too simple to be inherently interesting, or too ill-defined to support principled analysis. In the present work, we introduce a new benchmark for meta-RL research, emphasizing transparency and potential for in-depth analysis as well as structural richness. Alchemy is a 3D video game, implemented in Unity, which involves a latent causal structure that is resampled procedurally from episode to episode, affording structure learning, online inference, hypothesis testing and action sequencing based on abstract domain knowledge. We evaluate a pair of powerful RL agents on Alchemy and present an in-depth analysis of one of these agents. Results clearly indicate a frank and specific failure of meta-learning, providing validation for Alchemy as a challenging benchmark for meta-RL. Concurrent with this report, we are releasing Alchemy as public resource, together with a suite of analysis tools and sample agent trajectories. △ Less

Submitted 20 October, 2021; v1 submitted 4 February, 2021; originally announced February 2021.

Comments: Published in Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 2021

arXiv:2012.08508 [pdf, other]

Attention over learned object embeddings enables complex visual reasoning

Authors: David Ding, Felix Hill, Adam Santoro, Malcolm Reynolds, Matt Botvinick

Abstract: Neural networks have achieved success in a wide array of perceptual tasks but often fail at tasks involving both perception and higher-level reasoning. On these more challenging tasks, bespoke approaches (such as modular symbolic components, independent dynamics models or semantic parsers) targeted towards that specific type of task have typically performed better. The downside to these targeted a… ▽ More Neural networks have achieved success in a wide array of perceptual tasks but often fail at tasks involving both perception and higher-level reasoning. On these more challenging tasks, bespoke approaches (such as modular symbolic components, independent dynamics models or semantic parsers) targeted towards that specific type of task have typically performed better. The downside to these targeted approaches, however, is that they can be more brittle than general-purpose neural networks, requiring significant modification or even redesign according to the particular task at hand. Here, we propose a more general neural-network-based approach to dynamic visual reasoning problems that obtains state-of-the-art performance on three different domains, in each case outperforming bespoke modular approaches tailored specifically to the task. Our method relies on learned object-centric representations, self-attention and self-supervised dynamics learning, and all three elements together are required for strong performance to emerge. The success of this combination suggests that there may be no need to trade off flexibility for performance on problems involving spatio-temporal or causal-style reasoning. With the right soft biases and learning objectives in a neural network we may be able to attain the best of both worlds. △ Less

Submitted 26 October, 2021; v1 submitted 15 December, 2020; originally announced December 2020.

Comments: 22 pages, 5 figures

arXiv:2011.01758 [pdf, other]

Representation Matters: Improving Perception and Exploration for Robotics

Authors: Markus Wulfmeier, Arunkumar Byravan, Tim Hertweck, Irina Higgins, Ankush Gupta, Tejas Kulkarni, Malcolm Reynolds, Denis Teplyashin, Roland Hafner, Thomas Lampe, Martin Riedmiller

Abstract: Projecting high-dimensional environment observations into lower-dimensional structured representations can considerably improve data-efficiency for reinforcement learning in domains with limited data such as robotics. Can a single generally useful representation be found? In order to answer this question, it is important to understand how the representation will be used by the agent and what prope… ▽ More Projecting high-dimensional environment observations into lower-dimensional structured representations can considerably improve data-efficiency for reinforcement learning in domains with limited data such as robotics. Can a single generally useful representation be found? In order to answer this question, it is important to understand how the representation will be used by the agent and what properties such a 'good' representation should have. In this paper we systematically evaluate a number of common learnt and hand-engineered representations in the context of three robotics tasks: lifting, stacking and pushing of 3D blocks. The representations are evaluated in two use-cases: as input to the agent, or as a source of auxiliary tasks. Furthermore, the value of each representation is evaluated in terms of three properties: dimensionality, observability and disentanglement. We can significantly improve performance in both use-cases and demonstrate that some representations can perform commensurate to simulator states as agent inputs. Finally, our results challenge common intuitions by demonstrating that: 1) dimensionality strongly matters for task generation, but is negligible for inputs, 2) observability of task-relevant aspects mostly affects the input representation use-case, and 3) disentanglement leads to better auxiliary tasks, but has only limited benefits for input representations. This work serves as a step towards a more systematic understanding of what makes a 'good' representation for control in robotics, enabling practitioners to make more informed choices for developing new learned or hand-engineered representations. △ Less

Submitted 21 March, 2021; v1 submitted 3 November, 2020; originally announced November 2020.

Comments: Published at ICRA 2021

arXiv:2010.05507 [pdf, other]

Scene Gated Social Graph: Pedestrian Trajectory Prediction Based on Dynamic Social Graphs and Scene Constraints

Authors: Hao Xue, Du Q. Huynh, Mark Reynolds

Abstract: Pedestrian trajectory prediction is valuable for understanding human motion behaviors and it is challenging because of the social influence from other pedestrians, the scene constraints and the multimodal possibilities of predicted trajectories. Most existing methods only focus on two of the above three key elements. In order to jointly consider all these elements, we propose a novel trajectory pr… ▽ More Pedestrian trajectory prediction is valuable for understanding human motion behaviors and it is challenging because of the social influence from other pedestrians, the scene constraints and the multimodal possibilities of predicted trajectories. Most existing methods only focus on two of the above three key elements. In order to jointly consider all these elements, we propose a novel trajectory prediction method named Scene Gated Social Graph (SGSG). In the proposed SGSG, dynamic graphs are used to describe the social relationship among pedestrians. The social and scene influences are taken into account through the scene gated social graph features which combine the encoded social graph features and semantic scene features. In addition, a VAE module is incorporated to learn the scene gated social feature and sample latent variables for generating multiple trajectories that are socially and environmentally acceptable. We compare our SGSG against twenty state-of-the-art pedestrian trajectory prediction methods and the results show that the proposed method achieves superior performance on two widely used trajectory prediction benchmarks. △ Less

Submitted 12 October, 2020; originally announced October 2020.

arXiv:2009.00373 [pdf, other]

doi 10.1109/TKDE.2022.3151095

Top-k Socio-Spatial Co-engaged Location Selection for Social Users

Authors: Nur Al Hasan Haldar, Jianxin Li, Mohammed Eunus Ali, Taotao Cai, Timos Sellis, Mark Reynolds

Abstract: With the advent of location-based social networks, users can tag their daily activities in different locations through check-ins. These check-in locations signify user preferences for various socio-spatial activities and can be used to build their profiles to improve the quality of services in some applications such as recommendation systems, advertising, and group formation. To support such appli… ▽ More With the advent of location-based social networks, users can tag their daily activities in different locations through check-ins. These check-in locations signify user preferences for various socio-spatial activities and can be used to build their profiles to improve the quality of services in some applications such as recommendation systems, advertising, and group formation. To support such applications, in this paper, we formulate a new problem of identifying top-k Socio-Spatial co-engaged Location Selection (SSLS) for users in a social graph, that selects the best set of k locations from a large number of location candidates relating to the user and her friends. The selected locations should be (i) spatially and socially relevant to the user and her friends, and (ii) diversified in both spatially and socially to maximize the coverage of friends in the spatial space. This problem has been proved as NP-hard. To address the challenging problem, we first develop a branch-and-bound based Exact solution by designing some pruning strategies based on the derived bounds on diversity. To make the solution scalable for large datasets, we also develop an approximate solution by deriving the relaxed bounds and advanced termination rules to filter out insignificant intermediate results. To further accelerate the efficiency, we present one fast exact approach and a meta-heuristic approximate approach by avoiding the repeated computation of diversity at the running time. Finally, we have performed extensive experiments to evaluate the performance of our proposed models and algorithms against the adapted existing methods using four real-world large datasets. △ Less

Submitted 14 September, 2020; v1 submitted 1 September, 2020; originally announced September 2020.

arXiv:2007.08973 [pdf, other]

AlignNet: Unsupervised Entity Alignment

Authors: Antonia Creswell, Kyriacos Nikiforou, Oriol Vinyals, Andre Saraiva, Rishabh Kabra, Loic Matthey, Chris Burgess, Malcolm Reynolds, Richard Tanburn, Marta Garnelo, Murray Shanahan

Abstract: Recently developed deep learning models are able to learn to segment scenes into component objects without supervision. This opens many new and exciting avenues of research, allowing agents to take objects (or entities) as inputs, rather that pixels. Unfortunately, while these models provide excellent segmentation of a single frame, they do not keep track of how objects segmented at one time-step… ▽ More Recently developed deep learning models are able to learn to segment scenes into component objects without supervision. This opens many new and exciting avenues of research, allowing agents to take objects (or entities) as inputs, rather that pixels. Unfortunately, while these models provide excellent segmentation of a single frame, they do not keep track of how objects segmented at one time-step correspond (or align) to those at a later time-step. The alignment (or correspondence) problem has impeded progress towards using object representations in downstream tasks. In this paper we take steps towards solving the alignment problem, presenting the AlignNet, an unsupervised alignment module. △ Less

Submitted 21 July, 2020; v1 submitted 17 July, 2020; originally announced July 2020.

arXiv:2004.09760 [pdf, other]

Take a NAP: Non-Autoregressive Prediction for Pedestrian Trajectories

Authors: Hao Xue, Du. Q. Huynh, Mark Reynolds

Abstract: Pedestrian trajectory prediction is a challenging task as there are three properties of human movement behaviors which need to be addressed, namely, the social influence from other pedestrians, the scene constraints, and the multimodal (multiroute) nature of predictions. Although existing methods have explored these key properties, the prediction process of these methods is autoregressive. This me… ▽ More Pedestrian trajectory prediction is a challenging task as there are three properties of human movement behaviors which need to be addressed, namely, the social influence from other pedestrians, the scene constraints, and the multimodal (multiroute) nature of predictions. Although existing methods have explored these key properties, the prediction process of these methods is autoregressive. This means they can only predict future locations sequentially. In this paper, we present NAP, a non-autoregressive method for trajectory prediction. Our method comprises specifically designed feature encoders and a latent variable generator to handle the three properties above. It also has a time-agnostic context generator and a time-specific context generator for non-autoregressive prediction. Through extensive experiments that compare NAP against several recent methods, we show that NAP has state-of-the-art trajectory prediction performance. △ Less

Submitted 21 April, 2020; originally announced April 2020.

arXiv:2002.04495 [pdf, other]

On transfer learning of neural networks using bi-fidelity data for uncertainty propagation

Authors: Subhayan De, Jolene Britton, Matthew Reynolds, Ryan Skinner, Kenneth Jansen, Alireza Doostan

Abstract: Due to their high degree of expressiveness, neural networks have recently been used as surrogate models for mapping inputs of an engineering system to outputs of interest. Once trained, neural networks are computationally inexpensive to evaluate and remove the need for repeated evaluations of computationally expensive models in uncertainty quantification applications. However, given the highly par… ▽ More Due to their high degree of expressiveness, neural networks have recently been used as surrogate models for mapping inputs of an engineering system to outputs of interest. Once trained, neural networks are computationally inexpensive to evaluate and remove the need for repeated evaluations of computationally expensive models in uncertainty quantification applications. However, given the highly parameterized construction of neural networks, especially deep neural networks, accurate training often requires large amounts of simulation data that may not be available in the case of computationally expensive systems. In this paper, to alleviate this issue for uncertainty propagation, we explore the application of transfer learning techniques using training data generated from both high- and low-fidelity models. We explore two strategies for coupling these two datasets during the training procedure, namely, the standard transfer learning and the bi-fidelity weighted learning. In the former approach, a neural network model mapping the inputs to the outputs of interest is trained based on the low-fidelity data. The high-fidelity data is then used to adapt the parameters of the upper layer(s) of the low-fidelity network, or train a simpler neural network to map the output of the low-fidelity network to that of the high-fidelity model. In the latter approach, the entire low-fidelity network parameters are updated using data generated via a Gaussian process model trained with a small high-fidelity dataset. The parameter updates are performed via a variant of stochastic gradient descent with learning rates given by the Gaussian process model. Using three numerical examples, we illustrate the utility of these bi-fidelity transfer learning methods where we focus on accuracy improvement achieved by transfer learning over standard training approaches. △ Less

Submitted 11 February, 2020; originally announced February 2020.

arXiv:1906.11883 [pdf, other]

Unsupervised Learning of Object Keypoints for Perception and Control

Authors: Tejas Kulkarni, Ankush Gupta, Catalin Ionescu, Sebastian Borgeaud, Malcolm Reynolds, Andrew Zisserman, Volodymyr Mnih

Abstract: The study of object representations in computer vision has primarily focused on developing representations that are useful for image classification, object detection, or semantic segmentation as downstream tasks. In this work we aim to learn object representations that are useful for control and reinforcement learning (RL). To this end, we introduce Transporter, a neural network architecture for d… ▽ More The study of object representations in computer vision has primarily focused on developing representations that are useful for image classification, object detection, or semantic segmentation as downstream tasks. In this work we aim to learn object representations that are useful for control and reinforcement learning (RL). To this end, we introduce Transporter, a neural network architecture for discovering concise geometric object representations in terms of keypoints or image-space coordinates. Our method learns from raw video frames in a fully unsupervised manner, by transporting learnt image features between video frames using a keypoint bottleneck. The discovered keypoints track objects and object parts across long time-horizons more accurately than recent similar methods. Furthermore, consistent long-term tracking enables two notable results in control domains -- (1) using the keypoint co-ordinates and corresponding image features as inputs enables highly sample-efficient reinforcement learning; (2) learning to explore by controlling keypoint locations drastically reduces the search space, enabling deep exploration (leading to states unreachable through random action exploration) without any extrinsic rewards. △ Less

Submitted 19 November, 2019; v1 submitted 19 June, 2019; originally announced June 2019.

Comments: In NeurIPS 2019. Code https://github.com/deepmind/deepmind-research/tree/master/transporter

arXiv:1812.11741 [pdf, ps, other]

A modal aleatoric calculus for probabilistic reasoning: extended version

Authors: Tim French, Andrew Gozzard, Mark Reynolds

Abstract: We consider multi-agent systems where agents actions and beliefs are determined aleatorically, or "by the throw of dice". This system consists of possible worlds that assign distributions to independent random variables, and agents who assign probabilities to these possible worlds. We present a novel syntax and semantics for such system, and show that they generalise Modal Logic. We also give a so… ▽ More We consider multi-agent systems where agents actions and beliefs are determined aleatorically, or "by the throw of dice". This system consists of possible worlds that assign distributions to independent random variables, and agents who assign probabilities to these possible worlds. We present a novel syntax and semantics for such system, and show that they generalise Modal Logic. We also give a sound and complete calculus for reasoning in the base semantics, and a sound calculus for the full modal semantics, that we conjecture to be complete. Finally we discuss some application to reasoning about game playing agents. △ Less

Submitted 31 December, 2018; originally announced December 2018.

Comments: Long version of paper accepted to appear at the 2019 Indian Conference on Logic and Applictaions

arXiv:1810.13129 [pdf, ps, other]

Efficient LTL Decentralized Monitoring Framework Using Formula Simplification Table

Authors: Omar Al-Bataineh, David Rosenblum, Mark Reynolds

Abstract: This paper presents a new technique for optimizing formal analysis of propositional logic formulas and Linear Temporal Logic (LTL) formulas, namely the formula simplification table. A formula simplification table is a mathematical table that shows all possible simplifications of the formula under different truth assignments of its variables. The advantages of constructing a simplification table of… ▽ More This paper presents a new technique for optimizing formal analysis of propositional logic formulas and Linear Temporal Logic (LTL) formulas, namely the formula simplification table. A formula simplification table is a mathematical table that shows all possible simplifications of the formula under different truth assignments of its variables. The advantages of constructing a simplification table of a formula are two-fold. First, it can be used to compute the logical influence weight of each variable in the formula, which is a metric that shows the importance of the variable in affecting the outcome of the formula. Second, it can be used to identify variables that have the highest logical influences on the outcome of the formula. %The simplification table can be used to optimize %existing solutions for several interesting %LTL verification problems. We demonstrate the effectiveness of formula simplification table in the context of software verification by developing efficient framework to the well-known decentralized LTL monitoring problem. △ Less

Submitted 31 October, 2018; originally announced October 2018.

arXiv:1809.03101 [pdf, other]

doi 10.4204/EPTCS.277.13

One-Pass and Tree-Shaped Tableau Systems for TPTL and TPTLb+Past

Authors: Luca Geatti, Nicola Gigante, Angelo Montanari, Mark Reynolds

Abstract: In this paper, we propose a novel one-pass and tree-shaped tableau method for Timed Propositional Temporal Logic and for a bounded variant of its extension with past operators. Timed Propositional Temporal Logic (TPTL) is a real-time temporal logic, with an EXPSPACE-complete satisfiability problem, which has been successfully applied to the verification of real-time systems. In contrast to LTL, ad… ▽ More In this paper, we propose a novel one-pass and tree-shaped tableau method for Timed Propositional Temporal Logic and for a bounded variant of its extension with past operators. Timed Propositional Temporal Logic (TPTL) is a real-time temporal logic, with an EXPSPACE-complete satisfiability problem, which has been successfully applied to the verification of real-time systems. In contrast to LTL, adding past operators to TPTL makes the satisfiability problem for the resulting logic (TPTL+P) non-elementary. In this paper, we devise a one-pass and tree-shaped tableau for both TPTL and bounded TPTL+P (TPTLb+P), a syntactic restriction introduced to encode timeline-based planning problems, which recovers the EXPSPACE-complete complexity. The tableau systems for TPTL and TPTLb+P are presented in a unified way, being very similar to each other, providing a common skeleton that is then specialised to each logic. In doing that, we characterise the semantics of TPTLb+P in terms of a purely syntactic fragment of TPTL+P, giving a translation that embeds the former into the latter. Soundness and completeness of the system are proved fully. In particular, we give a greatly simplified model-theoretic completeness proof, which sidesteps the complex combinatorial argument used by known proofs for the one-pass and tree-shaped tableau systems for LTL and LTL+P. △ Less

Submitted 9 September, 2018; originally announced September 2018.

Comments: In Proceedings GandALF 2018, arXiv:1809.02416

Journal ref: EPTCS 277, 2018, pp. 176-190

arXiv:1807.04837 [pdf, other]

doi 10.4230/LIPIcs.TIME.2018.13

A game-theoretic approach to timeline-based planning with uncertainty

Authors: Nicola Gigante, Angelo Montanari, Marta Cialdea Mayer, Andrea Orlandini, Mark Reynolds

Abstract: In timeline-based planning, domains are described as sets of independent, but interacting, components, whose behaviour over time (the set of timelines) is governed by a set of temporal constraints. A distinguishing feature of timeline-based planning systems is the ability to integrate planning with execution by synthesising control strategies for flexible plans. However, flexible plans can only re… ▽ More In timeline-based planning, domains are described as sets of independent, but interacting, components, whose behaviour over time (the set of timelines) is governed by a set of temporal constraints. A distinguishing feature of timeline-based planning systems is the ability to integrate planning with execution by synthesising control strategies for flexible plans. However, flexible plans can only represent temporal uncertainty, while more complex forms of nondeterminism are needed to deal with a wider range of realistic problems. In this paper, we propose a novel game-theoretic approach to timeline-based planning problems, generalising the state of the art while uniformly handling temporal uncertainty and nondeterminism. We define a general concept of timeline-based game and we show that the notion of winning strategy for these games is strictly more general than that of control strategy for dynamically controllable flexible plans. Moreover, we show that the problem of establishing the existence of such winning strategies is decidable using a doubly exponential amount of space. △ Less

Submitted 27 May, 2019; v1 submitted 12 July, 2018; originally announced July 2018.

Comments: Published in Proceedings of TIME 2018 (https://time2018.ipipan.waw.pl)

arXiv:1803.10760 [pdf, other]

Unsupervised Predictive Memory in a Goal-Directed Agent

Authors: Greg Wayne, Chia-Chun Hung, David Amos, Mehdi Mirza, Arun Ahuja, Agnieszka Grabska-Barwinska, Jack Rae, Piotr Mirowski, Joel Z. Leibo, Adam Santoro, Mevlana Gemici, Malcolm Reynolds, Tim Harley, Josh Abramson, Shakir Mohamed, Danilo Rezende, David Saxton, Adam Cain, Chloe Hillier, David Silver, Koray Kavukcuoglu, Matt Botvinick, Demis Hassabis, Timothy Lillicrap

Abstract: Animals execute goal-directed behaviours despite the limited range and scope of their sensors. To cope, they explore environments and store memories maintaining estimates of important information that is not presently available. Recently, progress has been made with artificial intelligence (AI) agents that learn to perform tasks from sensory input, even at a human level, by merging reinforcement l… ▽ More Animals execute goal-directed behaviours despite the limited range and scope of their sensors. To cope, they explore environments and store memories maintaining estimates of important information that is not presently available. Recently, progress has been made with artificial intelligence (AI) agents that learn to perform tasks from sensory input, even at a human level, by merging reinforcement learning (RL) algorithms with deep neural networks, and the excitement surrounding these results has led to the pursuit of related ideas as explanations of non-human animal learning. However, we demonstrate that contemporary RL algorithms struggle to solve simple tasks when enough information is concealed from the sensors of the agent, a property called "partial observability". An obvious requirement for handling partially observed tasks is access to extensive memory, but we show memory is not enough; it is critical that the right information be stored in the right format. We develop a model, the Memory, RL, and Inference Network (MERLIN), in which memory formation is guided by a process of predictive modeling. MERLIN facilitates the solution of tasks in 3D virtual reality environments for which partial observability is severe and memories must be maintained over long durations. Our model demonstrates a single learning agent architecture that can solve canonical behavioural tasks in psychology and neurobiology without strong simplifying assumptions about the dimensionality of sensory input or the duration of experiences. △ Less

Submitted 28 March, 2018; originally announced March 2018.

arXiv:1709.02101 [pdf, other]

doi 10.4204/EPTCS.256.12

A Parallel Linear Temporal Logic Tableau

Authors: John C. McCabe-Dansted, Mark Reynolds

Abstract: For many applications, we are unable to take full advantage of the potential massive parallelisation offered by supercomputers or cloud computing because it is too hard to work out how to divide up the computation task between processors in such a way to minimise the need for communication. However, a recently developed branch-independent tableaux for the common LTL temporal logic should intuiti… ▽ More For many applications, we are unable to take full advantage of the potential massive parallelisation offered by supercomputers or cloud computing because it is too hard to work out how to divide up the computation task between processors in such a way to minimise the need for communication. However, a recently developed branch-independent tableaux for the common LTL temporal logic should intuitively be easy to parallelise as each branch can be developed independently. Here we describe a simple technique for partitioning such a tableau such that each partition can be processed independently without need for interprocess communication. We investigate the extent to which this technique improves the performance of the LTL tableau on standard benchmarks and random formulas. △ Less

Submitted 7 September, 2017; originally announced September 2017.

Comments: In Proceedings GandALF 2017, arXiv:1709.01761

ACM Class: F.4.1

Journal ref: EPTCS 256, 2017, pp. 166-179

arXiv:1703.08956

Trace Expressiveness of Timed and Probabilistic Automata

Authors: Valentin Bura, Tim French, Mark Reynolds

Abstract: Automata expressiveness is an essential feature in understanding which of the formalisms available should be chosen for modelling a particular problem. Probabilistic and stochastic automata are suitable for modelling systems exhibiting probabilistic behavior and their expressiveness has been studied relative to non-probabilistic transition systems and Markov chains. In this paper, we consider prev… ▽ More Automata expressiveness is an essential feature in understanding which of the formalisms available should be chosen for modelling a particular problem. Probabilistic and stochastic automata are suitable for modelling systems exhibiting probabilistic behavior and their expressiveness has been studied relative to non-probabilistic transition systems and Markov chains. In this paper, we consider previous formalisms of Timed, Probabilistic and Stochastic Timed Automata, we present our new model of Timed Automata with Polynomial Delay, we introduce a measure of expressiveness for automata we call trace expressiveness and we characterize the expressiveness of these models relative to each other under this new measure. △ Less

Submitted 17 March, 2019; v1 submitted 27 March, 2017; originally announced March 2017.

Comments: possibly incorrect, or unsubstantiated results and claims

arXiv:1703.08936

Timed Automata with Polynomial Delay and their Expressiveness

Authors: Valentin Bura, Tim French, Mark Reynolds

Abstract: We consider previous models of Timed, Probabilistic and Stochastic Timed Automata, we introduce our model of Timed Automata with Polynomial Delay and we characterize the expressiveness of these models relative to each other. We consider previous models of Timed, Probabilistic and Stochastic Timed Automata, we introduce our model of Timed Automata with Polynomial Delay and we characterize the expressiveness of these models relative to each other. △ Less

Submitted 17 March, 2019; v1 submitted 27 March, 2017; originally announced March 2017.

Comments: possibly incorrect, or unsubstantiated results and claims

arXiv:1610.09795 [pdf, ps, other]

Finding Minimum and Maximum Termination Time of Timed Automata Models with Cyclic Behaviour

Authors: Omar Al-Bataineh, Mark Reynolds, Tim French

Abstract: The paper presents a novel algorithm for computing best and worst case execution times (BCET/WCET) of timed automata models with cyclic behaviour. The algorithms can work on any arbitrary diagonal-free TA and can handle more cases than previously existing algorithms for BCET/WCET computations, as it can handle cycles in TA and decide whether they lead to an infinite WCET. We show soundness of the… ▽ More The paper presents a novel algorithm for computing best and worst case execution times (BCET/WCET) of timed automata models with cyclic behaviour. The algorithms can work on any arbitrary diagonal-free TA and can handle more cases than previously existing algorithms for BCET/WCET computations, as it can handle cycles in TA and decide whether they lead to an infinite WCET. We show soundness of the proposed algorithm and study its complexity. To our knowledge, this is the first model checking algorithm that addresses comprehensively the BCET/WCET problem of systems with cyclic behaviour. Behrmann et al. provide an algorithm for computing the minimum cost/time of reaching a goal state in priced timed automata (PTA). The algorithm has been implemented in the well-known model checking tool UPPAAL to compute the minimum time for termination of an automaton. However, we show that in certain circumstances, when infinite cycles exist, the algorithm implemented in UPPAAL may not terminate, and we provide examples which UPPAAL fails to verify. △ Less

Submitted 31 October, 2016; originally announced October 2016.

arXiv:1610.09607 [pdf, ps, other]

Termination of Monotone Programs

Authors: Omar Al-Bataineh, Xie Xiaofei, Mark Reynolds

Abstract: We present an efficient approach to prove termination of monotone programs with integer variables, an expressive class of loops that is often encountered in computer programs. Our approach is based on a lightweight static analysis method and takes advantage of simple %nice properties of monotone functions. Our preliminary implementation %beats shows that our tool has an advantage over existing too… ▽ More We present an efficient approach to prove termination of monotone programs with integer variables, an expressive class of loops that is often encountered in computer programs. Our approach is based on a lightweight static analysis method and takes advantage of simple %nice properties of monotone functions. Our preliminary implementation %beats shows that our tool has an advantage over existing tools and can prove termination for a high percentage of loops for a class of benchmarks. △ Less

Submitted 4 February, 2017; v1 submitted 30 October, 2016; originally announced October 2016.

arXiv:1609.08080 [pdf, other]

Swipe Mosaics from Video

Authors: Malcolm Reynolds, Tom S. F. Haines, Gabriel J. Brostow

Abstract: A panoramic image mosaic is an attractive visualization for viewing many overlapping photos, but its images must be both captured and processed correctly to produce an acceptable composite. We propose Swipe Mosaics, an interactive visualization that places the individual video frames on a 2D planar map that represents the layout of the physical scene. Compared to traditional panoramic mosaics, our… ▽ More A panoramic image mosaic is an attractive visualization for viewing many overlapping photos, but its images must be both captured and processed correctly to produce an acceptable composite. We propose Swipe Mosaics, an interactive visualization that places the individual video frames on a 2D planar map that represents the layout of the physical scene. Compared to traditional panoramic mosaics, our capture is easier because the user can both translate the camera center and film moving subjects. Processing and display degrade gracefully if the footage lacks distinct, overlapping, non-repeating texture. Our proposed visual odometry algorithm produces a distribution over (x,y) translations for image pairs. Inferring a distribution of possible camera motions allows us to better cope with parallax, lack of texture, dynamic scenes, and other phenomena that hurt deterministic reconstruction techniques. Robustness is obtained by training on synthetic scenes with known camera motions. We show that Swipe Mosaics are easy to generate, support a wide range of difficult scenes, and are useful for documenting a scene for closer inspection. △ Less

Submitted 26 September, 2016; originally announced September 2016.

arXiv:1609.04102 [pdf, other]

doi 10.4204/EPTCS.226.20

A New Rule for LTL Tableaux

Authors: Mark Reynolds

Abstract: Propositional linear time temporal logic (LTL) is the standard temporal logic for computing applications and many reasoning techniques and tools have been developed for it. Tableaux for deciding satisfiability have existed since the 1980s. However, the tableaux for this logic do not look like traditional tree-shaped tableau systems and their processing is often quite complicated. In this paper,… ▽ More Propositional linear time temporal logic (LTL) is the standard temporal logic for computing applications and many reasoning techniques and tools have been developed for it. Tableaux for deciding satisfiability have existed since the 1980s. However, the tableaux for this logic do not look like traditional tree-shaped tableau systems and their processing is often quite complicated. In this paper, we introduce a novel style of tableau rule which supports a new simple traditional-style tree-shaped tableau for LTL. We prove that it is sound and complete. As well as being simple to understand, to introduce to students and to use, it is also simple to implement and is competitive against state of the art systems. It is particularly suitable for parallel implementations. △ Less

Submitted 13 September, 2016; originally announced September 2016.

Comments: In Proceedings GandALF 2016, arXiv:1609.03648

Journal ref: EPTCS 226, 2016, pp. 287-301

arXiv:1606.02580 [pdf, other]

Convolution by Evolution: Differentiable Pattern Producing Networks

Authors: Chrisantha Fernando, Dylan Banarse, Malcolm Reynolds, Frederic Besse, David Pfau, Max Jaderberg, Marc Lanctot, Daan Wierstra

Abstract: In this work we introduce a differentiable version of the Compositional Pattern Producing Network, called the DPPN. Unlike a standard CPPN, the topology of a DPPN is evolved but the weights are learned. A Lamarckian algorithm, that combines evolution and learning, produces DPPNs to reconstruct an image. Our main result is that DPPNs can be evolved/trained to compress the weights of a denoising aut… ▽ More In this work we introduce a differentiable version of the Compositional Pattern Producing Network, called the DPPN. Unlike a standard CPPN, the topology of a DPPN is evolved but the weights are learned. A Lamarckian algorithm, that combines evolution and learning, produces DPPNs to reconstruct an image. Our main result is that DPPNs can be evolved/trained to compress the weights of a denoising autoencoder from 157684 to roughly 200 parameters, while achieving a reconstruction accuracy comparable to a fully connected network with more than two orders of magnitude more parameters. The regularization ability of the DPPN allows it to rediscover (approximate) convolutional network architectures embedded within a fully connected architecture. Such convolutional architectures are the current state of the art for many computer vision applications, so it is satisfying that DPPNs are capable of discovering this structure rather than having to build it in by design. DPPNs exhibit better generalization when tested on the Omniglot dataset after being trained on MNIST, than directly encoded fully connected autoencoders. DPPNs are therefore a new framework for integrating learning and evolution. △ Less

Submitted 8 June, 2016; originally announced June 2016.

arXiv:1604.03962 [pdf, other]

A traditional tree-style tableau for LTL

Authors: Mark Reynolds

Abstract: Propositional linear time temporal logic (LTL) is the standard temporal logic for computing applications and many reasoning techniques and tools have been developed for it. Tableaux for deciding satisfiability have existed since the 1980s. However, the tableaux for this logic do not look like traditional tree-shaped tableau systems and their processing is often quite complicated. We present a new… ▽ More Propositional linear time temporal logic (LTL) is the standard temporal logic for computing applications and many reasoning techniques and tools have been developed for it. Tableaux for deciding satisfiability have existed since the 1980s. However, the tableaux for this logic do not look like traditional tree-shaped tableau systems and their processing is often quite complicated. We present a new simple traditional-style tree-shaped tableau for LTL and prove that it is sound and complete. As well as being simple to understand, to introduce to students and to use manually, it also seems simple to implement and promises to be competitive in its automation. It is particularly suitable for parallel implementations. △ Less

Submitted 13 April, 2016; originally announced April 2016.

Comments: 33 pages

arXiv:1408.5965 [pdf, other]

doi 10.4204/EPTCS.161.16

Hourglass Automata

Authors: Yuki Osada, Tim French, Mark Reynolds, Harry Smallbone

Abstract: In this paper, we define the class of hourglass automata, which are timed automata with bounded clocks that can be made to progress backwards as well as forwards at a constant rate. We then introduce a new clock update for timed automata that allows hourglass automata to be expressed. This allows us to show that language emptiness remains decidable with this update when the number of clocks is two… ▽ More In this paper, we define the class of hourglass automata, which are timed automata with bounded clocks that can be made to progress backwards as well as forwards at a constant rate. We then introduce a new clock update for timed automata that allows hourglass automata to be expressed. This allows us to show that language emptiness remains decidable with this update when the number of clocks is two or less. This is done by showing that we can construct a finite untimed graph using clock regions from any timed automaton that use this new update. △ Less

Submitted 25 August, 2014; originally announced August 2014.

Comments: In Proceedings GandALF 2014, arXiv:1408.5560

Journal ref: EPTCS 161, 2014, pp. 175-188

arXiv:1309.4416 [pdf, other]

Specifying Robustness

Authors: John C. McCabe-Dansted, Tim French, Mark Reynolds, Sophie Pinchinat

Abstract: This paper proposes a new logic RoCTL* to model robustness in concurrent systems. RoCTL* extends CTL* with the addition of Obligatory and Robustly operators, which quantify over failure-free paths and paths with one more failure respectively. We present a number of examples of problems to which RoCTL* can be applied. The core result of this paper is to show that RoCTL* is expressively equivalent t… ▽ More This paper proposes a new logic RoCTL* to model robustness in concurrent systems. RoCTL* extends CTL* with the addition of Obligatory and Robustly operators, which quantify over failure-free paths and paths with one more failure respectively. We present a number of examples of problems to which RoCTL* can be applied. The core result of this paper is to show that RoCTL* is expressively equivalent to CTL* but is non-elementarily more succinct. We present a translation from RoCTL* into CTL* that preserves truth but may result in non-elementary growth in the length of the translated formula as each nested Robustly operator may result in an extra exponential blowup. However, we show that this translation is optimal in the sense that any equivalence preserving translation will require an extra exponential growth per nested Robustly. We also compare RoCTL* to Quantified CTL* (QCTL*) and hybrid logics. △ Less

Submitted 13 September, 2013; originally announced September 2013.

arXiv:1307.4468 [pdf, ps, other]

doi 10.4204/EPTCS.119.7

A Faster Tableau for CTL*

Authors: Mark Reynolds

Abstract: There have been several recent suggestions for tableau systems for deciding satisfiability in the practically important branching time temporal logic known as CTL*. In this paper we present a streamlined and more traditional tableau approach built upon the author's earlier theoretical work. Soundness and completeness results are proved. A prototype implementation demonstrates the significantly i… ▽ More There have been several recent suggestions for tableau systems for deciding satisfiability in the practically important branching time temporal logic known as CTL*. In this paper we present a streamlined and more traditional tableau approach built upon the author's earlier theoretical work. Soundness and completeness results are proved. A prototype implementation demonstrates the significantly improved performance of the new approach on a range of test formulas. We also see that it compares favourably to state of the art, game and automata based decision procedures. △ Less

Submitted 16 July, 2013; originally announced July 2013.

Comments: In Proceedings GandALF 2013, arXiv:1307.4162

ACM Class: F.3.1

Journal ref: EPTCS 119, 2013, pp. 50-63

arXiv:1201.3416 [pdf, ps, other]

Verifying Real-time Commit Protocols Using Dense-time Model Checking Technology

Authors: Omar I. Al-Bataineh, Mark Reynolds

Abstract: The timed-based automata model, introduced by Alur and Dill, provides a useful formalism for describing real-time systems. Over the last two decades, several dense-time model checking tools have been developed based on that model. The paper considers the verification of real-time distributed commit protocols using dense-time model checking technology. More precisely, we model and verify the well-k… ▽ More The timed-based automata model, introduced by Alur and Dill, provides a useful formalism for describing real-time systems. Over the last two decades, several dense-time model checking tools have been developed based on that model. The paper considers the verification of real-time distributed commit protocols using dense-time model checking technology. More precisely, we model and verify the well-known timed two phase commit protocol in three different state-of-the-art real-time model checkers: UPPAAL, Rabbit, and RED, and compare the results. △ Less

Submitted 10 May, 2018; v1 submitted 16 January, 2012; originally announced January 2012.

arXiv:cs/9910012 [pdf, ps, other]

The Complexity of Temporal Logic over the Reals

Authors: M. Reynolds

Abstract: It is shown that the decision problem for the temporal logic with until and since connectives over real-numbers time is PSPACE-complete. It is shown that the decision problem for the temporal logic with until and since connectives over real-numbers time is PSPACE-complete. △ Less

Submitted 13 October, 1999; originally announced October 1999.

ACM Class: F4.1; F2.2

Showing 1–39 of 39 results for author: Reynolds, M