
Exploiting Sparsity for Long Context Inference: Million Token Contexts on Commodity GPUs

Authors: Ryan Synk, Monte Hoover, John Kirchenbauer, Neel Jain, Alex Stein, Manli Shu, Josue Melendez Sanchez, Ramani Duraiswami, Tom Goldstein

Abstract: There is growing demand for performing inference with hundreds of thousands of input tokens on trained transformer models. Inference at this extreme scale demands significant computational resources, hindering the application of transformers at long contexts on commodity (i.e., not data center scale) hardware. To address the inference-time costs associated with running self-attention-based transformer language models on long contexts and enable their adoption on widely available hardware, we propose a tunable mechanism that reduces the cost of the forward pass by attending to only the most relevant tokens at every generation step using a top-k selection mechanism. We showcase the efficiency gains afforded by our method by performing inference on context windows up to 1M tokens using approximately 16GB of GPU RAM. Our experiments reveal that models are capable of handling the sparsity induced by the reduced number of keys and values. By attending to less than 2% of input tokens, we achieve over 95% of model performance on common benchmarks (RULER, AlpacaEval, and Open LLM Leaderboard).

Submitted 12 February, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

Comments: 9 pages, 9 figures, 2 tables in main body
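
The top-k selection described in the abstract above can be illustrated with a minimal, pure-Python sketch. This is not the authors' implementation: the function name `topk_attention`, the single-query setting, and the exact scoring and softmax details are assumptions made only to show the idea of attending to the k highest-scoring keys instead of the full context.

```python
import math


def topk_attention(query, keys, values, k):
    """Single-query attention restricted to the k highest-scoring keys.

    Hypothetical helper for illustration; assumes plain Python lists of floats.
    Returns the attended output vector and the sorted indices that were used.
    """
    if not keys or len(keys) != len(values):
        raise ValueError("keys and values must be non-empty and equal in length")
    k = max(1, min(k, len(keys)))
    scale = 1.0 / math.sqrt(len(query))

    # Scaled dot-product score of the query against every cached key.
    scores = [scale * sum(q * x for q, x in zip(query, key)) for key in keys]

    # Indices of the k largest scores, i.e. the most relevant tokens.
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]

    # Softmax over the selected scores only, shifted for numerical stability.
    peak = max(scores[i] for i in top)
    weights = [math.exp(scores[i] - peak) for i in top]
    total = sum(weights)

    # Weighted sum of the selected value vectors.
    dim = len(values[0])
    output = [0.0] * dim
    for w, i in zip(weights, top):
        for d in range(dim):
            output[d] += (w / total) * values[i][d]
    return output, sorted(top)


keys = [[1.0, 0.0], [0.0, 1.0], [10.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
output, used = topk_attention([1.0, 0.0], keys, values, k=1)
```

With `k` equal to the number of keys, the sketch reduces to ordinary softmax attention over the whole context; smaller `k` trades accuracy for fewer keys and values touched per step. Note that this toy version still scores every key, so it shows only the selection step and not how a practical system would find the top-k keys cheaply.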

arXiv:2401.06518 [pdf, other]

doi 10.1109/OJITS.2024.3521449

Transitional Grid Maps: Joint Modeling of Static and Dynamic Occupancy

Authors: José Manuel Gaspar Sánchez, Leonard Bruns, Jana Tumova, Patric Jensfelt, Martin Törngren

Abstract: Autonomous agents rely on sensor data to construct representations of their environments, essential for predicting future events and planning their actions. However, sensor measurements suffer from limited range, occlusions, and sensor noise. These challenges become more evident in highly dynamic environments. This work proposes a probabilistic framework to jointly infer which parts of an environment are statically and which parts are dynamically occupied. We formulate the problem as a Bayesian network and introduce minimal assumptions that significantly reduce the complexity of the problem. Based on those, we derive Transitional Grid Maps (TGMs), an efficient analytical solution. Using real data, we demonstrate how this approach produces better maps by keeping track of both static and dynamic elements and, as a side effect, can help improve existing SLAM algorithms.

Submitted 4 November, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

arXiv:2309.15501 [pdf, other]

Overcoming the Fear of the Dark: Occlusion-Aware Model-Predictive Planning for Automated Vehicles Using Risk Fields

Authors: Chris van der Ploeg, Truls Nyberg, José Manuel Gaspar Sánchez, Emilia Silvas, Nathan van de Wouw

Abstract: As vehicle automation advances, motion planning algorithms face escalating challenges in achieving safe and efficient navigation. Existing Advanced Driver Assistance Systems (ADAS) primarily focus on basic tasks, leaving unexpected scenarios for human intervention, which can be error-prone. State-of-the-art motion planning approaches for higher levels of automation are primarily oriented toward the use of risk- or anti-collision constraints, using over-approximations of the shapes and sizes of other road users to prevent collisions. These methods, however, suffer from conservative behavior and the risk of infeasibility in high-risk initial conditions. In contrast, our work introduces a novel multi-objective trajectory generation approach. We propose an innovative method for constructing risk fields that accommodates diverse entity shapes and sizes, which allows us to also account for the presence of potentially occluded objects. This methodology is integrated into an occlusion-aware trajectory generator, enabling dynamic and safe maneuvering through intricate environments while anticipating (potentially hidden) road users and traveling along the infrastructure toward a specific goal. Through theoretical underpinnings and simulations, we validate the effectiveness of our approach. This paper bridges crucial gaps in motion planning for automated vehicles, offering a pathway toward safer and more adaptable autonomous navigation in complex urban contexts.

Submitted 27 September, 2023; originally announced September 2023.

Comments: Submitted to the IEEE Transactions on Intelligent Transportation Systems (T-ITS); 14 pages, 11 figures, 1 table

arXiv:2112.00619 [pdf, other]

Edge computing for cyber-physical systems: A systematic mapping study emphasizing trustworthiness

Authors: José Manuel Gaspar Sánchez, Nils Jörgensen, Martin Törngren, Rafia Inam, Andrii Berezovskyi, Lei Feng, Elena Fersman, Muhammad Rusyadi Ramli, Kaige Tan

Abstract: Edge computing is projected to have profound implications in the coming decades, proposed to provide solutions for applications such as augmented reality, predictive functionalities, and collaborative Cyber-Physical Systems (CPS). For such applications, edge computing addresses the new computational needs, as well as privacy, availability, and real-time constraints, by providing local high-performance computing capabilities to deal with the limitations and constraints of cloud and embedded systems. Our interests lie in the applications of edge computing as part of CPS, where several properties (or attributes) of trustworthiness, including safety, security, and predictability/availability are of particular concern, each facing challenges for the introduction of edge-based CPS. We present the results of a systematic mapping study, a kind of systematic literature survey, investigating the use of edge computing for CPS with a special emphasis on trustworthiness. The main contributions of this study are a detailed description of the current research efforts in edge-based CPS and the identification and discussion of trends and research gaps. The results show that the main body of research in edge-based CPS considers key attributes of system trustworthiness only to a very limited extent, despite many efforts referring to critical CPS and applications like intelligent transportation. More research and industrial efforts will be needed on aspects of trustworthiness of future edge-based CPS, including their experimental evaluation. Such research needs to consider the multiple interrelated attributes of trustworthiness including safety, security, and predictability, and new methodologies and architectures to address them. It is further important to provide bridges and collaboration between edge computing and CPS disciplines.

Submitted 26 November, 2021; originally announced December 2021.

arXiv:2110.08664 [pdf, other]

Finding Critical Scenarios for Automated Driving Systems: A Systematic Literature Review

Authors: Xinhai Zhang, Jianbo Tao, Kaige Tan, Martin Törngren, José Manuel Gaspar Sánchez, Muhammad Rusyadi Ramli, Xin Tao, Magnus Gyllenhammar, Franz Wotawa, Naveen Mohan, Mihai Nica, Hermann Felbinger

Abstract: Scenario-based approaches have been receiving a huge amount of attention in research and engineering of automated driving systems. Due to the complexity and uncertainty of the driving environment, and the complexity of the driving task itself, the number of possible driving scenarios that an ADS or ADAS may encounter is virtually infinite. Therefore, it is essential to be able to reason about the identification of scenarios and in particular critical ones that may impose unacceptable risk if not considered. Critical scenarios are particularly important to support design, verification and validation efforts, and as a basis for a safety case. In this paper, we present the results of a systematic literature review in the context of autonomous driving. The main contributions are: (i) introducing a comprehensive taxonomy for critical scenario identification methods; (ii) giving an overview of the state-of-the-art research based on the taxonomy encompassing 86 papers between 2017 and 2020; and (iii) identifying open issues and directions for further research. The provided taxonomy comprises three main perspectives encompassing the problem definition (the why), the solution (the methods to derive scenarios), and the assessment of the established scenarios. In addition, we discuss open research issues considering the perspectives of coverage, practicability, and scenario space explosion.

Submitted 16 October, 2021; originally announced October 2021.

Comments: 37 pages, 24 figures

arXiv:2101.09154 [pdf, other]

doi 10.1016/j.rse.2021.112772

Virtual laser scanning with HELIOS++: A novel take on ray tracing-based simulation of topographic 3D laser scanning

Authors: Lukas Winiwarter, Alberto Manuel Esmorís Pena, Hannah Weiser, Katharina Anders, Jorge Martínez Sanchez, Mark Searle, Bernhard Höfle

Abstract: Topographic laser scanning is a remote sensing method to create detailed 3D point cloud representations of the Earth's surface. Since data acquisition is expensive, simulations can complement real data given certain premises are available: i) a model of 3D scene and scanner, ii) a model of the beam-scene interaction, simplified to a computationally feasible while physically realistic level, and iii) an application for which simulated data is fit for use. A number of laser scanning simulators for different purposes exist, which we enrich by presenting HELIOS++. HELIOS++ is an open-source simulation framework for terrestrial static, mobile, UAV-based and airborne laser scanning implemented in C++. The HELIOS++ concept provides a flexible solution for the trade-off between physical accuracy (realism) and computational complexity (runtime, memory footprint), as well as ease of use and of configuration. Unique features of HELIOS++ include the availability of Python bindings (pyhelios) for controlling simulations, and a range of model types for 3D scene representation. HELIOS++ further allows the simulation of beam divergence using a subsampling strategy, and is able to create full-waveform outputs as a basis for detailed analysis. As generation and analysis of waveforms can strongly impact runtimes, the user may set the level of detail for the subsampling, or optionally disable full-waveform output altogether. A detailed assessment of computational considerations and a comparison of HELIOS++ to its predecessor, HELIOS, reveal reduced runtimes by up to 83 %. At the same time, memory requirements are reduced by up to 94 %, allowing for much larger (i.e. more complex) 3D scenes to be loaded into memory and hence to be virtually acquired by laser scanning simulation.

Submitted 21 January, 2021; originally announced January 2021.

arXiv:1802.02511 [pdf, other]

DeepHeart: Semi-Supervised Sequence Learning for Cardiovascular Risk Prediction

Authors: Brandon Ballinger, Johnson Hsieh, Avesh Singh, Nimit Sohoni, Jack Wang, Geoffrey H. Tison, Gregory M. Marcus, Jose M. Sanchez, Carol Maguire, Jeffrey E. Olgin, Mark J. Pletcher

Abstract: We train and validate a semi-supervised, multi-task LSTM on 57,675 person-weeks of data from off-the-shelf wearable heart rate sensors, showing high accuracy at detecting multiple medical conditions, including diabetes (0.8451), high cholesterol (0.7441), high blood pressure (0.8086), and sleep apnea (0.8298). We compare two semi-supervised training methods, semi-supervised sequence learning and heuristic pretraining, and show they outperform hand-engineered biomarkers from the medical literature. We believe our work suggests a new approach to patient risk stratification based on cardiovascular risk scores derived from popular wearables such as Fitbit, Apple Watch, or Android Wear.

Submitted 7 February, 2018; originally announced February 2018.

Comments: Presented at AAAI 2018

arXiv:1507.03352 [pdf]

Self-Modeling Based Diagnosis of Software-Defined Networks

Authors: José Manuel Sánchez, Imen Grida Ben Yahia, Noel Crespi

Abstract: Networks built using SDN (Software-Defined Networks) and NFV (Network Functions Virtualization) approaches are expected to face several challenges such as scalability, robustness and resiliency. In this paper, we propose a self-modeling based diagnosis to enable resilient networks in the context of SDN and NFV. We focus on solving two major problems. On the one hand, we currently lack a model or template that describes the managed elements in the context of SDN and NFV. On the other hand, the highly dynamic networks enabled by softwarisation require the generation at runtime of a diagnosis model from which the root causes can be identified. In this paper, we propose finer-grained templates that model not only network nodes but also their sub-components, for a more detailed diagnosis suitable in the SDN and NFV context. In addition, we specify and validate a self-modeling based diagnosis using Bayesian Networks. This approach differs from the state of the art in the discovery of network and service dependencies at run-time and the building of the diagnosis model of any SDN infrastructure using our templates.

Submitted 13 July, 2015; originally announced July 2015.

Showing 1–8 of 8 results for author: Sanchez, J M