Search | arXiv e-print repository

arXiv:2506.09913 [pdf, ps, other]

A Note on the Reliability of Goal-Oriented Error Estimates for Galerkin Finite Element Methods with Nonlinear Functionals

Authors: Brian N. Granzow, Stephen D. Bond, D. Thomas Seidl, Bernhard Endtmayer

Abstract: We consider estimating the discretization error in a nonlinear functional $J(u)$ in the setting of an abstract variational problem: find $u \in \mathcal{V}$ such that $B(u,\varphi) = L(\varphi) \; \forall \varphi \in \mathcal{V}$, as approximated by a Galerkin finite element method. Here, $\mathcal{V}$ is a Hilbert space, $B(\cdot,\cdot)$ is a bilinear form, and $L(\cdot)$ is a linear functional.… ▽ More We consider estimating the discretization error in a nonlinear functional $J(u)$ in the setting of an abstract variational problem: find $u \in \mathcal{V}$ such that $B(u,\varphi) = L(\varphi) \; \forall \varphi \in \mathcal{V}$, as approximated by a Galerkin finite element method. Here, $\mathcal{V}$ is a Hilbert space, $B(\cdot,\cdot)$ is a bilinear form, and $L(\cdot)$ is a linear functional. We consider well-known error estimates $η$ of the form $J(u) - J(u_h) \approx η= L(z) - B(u_h, z)$, where $u_h$ denotes a finite element approximation to $u$, and $z$ denotes the solution to an auxiliary adjoint variational problem. We show that there exist nonlinear functionals for which error estimates of this form are not reliable, even in the presence of an exact adjoint solution solution $z$. An estimate $η$ is said to be reliable if there exists a constant $C \in \mathbb{R}_{>0}$ independent of $u_h$ such that $|J(u) - J(u_h)| \leq C|η|$. We present several example pairs of bilinear forms and nonlinear functionals where reliability of $η$ is not achieved. △ Less

Submitted 11 June, 2025; originally announced June 2025.

Comments: 6 pages

Report number: SAND2025-07143O

arXiv:2404.04621 [pdf, other]

doi 10.1145/3656391

IsoPredict: Dynamic Predictive Analysis for Detecting Unserializable Behaviors in Weakly Isolated Data Store Applications

Authors: Chujun Geng, Spyros Blanas, Michael D. Bond, Yang Wang

Abstract: This paper presents the first dynamic predictive analysis for data store applications under weak isolation levels, called Isopredict. Given an observed serializable execution of a data store application, Isopredict generates and solves SMT constraints to find an unserializable execution that is a feasible execution of the application. Isopredict introduces novel techniques that handle divergent ap… ▽ More This paper presents the first dynamic predictive analysis for data store applications under weak isolation levels, called Isopredict. Given an observed serializable execution of a data store application, Isopredict generates and solves SMT constraints to find an unserializable execution that is a feasible execution of the application. Isopredict introduces novel techniques that handle divergent application behavior; solve mutually recursive sets of constraints; and balance coverage, precision, and performance. An evaluation on four transactional data store benchmarks shows that Isopredict often predicts unserializable behaviors, 99% of which are feasible. △ Less

Submitted 6 April, 2024; originally announced April 2024.

Journal ref: Proc. ACM Program. Lang., Vol. 8, No. PLDI, Article 161. Publication date: June 2024

arXiv:2311.00097 [pdf, ps, other]

doi 10.1145/3649817

Cocoon: Static Information Flow Control in Rust

Authors: Ada Lamba, Max Taylor, Vincent Beardsley, Jacob Bambeck, Michael D. Bond, Zhiqiang Lin

Abstract: Information flow control (IFC) provides confidentiality by enforcing noninterference, which ensures that high-secrecy values cannot affect low-secrecy values. Prior work introduces fine-grained IFC approaches that modify the programming language and use nonstandard compilation tools, impose run-time overhead, or report false secrecy leaks -- all of which hinder adoption. This paper presents Coco… ▽ More Information flow control (IFC) provides confidentiality by enforcing noninterference, which ensures that high-secrecy values cannot affect low-secrecy values. Prior work introduces fine-grained IFC approaches that modify the programming language and use nonstandard compilation tools, impose run-time overhead, or report false secrecy leaks -- all of which hinder adoption. This paper presents Cocoon, a Rust library for static type-based IFC that uses the unmodified Rust language and compiler. The key insight of Cocoon lies in leveraging Rust's type system and procedural macros to establish an effect system that enforces noninterference. A performance evaluation shows that using Cocoon increases compile time but has no impact on application performance. To demonstrate Cocoon's utility, we retrofitted two popular Rust programs, the Spotify TUI client and Mozilla's Servo browser engine, to use Cocoon to enforce limited confidentiality policies. △ Less

Submitted 18 March, 2024; v1 submitted 31 October, 2023; originally announced November 2023.

Comments: Will be published in PACMPL(OOPSLA) in October 2024

arXiv:2305.15285 [pdf, other]

doi 10.1016/j.cma.2023.116364

Linearization Errors in Discrete Goal-Oriented Error Estimation

Authors: Brian N. Granzow, D. Thomas Seidl, Stephen D. Bond

Abstract: This paper is concerned with goal-oriented a posteriori error estimation for nonlinear functionals in the context of nonlinear variational problems solved with continuous Galerkin finite element discretizations. A two-level, or discrete, adjoint-based approach for error estimation is considered. The traditional method to derive an error estimate in this context requires linearizing both the nonlin… ▽ More This paper is concerned with goal-oriented a posteriori error estimation for nonlinear functionals in the context of nonlinear variational problems solved with continuous Galerkin finite element discretizations. A two-level, or discrete, adjoint-based approach for error estimation is considered. The traditional method to derive an error estimate in this context requires linearizing both the nonlinear variational form and the nonlinear functional of interest which introduces linearization errors into the error estimate. In this paper, we investigate these linearization errors. In particular, we develop a novel discrete goal-oriented error estimate that accounts for traditionally neglected nonlinear terms at the expense of greater computational cost. We demonstrate how this error estimate can be used to drive mesh adaptivity. We show that accounting for linearization errors in the error estimate can improve its effectivity for several nonlinear model problems and quantities of interest. We also demonstrate that an adaptive strategy based on the newly proposed estimate can lead to more accurate approximations of the nonlinear functional with fewer degrees of freedom when compared to uniform refinement and traditional adjoint-based approaches. △ Less

Submitted 19 July, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Report number: SAND2023-03991O

arXiv:2212.03054 [pdf, other]

DisTRaC: Accelerating High Performance Compute Processing for Temporary Data Storage

Authors: Gabryel Mason-Williams, Dave Bond, Mark Basham

Abstract: High Performance Compute (HPC) clusters often produce intermediate files as part of code execution and message passing is not always possible to supply data to these cluster jobs. In these cases, I/O goes back to central distributed storage to allow cross node data sharing. These systems are often high performance and characterised by their high cost per TB and sensitivity to workload type such as… ▽ More High Performance Compute (HPC) clusters often produce intermediate files as part of code execution and message passing is not always possible to supply data to these cluster jobs. In these cases, I/O goes back to central distributed storage to allow cross node data sharing. These systems are often high performance and characterised by their high cost per TB and sensitivity to workload type such as being tuned to small or large file I/O. However, compute nodes often have large amounts of RAM, so when dealing with intermediate files where longevity or reliability of the system is not as important, local RAM disks can be used to obtain performance benefits. In this paper we show how this problem was tackled by creating a RAM block that could interact with the object storage system Ceph, as well as creating a deployment tool to deploy Ceph on HPC infrastructure effectively. This work resulted in a system that was more performant than the central high performance distributed storage system used at Diamond reducing I/O overhead and processing time for Savu, a tomography data processing application, by 81.04% and 8.32% respectively. △ Less

Submitted 6 December, 2022; originally announced December 2022.

Comments: 5 pages, 4 figures

arXiv:2112.07880 [pdf, other]

doi 10.1109/ISPASS55109.2022.00005

Distilling the Real Cost of Production Garbage Collectors

Authors: Zixian Cai, Stephen M. Blackburn, Michael D. Bond, Martin Maas

Abstract: Abridged abstract: despite the long history of garbage collection (GC) and its prevalence in modern programming languages, there is surprisingly little clarity about its true cost. Without understanding their cost, crucial tradeoffs made by garbage collectors (GCs) go unnoticed. This can lead to misguided design constraints and evaluation criteria used by GC researchers and users, hindering the de… ▽ More Abridged abstract: despite the long history of garbage collection (GC) and its prevalence in modern programming languages, there is surprisingly little clarity about its true cost. Without understanding their cost, crucial tradeoffs made by garbage collectors (GCs) go unnoticed. This can lead to misguided design constraints and evaluation criteria used by GC researchers and users, hindering the development of high-performance, low-cost GCs. In this paper, we develop a methodology that allows us to empirically estimate the cost of GC for any given set of metrics. By distilling out the explicitly identifiable GC cost, we estimate the intrinsic application execution cost using different GCs. The minimum distilled cost forms a baseline. Subtracting this baseline from the total execution costs, we can then place an empirical lower bound on the absolute costs of different GCs. Using this methodology, we study five production GCs in OpenJDK 17, a high-performance Java runtime. We measure the cost of these collectors, and expose their respective key performance tradeoffs. We find that with a modestly sized heap, production GCs incur substantial overheads across a diverse suite of modern benchmarks, spending at least 7-82% more wall-clock time and 6-92% more CPU cycles relative to the baseline cost. We show that these costs can be masked by concurrency and generous provisioning of memory/compute. In addition, we find that newer low-pause GCs are significantly more expensive than older GCs, and, surprisingly, sometimes deliver worse application latency than stop-the-world GCs. Our findings reaffirm that GC is by no means a solved problem and that a low-cost, low-latency GC remains elusive. We recommend adopting the distillation methodology together with a wider range of cost metrics for future GC evaluations. △ Less

Submitted 5 May, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

Comments: Camera-ready version

arXiv:2107.05453 [pdf, other]

Neat: Low-Complexity, Efficient On-Chip Cache Coherence

Authors: Rui Zhang, Swarnendu Biswas, Vignesh Balaji, Michael D. Bond, Brandon Lucia

Abstract: Cache coherence protocols such as MESI that use writer-initiated invalidation have high complexity and sometimes have poor performance and energy usage, especially under false sharing. Such protocols require numerous transient states, a shared directory, and support for core-to-core communication, while also suffering under false sharing. An alternative to MESI's writer-initiated invalidation is s… ▽ More Cache coherence protocols such as MESI that use writer-initiated invalidation have high complexity and sometimes have poor performance and energy usage, especially under false sharing. Such protocols require numerous transient states, a shared directory, and support for core-to-core communication, while also suffering under false sharing. An alternative to MESI's writer-initiated invalidation is self-invalidation, which achieves lower complexity than MESI but adds high performance costs or relies on programmer annotations or specific data access patterns. This paper presents Neat, a low-complexity, efficient cache coherence protocol. Neat uses self-invalidation, thus avoiding MESI's transient states, directory, and core-to-core communication requirements. Neat uses novel mechanisms that effectively avoid many unnecessary self-invalidations. An evaluation shows that Neat is simple and has lower verification complexity than the MESI protocol. Neat not only outperforms state-of-the-art self-invalidation protocols, but its performance and energy consumption are comparable to MESI's, and it outperforms MESI under false sharing. △ Less

Submitted 24 July, 2021; v1 submitted 12 July, 2021; originally announced July 2021.

arXiv:2004.00262 [pdf, ps, other]

doi 10.1145/3385412.3385991

Crafty: Efficient, HTM-Compatible Persistent Transactions

Authors: Kaan Genç, Michael D. Bond, Guoqing Harry Xu

Abstract: Byte-addressable persistent memory, such as Intel/Micron 3D XPoint, is an emerging technology that bridges the gap between volatile memory and persistent storage. Data in persistent memory survives crashes and restarts; however, it is challenging to ensure that this data is consistent after failures. Existing approaches incur significant performance costs to ensure crash consistency. This paper in… ▽ More Byte-addressable persistent memory, such as Intel/Micron 3D XPoint, is an emerging technology that bridges the gap between volatile memory and persistent storage. Data in persistent memory survives crashes and restarts; however, it is challenging to ensure that this data is consistent after failures. Existing approaches incur significant performance costs to ensure crash consistency. This paper introduces Crafty, a new approach for ensuring consistency and atomicity on persistent memory operations using commodity hardware with existing hardware transactional memory (HTM) capabilities, while incurring low overhead. Crafty employs a novel technique called nondestructive undo logging that leverages commodity HTM to control persist ordering. Our evaluation shows that Crafty outperforms state-of-the-art prior work under low contention, and performs competitively under high contention. △ Less

Submitted 20 April, 2020; v1 submitted 1 April, 2020; originally announced April 2020.

Comments: 32 pages, 24 figures. To appear in PLDI 2020

ACM Class: D.3.3; H.3.2

arXiv:1907.08337 [pdf, ps, other]

Online Set-Based Dynamic Analysis for Sound Predictive Race Detection

Authors: Jake Roemer, Michael D. Bond

Abstract: Predictive data race detectors find data races that exist in executions other than the observed execution. Smaragdakis et al. introduced the causally-precedes (CP) relation and a polynomial-time analysis for sound (no false races) predictive data race detection. However, their analysis cannot scale beyond analyzing bounded windows of execution traces. This work introduces a novel dynamic analysis… ▽ More Predictive data race detectors find data races that exist in executions other than the observed execution. Smaragdakis et al. introduced the causally-precedes (CP) relation and a polynomial-time analysis for sound (no false races) predictive data race detection. However, their analysis cannot scale beyond analyzing bounded windows of execution traces. This work introduces a novel dynamic analysis called Raptor that computes CP soundly and completely. Raptor is inherently an online analysis that analyzes and finds all CP-races of an execution trace in its entirety. An evaluation of a prototype implementation of Raptor shows that it scales to program executions that the prior CP analysis cannot handle, finding data races that the prior CP analysis cannot find. △ Less

Submitted 18 July, 2019; originally announced July 2019.

arXiv:1905.00494 [pdf, other]

doi 10.1145/3385412.3385993

SmartTrack: Efficient Predictive Race Detection

Authors: Jake Roemer, Kaan Genç, Michael D. Bond

Abstract: Widely used data race detectors, including the state-of-the-art FastTrack algorithm, incur performance costs that are acceptable for regular in-house testing, but miss races detectable from the analyzed execution. Predictive analyses detect more data races in an analyzed execution than FastTrack detects, but at significantly higher performance cost. This paper presents SmartTrack, an algorithm t… ▽ More Widely used data race detectors, including the state-of-the-art FastTrack algorithm, incur performance costs that are acceptable for regular in-house testing, but miss races detectable from the analyzed execution. Predictive analyses detect more data races in an analyzed execution than FastTrack detects, but at significantly higher performance cost. This paper presents SmartTrack, an algorithm that optimizes predictive race detection analyses, including two analyses from prior work and a new analysis introduced in this paper. SmartTrack's algorithm incorporates two main optimizations: (1) epoch and ownership optimizations from prior work, applied to predictive analysis for the first time; and (2) novel conflicting critical section optimizations introduced by this paper. Our evaluation shows that SmartTrack achieves performance competitive with FastTrack-a qualitative improvement in the state of the art for data race detection. △ Less

Submitted 8 April, 2020; v1 submitted 1 May, 2019; originally announced May 2019.

Comments: Extended arXiv version of PLDI 2020 paper (adds Appendices A-E) #228 SmartTrack: Efficient Predictive Race Detection

arXiv:1904.13088 [pdf, ps, other]

Dependence-Aware, Unbounded Sound Predictive Race Detection

Authors: Kaan Genç, Jake Roemer, Yufan Xu, Michael D. Bond

Abstract: Data races are a real problem for parallel software, yet hard to detect. Sound predictive analysis observes a program execution and detects data races that exist in some other, unobserved execution. However, existing predictive analyses miss races because they do not scale to full program executions or do not precisely incorporate data and control dependence. This paper introduces two novel, sou… ▽ More Data races are a real problem for parallel software, yet hard to detect. Sound predictive analysis observes a program execution and detects data races that exist in some other, unobserved execution. However, existing predictive analyses miss races because they do not scale to full program executions or do not precisely incorporate data and control dependence. This paper introduces two novel, sound predictive approaches that incorporate data and control dependence and handle full program executions. An evaluation using real, large Java programs shows that these approaches detect more data races than the closest related approaches, thus advancing the state of the art in sound predictive race detection. △ Less

Submitted 28 June, 2021; v1 submitted 30 April, 2019; originally announced April 2019.

arXiv:cs/0504092 [pdf, ps, other]

On Optimality Condition of Complex Systems: Computational Evidence

Authors: Victor Korotkikh, Galina Korotkikh, Darryl Bond

Abstract: A general condition determining the optimal performance of a complex system has not yet been found and the possibility of its existence is unknown. To contribute in this direction, an optimization algorithm as a complex system is presented. The performance of the algorithm for any problem is controlled as a convex function with a single optimum. To characterize the performance optimums, certain… ▽ More A general condition determining the optimal performance of a complex system has not yet been found and the possibility of its existence is unknown. To contribute in this direction, an optimization algorithm as a complex system is presented. The performance of the algorithm for any problem is controlled as a convex function with a single optimum. To characterize the performance optimums, certain quantities of the algorithm and the problem are suggested and interpreted as their complexities. An optimality condition of the algorithm is computationally found: if the algorithm shows its best performance for a problem, then the complexity of the algorithm is in a linear relationship with the complexity of the problem. The optimality condition provides a new perspective to the subject by recognizing that the relationship between certain quantities of the complex system and the problem may determine the optimal performance. △ Less

Submitted 23 April, 2005; originally announced April 2005.

Comments: 5 pages, 4 figures

Showing 1–12 of 12 results for author: Bond, D