-
FireGuard: A Generalized Microarchitecture for Fine-Grained Security Analysis on OoO Superscalar Cores
Authors:
Zhe Jiang,
Sam Ainsworth,
Timothy Jones
Abstract:
High-performance security guarantees rely on hardware support. Generic programmable support for fine-grained instruction analysis has gained broad interest in the literature as a fundamental building block for the security of future processors. Yet, implementation in real out-of-order (OoO) superscalar processors presents tough challenges that cannot be explored in highly abstract simulators. We d…
▽ More
High-performance security guarantees rely on hardware support. Generic programmable support for fine-grained instruction analysis has gained broad interest in the literature as a fundamental building block for the security of future processors. Yet, implementation in real out-of-order (OoO) superscalar processors presents tough challenges that cannot be explored in highly abstract simulators. We detail the challenges of implementing complex programmable pathways without critical paths or contention. We then introduce FireGuard, the first implementation of fine-grained instruction analysis on a real OoO superscalar processor. We establish an end-to-end system, including microarchitecture, SoC, ISA and programming model. Experiments show that our solution simultaneously ensures both security and performance of the system, with parallel scalability. We examine the feasibility of building FireGuard into modern SoCs: Apple's M1-Pro, Huawei's Kirin-960, and Intel's i7-12700F, where less than 1% silicon area is introduced. The Repo. of FireGuard's source code: https://github.com/SEU-ACAL/reproduce-FireGuard-DAC-25.
△ Less
Submitted 2 April, 2025;
originally announced April 2025.
-
MEEK: Re-thinking Heterogeneous Parallel Error Detection Architecture for Real-World OoO Superscalar Processors
Authors:
Zhe Jiang,
Minli Liao,
Sam Ainsworth,
Dean You,
Timothy Jones
Abstract:
Heterogeneous parallel error detection is an approach to achieving fault-tolerant processors, leveraging multiple power-efficient cores to re-execute software originally run on a high-performance core. Yet, its complex components, gathering data cross-chip from many parts of the core, raise questions of how to build it into commodity cores without heavy design invasion and extensive re-engineering…
▽ More
Heterogeneous parallel error detection is an approach to achieving fault-tolerant processors, leveraging multiple power-efficient cores to re-execute software originally run on a high-performance core. Yet, its complex components, gathering data cross-chip from many parts of the core, raise questions of how to build it into commodity cores without heavy design invasion and extensive re-engineering.
We build the first full-RTL design, MEEK, into an open-source SoC, from microarchitecture and ISA to the OS and programming model. We identify and solve bottlenecks and bugs overlooked in previous work, and demonstrate that MEEK offers microsecond-level detection capacity with affordable overheads. By trading off architectural functionalities across codesigned hardware-software layers, MEEK features only light changes to a mature out-of-order superscalar core, simple coordinating software layers, and a few lines of operating-system code. The Repo. of MEEK's source code: https://github.com/SEU-ACAL/reproduce-MEEK-DAC-25.
△ Less
Submitted 2 April, 2025;
originally announced April 2025.
-
Triangel: A High-Performance, Accurate, Timely On-Chip Temporal Prefetcher
Authors:
Sam Ainsworth,
Lev Mukhanov
Abstract:
Temporal prefetching, where correlated pairs of addresses are logged and replayed on repeat accesses, has recently become viable in commercial designs. Arm's latest processors include Correlating Miss Chaining prefetchers, which store such patterns in a partition of the on-chip cache. However, the state-of-the-art on-chip temporal prefetcher in the literature, Triage, features some design inconsis…
▽ More
Temporal prefetching, where correlated pairs of addresses are logged and replayed on repeat accesses, has recently become viable in commercial designs. Arm's latest processors include Correlating Miss Chaining prefetchers, which store such patterns in a partition of the on-chip cache. However, the state-of-the-art on-chip temporal prefetcher in the literature, Triage, features some design inconsistencies and inaccuracies that pose challenges for practical implementation. We first examine and design fixes for these inconsistencies to produce an implementable baseline. We then introduce Triangel, a prefetcher that extends Triage with novel sampling-based methodologies to allow it to be aggressive and timely when the prefetcher is able to handle observed long-term patterns, and to avoid inaccurate prefetches when less able to do so. Triangel gives a 26.4% speedup compared to a baseline system with a conventional stride prefetcher alone, compared with 9.3% for Triage at degree 1 and 14.2% at degree 4. At the same time Triangel only increases memory traffic by 10% relative to baseline, versus 28.5% for Triage.
△ Less
Submitted 15 June, 2024;
originally announced June 2024.
-
Rewriting History: Repurposing Domain-Specific CGRAs
Authors:
Jackson Woodruff,
Thomas Koehler,
Alexander Brauckmann,
Chris Cummins,
Sam Ainsworth,
Michael F. P. O'Boyle
Abstract:
Coarse-grained reconfigurable arrays (CGRAs) are domain-specific devices promising both the flexibility of FPGAs and the performance of ASICs. However, with restricted domains comes a danger: designing chips that cannot accelerate enough current and future software to justify the hardware cost. We introduce FlexC, the first flexible CGRA compiler, which allows CGRAs to be adapted to operations the…
▽ More
Coarse-grained reconfigurable arrays (CGRAs) are domain-specific devices promising both the flexibility of FPGAs and the performance of ASICs. However, with restricted domains comes a danger: designing chips that cannot accelerate enough current and future software to justify the hardware cost. We introduce FlexC, the first flexible CGRA compiler, which allows CGRAs to be adapted to operations they do not natively support.
FlexC uses dataflow rewriting, replacing unsupported regions of code with equivalent operations that are supported by the CGRA. We use equality saturation, a technique enabling efficient exploration of a large space of rewrite rules, to effectively search through the program-space for supported programs. We applied FlexC to over 2,000 loop kernels, compiling to four different research CGRAs and 300 generated CGRAs and demonstrate a 2.2$\times$ increase in the number of loop kernels accelerated leading to 3$\times$ speedup compared to an Arm A5 CPU on kernels that would otherwise be unsupported by the accelerator.
△ Less
Submitted 16 September, 2023;
originally announced September 2023.
-
CONVOLVE: Smart and seamless design of smart edge processors
Authors:
M. Gomony,
F. Putter,
A. Gebregiorgis,
G. Paulin,
L. Mei,
V. Jain,
S. Hamdioui,
V. Sanchez,
T. Grosser,
M. Geilen,
M. Verhelst,
F. Zenke,
F. Gurkaynak,
B. Bruin,
S. Stuijk,
S. Davidson,
S. De,
M. Ghogho,
A. Jimborean,
S. Eissa,
L. Benini,
D. Soudris,
R. Bishnoi,
S. Ainsworth,
F. Corradi
, et al. (3 additional authors not shown)
Abstract:
With the rise of Deep Learning (DL), our world braces for AI in every edge device, creating an urgent need for edge-AI SoCs. This SoC hardware needs to support high throughput, reliable and secure AI processing at Ultra Low Power (ULP), with a very short time to market. With its strong legacy in edge solutions and open processing platforms, the EU is well-positioned to become a leader in this SoC…
▽ More
With the rise of Deep Learning (DL), our world braces for AI in every edge device, creating an urgent need for edge-AI SoCs. This SoC hardware needs to support high throughput, reliable and secure AI processing at Ultra Low Power (ULP), with a very short time to market. With its strong legacy in edge solutions and open processing platforms, the EU is well-positioned to become a leader in this SoC market. However, this requires AI edge processing to become at least 100 times more energy-efficient, while offering sufficient flexibility and scalability to deal with AI as a fast-moving target. Since the design space of these complex SoCs is huge, advanced tooling is needed to make their design tractable. The CONVOLVE project (currently in Inital stage) addresses these roadblocks. It takes a holistic approach with innovations at all levels of the design hierarchy. Starting with an overview of SOTA DL processing support and our project methodology, this paper presents 8 important design choices largely impacting the energy efficiency and flexibility of DL hardware. Finding good solutions is key to making smart-edge computing a reality.
△ Less
Submitted 2 May, 2023; v1 submitted 1 December, 2022;
originally announced December 2022.
-
Hacky Racers: Exploiting Instruction-Level Parallelism to Generate Stealthy Fine-Grained Timers
Authors:
Haocheng Xiao,
Sam Ainsworth
Abstract:
Side-channel attacks pose serious threats to many security models, especially sandbox-based browsers. While transient-execution side channels in out-of-order processors have previously been blamed for vulnerabilities such as Spectre and Meltdown, we show that in fact, the capability of out-of-order execution \emph{itself} to cause mayhem is far more general.
We develop Hacky Racers, a new type o…
▽ More
Side-channel attacks pose serious threats to many security models, especially sandbox-based browsers. While transient-execution side channels in out-of-order processors have previously been blamed for vulnerabilities such as Spectre and Meltdown, we show that in fact, the capability of out-of-order execution \emph{itself} to cause mayhem is far more general.
We develop Hacky Racers, a new type of timing gadget that uses instruction-level parallelism, another key feature of out-of-order execution, to measure arbitrary fine-grained timing differences, even in the presence of highly restricted JavaScript sandbox environments. While such environments try to mitigate timing side channels by reducing timer precision and removing language features such as \textit{SharedArrayBuffer} that can be used to indirectly generate timers via thread-level parallelism, no such restrictions can be designed to limit Hacky Racers. We also design versions of Hacky Racers that require no misspeculation whatsoever, demonstrating that transient execution is not the only threat to security from modern microarchitectural performance optimization.
We use Hacky Racers to construct novel \textit{backwards-in-time} Spectre gadgets, which break many hardware countermeasures in the literature by leaking secrets before misspeculation is discovered. We also use them to generate the first known last-level cache eviction set generator in JavaScript that does not require \textit{SharedArrayBuffer} support.
△ Less
Submitted 26 November, 2022;
originally announced November 2022.
-
Git Re-Basin: Merging Models modulo Permutation Symmetries
Authors:
Samuel K. Ainsworth,
Jonathan Hayase,
Siddhartha Srinivasa
Abstract:
The success of deep learning is due in large part to our ability to solve certain massive non-convex optimization problems with relative ease. Though non-convex optimization is NP-hard, simple algorithms -- often variants of stochastic gradient descent -- exhibit surprising effectiveness in fitting large neural networks in practice. We argue that neural network loss landscapes often contain (nearl…
▽ More
The success of deep learning is due in large part to our ability to solve certain massive non-convex optimization problems with relative ease. Though non-convex optimization is NP-hard, simple algorithms -- often variants of stochastic gradient descent -- exhibit surprising effectiveness in fitting large neural networks in practice. We argue that neural network loss landscapes often contain (nearly) a single basin after accounting for all possible permutation symmetries of hidden units a la Entezari et al. 2021. We introduce three algorithms to permute the units of one model to bring them into alignment with a reference model in order to merge the two models in weight space. This transformation produces a functionally equivalent set of weights that lie in an approximately convex basin near the reference model. Experimentally, we demonstrate the single basin phenomenon across a variety of model architectures and datasets, including the first (to our knowledge) demonstration of zero-barrier linear mode connectivity between independently trained ResNet models on CIFAR-10. Additionally, we identify intriguing phenomena relating model width and training time to mode connectivity. Finally, we discuss shortcomings of the linear mode connectivity hypothesis, including a counterexample to the single basin theory.
△ Less
Submitted 1 March, 2023; v1 submitted 11 September, 2022;
originally announced September 2022.
-
GhostMinion: A Strictness-Ordered Cache System for Spectre Mitigation
Authors:
Sam Ainsworth
Abstract:
Out-of-order speculation, a technique ubiquitous since the early 1990s, remains a fundamental security flaw. Via attacks such as Spectre and Meltdown, an attacker can trick a victim, in an otherwise entirely correct program, into leaking its secrets through the effects of misspeculated execution, in a way that is entirely invisible to the programmer's model. This has serious implications for appli…
▽ More
Out-of-order speculation, a technique ubiquitous since the early 1990s, remains a fundamental security flaw. Via attacks such as Spectre and Meltdown, an attacker can trick a victim, in an otherwise entirely correct program, into leaking its secrets through the effects of misspeculated execution, in a way that is entirely invisible to the programmer's model. This has serious implications for application sandboxing and inter-process communication.
Designing efficient mitigations, that preserve the performance of out-of-order execution, has been a challenge. The speculation-hiding techniques in the literature have been shown to not close such channels comprehensively, allowing adversaries to redesign attacks. Strong, precise guarantees are necessary, but at the same time mitigations must achieve high performance to be adopted. We present Strictness Ordering, a new constraint system that shows how we can comprehensively eliminate transient side channel attacks, while still allowing complex speculation and data forwarding between speculative instructions. We then present GhostMinion, a cache modification built using a variety of new techniques designed to provide Strictness Order at only 2.5% overhead.
△ Less
Submitted 9 September, 2021; v1 submitted 12 April, 2021;
originally announced April 2021.
-
Faster Policy Learning with Continuous-Time Gradients
Authors:
Samuel Ainsworth,
Kendall Lowrey,
John Thickstun,
Zaid Harchaoui,
Siddhartha Srinivasa
Abstract:
We study the estimation of policy gradients for continuous-time systems with known dynamics. By reframing policy learning in continuous-time, we show that it is possible construct a more efficient and accurate gradient estimator. The standard back-propagation through time estimator (BPTT) computes exact gradients for a crude discretization of the continuous-time system. In contrast, we approximate…
▽ More
We study the estimation of policy gradients for continuous-time systems with known dynamics. By reframing policy learning in continuous-time, we show that it is possible construct a more efficient and accurate gradient estimator. The standard back-propagation through time estimator (BPTT) computes exact gradients for a crude discretization of the continuous-time system. In contrast, we approximate continuous-time gradients in the original system. With the explicit goal of estimating continuous-time gradients, we are able to discretize adaptively and construct a more efficient policy gradient estimator which we call the Continuous-Time Policy Gradient (CTPG). We show that replacing BPTT policy gradients with more efficient CTPG estimates results in faster and more robust learning in a variety of control tasks and simulators.
△ Less
Submitted 24 June, 2021; v1 submitted 11 December, 2020;
originally announced December 2020.
-
Mosaic: A Sample-Based Database System for Open World Query Processing
Authors:
Laurel Orr,
Samuel Ainsworth,
Walter Cai,
Kevin Jamieson,
Magda Balazinska,
Dan Suciu
Abstract:
Data scientists have relied on samples to analyze populations of interest for decades. Recently, with the increase in the number of public data repositories, sample data has become easier to access. It has not, however, become easier to analyze. This sample data is arbitrarily biased with an unknown sampling probability, meaning data scientists must manually debias the sample with custom technique…
▽ More
Data scientists have relied on samples to analyze populations of interest for decades. Recently, with the increase in the number of public data repositories, sample data has become easier to access. It has not, however, become easier to analyze. This sample data is arbitrarily biased with an unknown sampling probability, meaning data scientists must manually debias the sample with custom techniques to avoid inaccurate results. In this vision paper, we propose Mosaic, a database system that treats samples as first-class citizens and allows users to ask questions over populations represented by these samples. Answering queries over biased samples is non-trivial as there is no existing, standard technique to answer population queries when the sampling probability is unknown. In this paper, we show how our envisioned system solves this problem by having a unique sample-based data model with extensions to the SQL language. We propose how to perform population query answering using biased samples and give preliminary results for one of our novel query answering techniques.
△ Less
Submitted 10 January, 2020; v1 submitted 16 December, 2019;
originally announced December 2019.
-
Mo' States Mo' Problems: Emergency Stop Mechanisms from Observation
Authors:
Samuel Ainsworth,
Matt Barnes,
Siddhartha Srinivasa
Abstract:
In many environments, only a relatively small subset of the complete state space is necessary in order to accomplish a given task. We develop a simple technique using emergency stops (e-stops) to exploit this phenomenon. Using e-stops significantly improves sample complexity by reducing the amount of required exploration, while retaining a performance bound that efficiently trades off the rate of…
▽ More
In many environments, only a relatively small subset of the complete state space is necessary in order to accomplish a given task. We develop a simple technique using emergency stops (e-stops) to exploit this phenomenon. Using e-stops significantly improves sample complexity by reducing the amount of required exploration, while retaining a performance bound that efficiently trades off the rate of convergence with a small asymptotic sub-optimality gap. We analyze the regret behavior of e-stops and present empirical results in discrete and continuous settings demonstrating that our reset mechanism can provide order-of-magnitude speedups on top of existing reinforcement learning methods.
△ Less
Submitted 3 December, 2019;
originally announced December 2019.
-
MuonTrap: Preventing Cross-Domain Spectre-Like Attacks by Capturing Speculative State
Authors:
Sam Ainsworth,
Timothy M. Jones
Abstract:
The disclosure of the Spectre speculative-execution attacks in January 2018 has left a severe vulnerability that systems are still struggling with how to patch. The solutions that currently exist tend to have incomplete coverage, perform badly, or have highly undesirable edge cases that cause application domains to break.
MuonTrap allows processors to continue to speculate, avoiding significant…
▽ More
The disclosure of the Spectre speculative-execution attacks in January 2018 has left a severe vulnerability that systems are still struggling with how to patch. The solutions that currently exist tend to have incomplete coverage, perform badly, or have highly undesirable edge cases that cause application domains to break.
MuonTrap allows processors to continue to speculate, avoiding significant reductions in performance, without impacting security. We instead prevent the propagation of any state based on speculative execution, by placing the results of speculative cache accesses into a small, fast L0 filter cache, that is non-inclusive, non-exclusive with the rest of the cache hierarchy. This isolates all parts of the system that can't be quickly cleared on any change in threat domain.
MuonTrap uses these speculative filter caches, which are cleared on context and protection-domain switches, along with a series of extensions to the cache coherence protocol and prefetcher. This renders systems immune to cross-domain information leakage via Spectre and a host of similar attacks based on speculative execution, with low performance impact and few changes to the CPU design.
△ Less
Submitted 28 April, 2020; v1 submitted 19 November, 2019;
originally announced November 2019.
-
Disentangled VAE Representations for Multi-Aspect and Missing Data
Authors:
Samuel K. Ainsworth,
Nicholas J. Foti,
Emily B. Fox
Abstract:
Many problems in machine learning and related application areas are fundamentally variants of conditional modeling and sampling across multi-aspect data, either multi-view, multi-modal, or simply multi-group. For example, sampling from the distribution of English sentences conditioned on a given French sentence or sampling audio waveforms conditioned on a given piece of text. Central to many of th…
▽ More
Many problems in machine learning and related application areas are fundamentally variants of conditional modeling and sampling across multi-aspect data, either multi-view, multi-modal, or simply multi-group. For example, sampling from the distribution of English sentences conditioned on a given French sentence or sampling audio waveforms conditioned on a given piece of text. Central to many of these problems is the issue of missing data: we can observe many English, French, or German sentences individually but only occasionally do we have data for a sentence pair. Motivated by these applications and inspired by recent progress in variational autoencoders for grouped data, we develop factVAE, a deep generative model capable of handling multi-aspect data, robust to missing observations, and with a prior that encourages disentanglement between the groups and the latent dimensions. The effectiveness of factVAE is demonstrated on a variety of rich real-world datasets, including motion capture poses and pictures of faces captured from varying poses and perspectives.
△ Less
Submitted 23 June, 2018;
originally announced June 2018.
-
Interpretable VAEs for nonlinear group factor analysis
Authors:
Samuel Ainsworth,
Nicholas Foti,
Adrian KC Lee,
Emily Fox
Abstract:
Deep generative models have recently yielded encouraging results in producing subjectively realistic samples of complex data. Far less attention has been paid to making these generative models interpretable. In many scenarios, ranging from scientific applications to finance, the observed variables have a natural grouping. It is often of interest to understand systems of interaction amongst these g…
▽ More
Deep generative models have recently yielded encouraging results in producing subjectively realistic samples of complex data. Far less attention has been paid to making these generative models interpretable. In many scenarios, ranging from scientific applications to finance, the observed variables have a natural grouping. It is often of interest to understand systems of interaction amongst these groups, and latent factor models (LFMs) are an attractive approach. However, traditional LFMs are limited by assuming a linear correlation structure. We present an output interpretable VAE (oi-VAE) for grouped data that models complex, nonlinear latent-to-observed relationships. We combine a structured VAE comprised of group-specific generators with a sparsity-inducing prior. We demonstrate that oi-VAE yields meaningful notions of interpretability in the analysis of motion capture and MEG data. We further show that in these situations, the regularization inherent to oi-VAE can actually lead to improved generalization and learned generative processes.
△ Less
Submitted 16 February, 2018;
originally announced February 2018.
-
A Framework for Evaluation of Composite Memento Temporal Coherence
Authors:
Scott G. Ainsworth,
Michael L. Nelson,
Herbert Van de Sompel
Abstract:
Most archived HTML pages embed other web resources, such as images and stylesheets. Playback of the archived web pages typically provides only the capture date (or Memento-Datetime) of the root resource and not the Memento-Datetime of the embedded resources. In the course of our research, we have discovered that the Memento-Datetime of embedded resources can be up to several years in the future or…
▽ More
Most archived HTML pages embed other web resources, such as images and stylesheets. Playback of the archived web pages typically provides only the capture date (or Memento-Datetime) of the root resource and not the Memento-Datetime of the embedded resources. In the course of our research, we have discovered that the Memento-Datetime of embedded resources can be up to several years in the future or past, relative to the Memento-Datetime of the embedding root resource. We introduce a framework for assessing temporal coherence between a root resource and its embedded resource depending on Memento-Datetime, Last-Modified datetime, and entity body.
△ Less
Submitted 5 October, 2014; v1 submitted 4 February, 2014;
originally announced February 2014.
-
Evaluating Sliding and Sticky Target Policies by Measuring Temporal Drift in Acyclic Walks Through a Web Archive
Authors:
Scott G. Ainsworth,
Michael L. Nelson
Abstract:
When a user views an archived page using the archive's user interface (UI), the user selects a datetime to view from a list. The archived web page, if available, is then displayed. From this display, the web archive UI attempts to simulate the web browsing experience by smoothly transitioning between archived pages. During this process, the target datetime changes with each link followed; drifting…
▽ More
When a user views an archived page using the archive's user interface (UI), the user selects a datetime to view from a list. The archived web page, if available, is then displayed. From this display, the web archive UI attempts to simulate the web browsing experience by smoothly transitioning between archived pages. During this process, the target datetime changes with each link followed; drifting away from the datetime originally selected. When browsing sparsely-archived pages, this nearly-silent drift can be many years in just a few clicks. We conducted 200,000 acyclic walks of archived pages, following up to 50 links per walk, comparing the results of two target datetime policies. The Sliding Target policy allows the target datetime to change as it does in archive UIs such as the Internet Archive's Wayback Machine. The Sticky Target policy, represented by the Memento API, keeps the target datetime the same throughout the walk. We found that the Sliding Target policy drift increases with the number of walk steps, number of domains visited, and choice (number of links available). However, the Sticky Target policy controls temporal drift, holding it to less than 30 days on average regardless of walk length or number of domains visited. The Sticky Target policy shows some increase as choice increases, but this may be caused by other factors. We conclude that based on walk length, the Sticky Target policy generally produces at least 30 days less drift than the Sliding Target policy.
△ Less
Submitted 21 September, 2013;
originally announced September 2013.
-
How Much of the Web Is Archived?
Authors:
Scott G. Ainsworth,
Ahmed AlSum,
Hany SalahEldeen,
Michele C. Weigle,
Michael L. Nelson
Abstract:
Although the Internet Archive's Wayback Machine is the largest and most well-known web archive, there have been a number of public web archives that have emerged in the last several years. With varying resources, audiences and collection development policies, these archives have varying levels of overlap with each other. While individual archives can be measured in terms of number of URIs, number…
▽ More
Although the Internet Archive's Wayback Machine is the largest and most well-known web archive, there have been a number of public web archives that have emerged in the last several years. With varying resources, audiences and collection development policies, these archives have varying levels of overlap with each other. While individual archives can be measured in terms of number of URIs, number of copies per URI, and intersection with other archives, to date there has been no answer to the question "How much of the Web is archived?" We study the question by approximating the Web using sample URIs from DMOZ, Delicious, Bitly, and search engine indexes; and, counting the number of copies of the sample URIs exist in various public web archives. Each sample set provides its own bias. The results from our sample sets indicate that range from 35%-90% of the Web has at least one archived copy, 17%-49% has between 2-5 copies, 1%-8% has 6-10 copies, and 8%-63% has more than 10 copies in public web archives. The number of URI copies varies as a function of time, but no more than 31.3% of URIs are archived more than once per month.
△ Less
Submitted 5 January, 2013; v1 submitted 26 December, 2012;
originally announced December 2012.
-
An HTTP-Based Versioning Mechanism for Linked Data
Authors:
Herbert Van de Sompel,
Robert Sanderson,
Michael L. Nelson,
Lyudmila L. Balakireva,
Harihar Shankar,
Scott Ainsworth
Abstract:
Dereferencing a URI returns a representation of the current state of the resource identified by that URI. But, on the Web representations of prior states of a resource are also available, for example, as resource versions in Content Management Systems or archival resources in Web Archives such as the Internet Archive. This paper introduces a resource versioning mechanism that is fully based on HTT…
▽ More
Dereferencing a URI returns a representation of the current state of the resource identified by that URI. But, on the Web representations of prior states of a resource are also available, for example, as resource versions in Content Management Systems or archival resources in Web Archives such as the Internet Archive. This paper introduces a resource versioning mechanism that is fully based on HTTP and uses datetime as a global version indicator. The approach allows "follow your nose" style navigation both from the current time-generic resource to associated time-specific version resources as well as among version resources. The proposed versioning mechanism is congruent with the Architecture of the World Wide Web, and is based on the Memento framework that extends HTTP with transparent content negotiation in the datetime dimension. The paper shows how the versioning approach applies to Linked Data, and by means of a demonstrator built for DBpedia, it also illustrates how it can be used to conduct a time-series analysis across versions of Linked Data descriptions.
△ Less
Submitted 18 March, 2010;
originally announced March 2010.
-
Memento: Time Travel for the Web
Authors:
Herbert Van de Sompel,
Michael L. Nelson,
Robert Sanderson,
Lyudmila L. Balakireva,
Scott Ainsworth,
Harihar Shankar
Abstract:
The Web is ephemeral. Many resources have representations that change over time, and many of those representations are lost forever. A lucky few manage to reappear as archived resources that carry their own URIs. For example, some content management systems maintain version pages that reflect a frozen prior state of their changing resources. Archives recurrently crawl the web to obtain the actua…
▽ More
The Web is ephemeral. Many resources have representations that change over time, and many of those representations are lost forever. A lucky few manage to reappear as archived resources that carry their own URIs. For example, some content management systems maintain version pages that reflect a frozen prior state of their changing resources. Archives recurrently crawl the web to obtain the actual representation of resources, and subsequently make those available via special-purpose archived resources. In both cases, the archival copies have URIs that are protocol-wise disconnected from the URI of the resource of which they represent a prior state. Indeed, the lack of temporal capabilities in the most common Web protocol, HTTP, prevents getting to an archived resource on the basis of the URI of its original. This turns accessing archived resources into a significant discovery challenge for both human and software agents, which typically involves following a multitude of links from the original to the archival resource, or of searching archives for the original URI. This paper proposes the protocol-based Memento solution to address this problem, and describes a proof-of-concept experiment that includes major servers of archival content, including Wikipedia and the Internet Archive. The Memento solution is based on existing HTTP capabilities applied in a novel way to add the temporal dimension. The result is a framework in which archived resources can seamlessly be reached via the URI of their original: protocol-based time travel for the Web.
△ Less
Submitted 6 November, 2009; v1 submitted 5 November, 2009;
originally announced November 2009.