-
Modeling Buffer Occupancy in bittide Systems
Authors:
Sanjay Lall,
Tammo Spalink
Abstract:
The bittide mechanism enables logically synchronous computation across distributed systems by leveraging the continuous frame transmission inherent to wired networks such as Ethernet. Instead of relying on a global clock, bittide uses a decentralized control system to adjust local clock frequencies, ensuring all nodes operate with a consistent notion of time by utilizing elastic buffers at each node to absorb frequency variations. This paper presents an analysis of the steady-state occupancy of these elastic buffers, a critical factor influencing system latency. Using a fluid model of the bittide system, we prove that buffer occupancy converges and derive an explicit formula for the steady-state value in terms of system parameters, including network topology, physical latencies, and controller gains. This analysis provides valuable insights for optimizing buffer sizes and minimizing latency in bittide-based distributed systems.
Submitted 7 October, 2024;
originally announced October 2024.
-
On Buffer Centering for Bittide Synchronization
Authors:
Sanjay Lall,
Calin Cascaval,
Martin Izzard,
Tammo Spalink
Abstract:
We discuss distributed reframing control of bittide systems. In a bittide system, multiple processors synchronize by monitoring communication over the network. The processors remain in logical synchrony by controlling the timing of frame transmissions. The protocol for doing this relies upon an underlying dynamic control system, where each node makes only local observations and performs no direct coordination with other nodes. In this paper we develop a control algorithm based on the idea of reset control, which allows all nodes to maintain small buffer offsets while also requiring very little state information at each node. We demonstrate that with reframing, we can achieve separate control of frequency and phase, allowing both the frequencies to be syntonized and the buffers to be moved to the desired points, rather than combining their control via a proportional-integral controller. This offers the potential for simplified boot processes and failure handling.
Submitted 20 March, 2023;
originally announced March 2023.
-
Resistance Distance and Control Performance for bittide Synchronization
Authors:
Sanjay Lall,
Calin Cascaval,
Martin Izzard,
Tammo Spalink
Abstract:
We discuss control of bittide distributed systems, which are designed to provide logical synchronization between networked machines by observing data flow rates between adjacent systems at the physical network layer and controlling local reference clock frequencies. We analyze the performance of approximate proportional-integral control of the synchronization mechanism and develop a simple continuous-time model to show the resulting dynamics are stable for any positive choice of gains. We then construct explicit formulae to show that closed-loop performance measured using the L2 norm is a product of two terms, one depending only on resistance distances in the graph, and the other depending only on controller gains.
Submitted 31 March, 2022; v1 submitted 9 November, 2021;
originally announced November 2021.
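For readers unfamiliar with the graph quantity named in the abstract above, the following is a minimal sketch of how a resistance distance can be computed. It is not taken from the paper. It uses the standard definition: treat every edge as a unit resistor, ground node `j`, inject unit current at node `i`, and read off the potential at `i`. The function names, the edge-list representation, and the unit edge weights are assumptions made for illustration, and the graph is assumed to be connected.

```python
from fractions import Fraction


def laplacian(n, edges):
    """Graph Laplacian of an undirected graph with unit edge weights (assumed)."""
    L = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1
        L[v][v] += 1
        L[u][v] -= 1
        L[v][u] -= 1
    return L


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with exact rationals."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Find a nonzero pivot; one exists when the graph is connected.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [x / pv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def resistance_distance(n, edges, i, j):
    """Effective resistance between nodes i and j of a connected graph."""
    if i == j:
        return Fraction(0)
    L = laplacian(n, edges)
    # Ground node j by deleting its row and column from the Laplacian.
    keep = [k for k in range(n) if k != j]
    A = [[L[r][c] for c in keep] for r in keep]
    # Inject unit current at node i and solve for the node potentials.
    b = [Fraction(1) if k == i else Fraction(0) for k in keep]
    x = solve(A, b)
    return x[keep.index(i)]


# Two unit resistors in series along the path 0 - 1 - 2.
path = [(0, 1), (1, 2)]
# Triangle: one unit resistor in parallel with two in series.
triangle = [(0, 1), (1, 2), (0, 2)]
print(resistance_distance(3, path, 0, 2))
print(resistance_distance(3, triangle, 0, 1))
```

Exact rational arithmetic keeps the small examples free of rounding error; for large topologies a numerical linear-algebra library would be the more practical choice.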
-
Modeling and Control of bittide Synchronization
Authors:
Sanjay Lall,
Calin Cascaval,
Martin Izzard,
Tammo Spalink
Abstract:
Distributed system applications rely on a fine-grain common sense of time. Existing systems maintain the common sense of time by keeping each independent machine as close as possible to wall-clock time through a combination of software protocols like NTP and GPS signals and/or precision references like atomic clocks. This approach is expensive and has tolerance limitations that require protocols to deal with asynchrony and its performance consequences. Moreover, at data-center scale it is impractical to distribute a physical clock as is done on a chip or printed circuit board. In this paper we introduce a distributed system design that removes the need for physical clock distribution or mechanisms for maintaining close alignment to wall-clock time, and instead provides applications with a perfectly synchronized logical clock. We discuss the abstract frame model (AFM), a mathematical model that underpins the system synchronization. The model is based on the rate of communication between nodes in a topology without requiring a global clock. We show that there are families of controllers that satisfy the properties required for existence and uniqueness of solutions to the AFM, and give examples.
Submitted 31 March, 2022; v1 submitted 28 September, 2021;
originally announced September 2021.