-
"Two-Stagification": Job Dispatching in Large-Scale Clusters via a Two-Stage Architecture
Authors:
Mert Yildiz,
Alexey Rolich,
Andrea Baiocchi
Abstract:
A continuing effort is devoted to devising effective dispatching policies for clusters of First Come First Served servers. Although the optimal solution for dispatchers aware of both job size and server state remains elusive, lower bounds and strong heuristics are known. In this paper, we introduce a two-stage cluster architecture that applies classical Round Robin, Join Idle Queue, and Least Work Left dispatching schemes, coupled with an optimized service-time threshold to separate large jobs from shorter ones. Using both synthetic (Weibull) workloads and real Google data center traces, we demonstrate that our two-stage approach greatly improves upon the corresponding single-stage policies and closely approaches the performance of advanced size- and state-aware methods. Our results highlight that careful architectural design, rather than increased complexity at the dispatcher, can yield significantly better mean response times in large-scale computing environments.
Submitted 5 May, 2025;
originally announced May 2025.
-
Dispatching Odyssey: Exploring Performance in Computing Clusters under Real-world Workloads
Authors:
Mert Yildiz,
Alexey Rolich,
Andrea Baiocchi
Abstract:
Recent workload measurements in Google data centers provide an opportunity to challenge existing models and, more broadly, to enhance the understanding of dispatching policies in computing clusters. Through extensive data-driven simulations, we aim to highlight the key features of workload traffic traces that influence response time performance under simple yet representative dispatching policies. For a given computational power budget, we vary the cluster size, i.e., the number of available servers. A job-level analysis reveals that Join Idle Queue (JIQ) and Least Work Left (LWL) exhibit an optimal working point for a fixed utilization coefficient as the number of servers is varied, whereas Round Robin (RR) demonstrates monotonically worsening performance. Additionally, we explore the accuracy of simple G/G queue approximations. When decomposing jobs into tasks, interesting results emerge; notably, the simpler, non-size-based policy JIQ appears to outperform the more "powerful" size-based LWL policy. Complementing these findings, we present preliminary results on a two-stage scheduling approach that partitions tasks based on service thresholds, illustrating that modest architectural modifications can further enhance performance under realistic workload conditions. We provide insights into these results and suggest promising directions for fully explaining the observed phenomena.
Submitted 14 April, 2025;
originally announced April 2025.
-
The Merit of Simple Policies: Buying Performance With Parallelism and System Architecture
Authors:
Mert Yildiz,
Alexey Rolich,
Andrea Baiocchi
Abstract:
While scheduling and dispatching of computational workloads is a well-investigated subject, only recently has Google provided publicly a vast high-resolution measurement dataset of its cloud workloads. We revisit dispatching and scheduling algorithms fed by traffic workloads derived from those measurements. The main finding is that mean job response time attains a minimum as the number of servers of the computing cluster is varied, under the constraint that the overall computational budget is kept constant. Moreover, simple policies, such as Join Idle Queue, appear to attain the same performance as more complex, size-based policies for suitably high degrees of parallelism. Further, better performance, definitely outperforming size-based dispatching policies, is obtained by using multi-stage server clusters, even using very simple policies such as Round Robin. The takeaway is that parallelism and architecture of computing systems might be powerful knobs to control performance, even more than policies, under realistic workload traffic.
Submitted 20 March, 2025;
originally announced March 2025.
-
Analysis of Status Update in Wireless Networks with Successive Interference Cancellation
Authors:
Asmad Bin Abdul Razzaque,
Andrea Baiocchi
Abstract:
Data collection in an IoT environment requires simple and effective communication solutions that address resource constraints and ensure network efficiency while achieving scalability. Efficiency is evaluated based on the timeliness of collected data (Age of Information), the energy spent per delivered unit of data, and the effectiveness in utilizing spectrum resources. This paper addresses a random multiple access adaptive system, in which a large number of devices send sporadic messages in a non-periodic pattern. In particular, our analysis highlights the potential of Successive Interference Cancellation and identifies an adaptive parameter setting to maximize its benefits as the level of contention on the shared channel varies. An analytical model is defined, easily scalable with the number of nodes and yielding all the relevant metrics. Evidence of the accuracy of the model is given by comparing predicted results against simulations. The model is utilized to assess the trade-off between Age of Information and energy consumption, revealing a sharp relationship between the two. The considered approach lends itself to many generalizations and applications to massive machine-type communications and IoT networks.
Submitted 30 August, 2024;
originally announced September 2024.
-
SIC-based Random Multiple Access Protocol: Fixed or Adaptive Approach
Authors:
A. B. Abdul Razzaque,
A. Baiocchi
Abstract:
Efficient data collection from a multitude of Internet of Things (IoT) devices is crucial for various applications, yet existing solutions often struggle with minimizing access delay and Age of Information (AoI), especially when managing multiple simultaneous transmissions and access strategies. This challenge becomes increasingly critical as IoT deployments continue to expand, demanding robust mechanisms for handling diverse traffic scenarios. In this study, we propose a novel approach leveraging Successive Interference Cancellation (SIC) based on adaptive and fixed parameter schemes to address these limitations. By analyzing both throughput and AoI along with access delay, we demonstrate the effectiveness of our adaptive approach compared to the fixed approach, particularly in scenarios featuring heavy and light traffic. Our findings highlight the pivotal role of adaptive approaches in optimizing data collection processes in IoT ecosystems, with a particular focus on spectral efficiency and on minimizing access delay and AoI.
Submitted 23 July, 2024;
originally announced July 2024.
-
Asymptotic analysis of sum-rate under SIC
Authors:
Andrea Baiocchi,
Asmad Razzaque
Abstract:
Limiting the cost of coordination and contention among a large number of nodes calls for grant-free approaches that exploit physical layer techniques to resolve collisions. Successive Interference Cancellation (SIC) is becoming a key building block of multiple access channel receivers, in an effort to support the massive Internet of Things (IoT). In this paper, we explore the large-scale performance of SIC in a theoretical framework. A general model of a SIC receiver is stated for a shared channel with $n$ transmitters. The asymptotic sum-rate performance is characterized as $n \rightarrow \infty$, for a suitably scaled target Signal to Noise Interference Ratio (SNIR). The probability distribution of the number of correctly decoded packets is shown to tend to a deterministic distribution asymptotically for large values of $n$. The asymptotic analysis is carried out for any probability distribution of the wireless channel gain, assuming that the average received power level is the same for all nodes, through power control.
Submitted 27 June, 2024; v1 submitted 21 May, 2024;
originally announced May 2024.
-
Conditional computation in neural networks: principles and research trends
Authors:
Simone Scardapane,
Alessandro Baiocchi,
Alessio Devoto,
Valerio Marsocci,
Pasquale Minervini,
Jary Pomponi
Abstract:
This article summarizes principles and ideas from the emerging area of applying conditional computation methods to the design of neural networks. In particular, we focus on neural networks that can dynamically activate or de-activate parts of their computational graph conditionally on their input. Examples include the dynamic selection of, e.g., input tokens, layers (or sets of layers), and sub-modules inside each layer (e.g., channels in a convolutional filter). We first provide a general formalism to describe these techniques in a uniform way. Then, we introduce three notable implementations of these principles: mixture-of-experts (MoEs) networks, token selection mechanisms, and early-exit neural networks. The paper aims to provide a tutorial-like introduction to this growing field. To this end, we analyze the benefits of these modular designs in terms of efficiency, explainability, and transfer learning, with a focus on emerging applicative areas ranging from automated scientific discovery to semantic communication.
Submitted 8 July, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
-
Adaptive Point Transformer
Authors:
Alessandro Baiocchi,
Indro Spinelli,
Alessandro Nicolosi,
Simone Scardapane
Abstract:
The recent surge in 3D data acquisition has spurred the development of geometric deep learning models for point cloud processing, boosted by the remarkable success of transformers in natural language processing. While point cloud transformers (PTs) have achieved impressive results recently, their quadratic scaling with respect to the point cloud size poses a significant scalability challenge for real-world applications. To address this issue, we propose the Adaptive Point Cloud Transformer (AdaPT), a standard PT model augmented by an adaptive token selection mechanism. AdaPT dynamically reduces the number of tokens during inference, enabling efficient processing of large point clouds. Furthermore, we introduce a budget mechanism to flexibly adjust the computational cost of the model at inference time without the need for retraining or fine-tuning separate models. Our extensive experimental evaluation on point cloud classification tasks demonstrates that AdaPT significantly reduces computational complexity while maintaining competitive accuracy compared to standard PTs. The code for AdaPT is made publicly available.
Submitted 26 January, 2024;
originally announced January 2024.