-
Analyzing Common Electronic Structure Theory Algorithms for Distributed Quantum Computing
Authors:
Grier M. Jones,
Hans-Arno Jacobsen
Abstract:
To move towards the utility era of quantum computing, many corporations have posed distributed quantum computing (DQC) as a framework for scaling the current generation of devices for practical applications. One of these applications is quantum chemistry, also known as electronic structure theory, which has been poised as a "killer application" of quantum computing, To this end, we analyze five el…
▽ More
To move towards the utility era of quantum computing, many corporations have posed distributed quantum computing (DQC) as a framework for scaling the current generation of devices for practical applications. One of these applications is quantum chemistry, also known as electronic structure theory, which has been poised as a "killer application" of quantum computing, To this end, we analyze five electronic structure methods, found in common packages such as Tequila and ffsim, which can be easily interfaced with the Qiskit Circuit Cutting addon. Herein, we provide insights into cutting these algorithms using local operations (LO) to determine their aptitude for distribution. The key findings of our work are that many of these algorithms cannot be efficiently parallelized using LO, and new methods must be developed to apply electronic structure theory within a DQC framework.
△ Less
Submitted 2 July, 2025;
originally announced July 2025.
-
Dynamically Learned Test-Time Model Routing in Language Model Zoos with Service Level Guarantees
Authors:
Herbert Woisetschläger,
Ryan Zhang,
Shiqiang Wang,
Hans-Arno Jacobsen
Abstract:
Open-weight LLM zoos provide access to numerous high-quality models, but selecting the appropriate model for specific tasks remains challenging and requires technical expertise. Most users simply want factually correct, safe, and satisfying responses without concerning themselves with model technicalities, while inference service providers prioritize minimizing operating costs. These competing int…
▽ More
Open-weight LLM zoos provide access to numerous high-quality models, but selecting the appropriate model for specific tasks remains challenging and requires technical expertise. Most users simply want factually correct, safe, and satisfying responses without concerning themselves with model technicalities, while inference service providers prioritize minimizing operating costs. These competing interests are typically mediated through service level agreements (SLAs) that guarantee minimum service quality. We introduce MESS+, a stochastic optimization algorithm for cost-optimal LLM request routing while providing rigorous SLA compliance guarantees. MESS+ learns request satisfaction probabilities of LLMs in real-time as users interact with the system, based on which model selection decisions are made by solving a per-request optimization problem. Our algorithm includes a novel combination of virtual queues and request satisfaction prediction, along with a theoretical analysis of cost optimality and constraint satisfaction. Across a wide range of state-of-the-art LLM benchmarks, MESS+ achieves an average of 2x cost savings compared to existing LLM routing techniques.
△ Less
Submitted 26 May, 2025;
originally announced May 2025.
-
Beyond Performance: Measuring the Environmental Impact of Analytical Databases
Authors:
Michail Bachras,
Hans-Arno Jacobsen
Abstract:
The exponential growth of data is making query processing increasingly critical for modern computing infrastructure, yet the environmental impact of database operations remains poorly understood and largely overlooked. This paper presents ATLAS, a comprehensive methodology for measuring and quantifying the environmental footprint of analytical database systems, considering both operational impacts…
▽ More
The exponential growth of data is making query processing increasingly critical for modern computing infrastructure, yet the environmental impact of database operations remains poorly understood and largely overlooked. This paper presents ATLAS, a comprehensive methodology for measuring and quantifying the environmental footprint of analytical database systems, considering both operational impacts and manufacturing costs of hardware components. Through extensive empirical evaluation of four distinct database architectures (DuckDB, MonetDB, Hyper, and StarRocks), we uncover how fundamental architectural decisions affect environmental efficiency. Our findings reveal that environmental considerations in database operations are multifaceted, encompassing both immediate operational impacts and long-term sustainability implications. We demonstrate that architectural choices can significantly influence both power consumption and environmental sustainability, while deployment location emerges as a critical factor that can amplify or diminish these architectural advantages.
△ Less
Submitted 26 April, 2025;
originally announced April 2025.
-
Decentralization in PoS Blockchain Consensus: Quantification and Advancement
Authors:
Shashank Motepalli,
Hans-Arno Jacobsen
Abstract:
Decentralization is a foundational principle of permissionless blockchains, with consensus mechanisms serving a critical role in its realization. This study quantifies the decentralization of consensus mechanisms in proof-of-stake (PoS) blockchains using a comprehensive set of metrics, including Nakamoto coefficients, Gini, Herfindahl Hirschman Index (HHI), Shapley values, and Zipfs coefficient. O…
▽ More
Decentralization is a foundational principle of permissionless blockchains, with consensus mechanisms serving a critical role in its realization. This study quantifies the decentralization of consensus mechanisms in proof-of-stake (PoS) blockchains using a comprehensive set of metrics, including Nakamoto coefficients, Gini, Herfindahl Hirschman Index (HHI), Shapley values, and Zipfs coefficient. Our empirical analysis across ten prominent blockchains reveals significant concentration of stake among a few validators, posing challenges to fair consensus. To address this, we introduce two alternative weighting models for PoS consensus: Square Root Stake Weight (SRSW) and Logarithmic Stake Weight (LSW), which adjust validator influence through non-linear transformations. Results demonstrate that SRSW and LSW models improve decentralization metrics by an average of 51% and 132%, respectively, supporting more equitable and resilient blockchain systems.
△ Less
Submitted 19 April, 2025;
originally announced April 2025.
-
A User-Tunable Machine Learning Framework for Step-Wise Synthesis Planning
Authors:
Shivesh Prakash,
Hans-Arno Jacobsen,
Viki Kumar Prasad
Abstract:
We introduce MHNpath, a machine learning-driven retrosynthetic tool designed for computer-aided synthesis planning. Leveraging modern Hopfield networks and novel comparative metrics, MHNpath efficiently prioritizes reaction templates, improving the scalability and accuracy of retrosynthetic predictions. The tool incorporates a tunable scoring system that allows users to prioritize pathways based o…
▽ More
We introduce MHNpath, a machine learning-driven retrosynthetic tool designed for computer-aided synthesis planning. Leveraging modern Hopfield networks and novel comparative metrics, MHNpath efficiently prioritizes reaction templates, improving the scalability and accuracy of retrosynthetic predictions. The tool incorporates a tunable scoring system that allows users to prioritize pathways based on cost, reaction temperature, and toxicity, thereby facilitating the design of greener and cost-effective reaction routes. We demonstrate its effectiveness through case studies involving complex molecules from ChemByDesign, showcasing its ability to predict novel synthetic and enzymatic pathways. Furthermore, we benchmark MHNpath against existing frameworks, replicating experimentally validated "gold-standard" pathways from PaRoutes. Our case studies reveal that the tool can generate shorter, cheaper, moderate-temperature routes employing green solvents, as exemplified by compounds such as dronabinol, arformoterol, and lupinine.
△ Less
Submitted 3 April, 2025; v1 submitted 2 April, 2025;
originally announced April 2025.
-
Cabinet: Dynamically Weighted Consensus Made Fast
Authors:
Gengrui Zhang,
Shiquan Zhang,
Michail Bachras,
Yuqiu Zhang,
Hans-Arno Jacobsen
Abstract:
Conventional consensus algorithms, such as Paxos and Raft, encounter inefficiencies when applied to large-scale distributed systems due to the requirement of waiting for replies from a majority of nodes. To address these challenges, we propose Cabinet, a novel consensus algorithm that introduces dynamically weighted consensus, allocating distinct weights to nodes based on any given failure thresho…
▽ More
Conventional consensus algorithms, such as Paxos and Raft, encounter inefficiencies when applied to large-scale distributed systems due to the requirement of waiting for replies from a majority of nodes. To address these challenges, we propose Cabinet, a novel consensus algorithm that introduces dynamically weighted consensus, allocating distinct weights to nodes based on any given failure thresholds. Cabinet dynamically adjusts nodes' weights according to their responsiveness, assigning higher weights to faster nodes. The dynamic weight assignment maintains an optimal system performance, especially in large-scale and heterogeneous systems where node responsiveness varies. We evaluate Cabinet against Raft with distributed MongoDB and PostgreSQL databases using YCSB and TPC-C workloads. The evaluation results show that Cabinet outperforms Raft in throughput and latency under increasing system scales, complex networks, and failures in both homogeneous and heterogeneous clusters, offering a promising high-performance consensus solution.
△ Less
Submitted 11 March, 2025;
originally announced March 2025.
-
A Comprehensive Survey of Machine Unlearning Techniques for Large Language Models
Authors:
Jiahui Geng,
Qing Li,
Herbert Woisetschlaeger,
Zongxiong Chen,
Fengyu Cai,
Yuxia Wang,
Preslav Nakov,
Hans-Arno Jacobsen,
Fakhri Karray
Abstract:
This study investigates the machine unlearning techniques within the context of large language models (LLMs), referred to as \textit{LLM unlearning}. LLM unlearning offers a principled approach to removing the influence of undesirable data (e.g., sensitive or illegal information) from LLMs, while preserving their overall utility without requiring full retraining. Despite growing research interest,…
▽ More
This study investigates the machine unlearning techniques within the context of large language models (LLMs), referred to as \textit{LLM unlearning}. LLM unlearning offers a principled approach to removing the influence of undesirable data (e.g., sensitive or illegal information) from LLMs, while preserving their overall utility without requiring full retraining. Despite growing research interest, there is no comprehensive survey that systematically organizes existing work and distills key insights; here, we aim to bridge this gap. We begin by introducing the definition and the paradigms of LLM unlearning, followed by a comprehensive taxonomy of existing unlearning studies. Next, we categorize current unlearning approaches, summarizing their strengths and limitations. Additionally, we review evaluation metrics and benchmarks, providing a structured overview of current assessment methodologies. Finally, we outline promising directions for future research, highlighting key challenges and opportunities in the field.
△ Less
Submitted 31 May, 2025; v1 submitted 22 February, 2025;
originally announced March 2025.
-
Adversarial Robustness of Partitioned Quantum Classifiers
Authors:
Pouya Kananian,
Hans-Arno Jacobsen
Abstract:
Adversarial robustness in quantum classifiers is a critical area of study, providing insights into their performance compared to classical models and uncovering potential advantages inherent to quantum machine learning. In the NISQ era of quantum computing, circuit cutting is a notable technique for simulating circuits that exceed the qubit limitations of current devices, enabling the distribution…
▽ More
Adversarial robustness in quantum classifiers is a critical area of study, providing insights into their performance compared to classical models and uncovering potential advantages inherent to quantum machine learning. In the NISQ era of quantum computing, circuit cutting is a notable technique for simulating circuits that exceed the qubit limitations of current devices, enabling the distribution of a quantum circuit's execution across multiple quantum processing units through classical communication. We examine how partitioning quantum classifiers through circuit cutting increase their susceptibility to adversarial attacks, establishing a link between attacking the state preparation channels in wire cutting and implementing adversarial gates within intermediate layers of a quantum classifier. We then proceed to study the latter problem from both a theoretical and experimental perspective.
△ Less
Submitted 28 January, 2025;
originally announced February 2025.
-
WaveGAS: Waveform Relaxation for Scaling Graph Neural Networks
Authors:
Jana Vatter,
Mykhaylo Zayats,
Marcos Martínez Galindo,
Vanessa López,
Ruben Mayer,
Hans-Arno Jacobsen,
Hoang Thanh Lam
Abstract:
With the ever-growing size of real-world graphs, numerous techniques to overcome resource limitations when training Graph Neural Networks (GNNs) have been developed. One such approach, GNNAutoScale (GAS), uses graph partitioning to enable training under constrained GPU memory. GAS also stores historical embedding vectors, which are retrieved from one-hop neighbors in other partitions, ensuring cri…
▽ More
With the ever-growing size of real-world graphs, numerous techniques to overcome resource limitations when training Graph Neural Networks (GNNs) have been developed. One such approach, GNNAutoScale (GAS), uses graph partitioning to enable training under constrained GPU memory. GAS also stores historical embedding vectors, which are retrieved from one-hop neighbors in other partitions, ensuring critical information is captured across partition boundaries. The historical embeddings which come from the previous training iteration are stale compared to the GAS estimated embeddings, resulting in approximation errors of the training algorithm. Furthermore, these errors accumulate over multiple layers, leading to suboptimal node embeddings. To address this shortcoming, we propose two enhancements: first, WaveGAS, inspired by waveform relaxation, performs multiple forward passes within GAS before the backward pass, refining the approximation of historical embeddings and gradients to improve accuracy; second, a gradient-tracking method that stores and utilizes more accurate historical gradients during training. Empirical results show that WaveGAS enhances GAS and achieves better accuracy, even outperforming methods that train on full graphs, thanks to its robust estimation of node embeddings.
△ Less
Submitted 27 February, 2025;
originally announced February 2025.
-
Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining
Authors:
Daouda Sow,
Herbert Woisetschläger,
Saikiran Bulusu,
Shiqiang Wang,
Hans-Arno Jacobsen,
Yingbin Liang
Abstract:
Pretraining large language models (LLMs) on vast and heterogeneous datasets is crucial for achieving state-of-the-art performance across diverse downstream tasks. However, current training paradigms treat all samples equally, overlooking the importance or relevance of individual samples throughout the training process. Existing reweighting strategies, which primarily focus on group-level data impo…
▽ More
Pretraining large language models (LLMs) on vast and heterogeneous datasets is crucial for achieving state-of-the-art performance across diverse downstream tasks. However, current training paradigms treat all samples equally, overlooking the importance or relevance of individual samples throughout the training process. Existing reweighting strategies, which primarily focus on group-level data importance, fail to leverage fine-grained instance-level information and do not adapt dynamically to individual sample importance as training progresses. In this paper, we introduce novel algorithms for dynamic, instance-level data reweighting aimed at improving both the efficiency and effectiveness of LLM pretraining. Our methods adjust the weight of each training sample based on its loss value in an online fashion, allowing the model to dynamically focus on more informative or important samples at the current training stage. In particular, our framework allows us to systematically devise reweighting strategies deprioritizing redundant or uninformative data, which we find tend to work best. Furthermore, we develop a new theoretical framework for analyzing the impact of loss-based reweighting on the convergence of gradient-based optimization, providing the first formal characterization of how these strategies affect convergence bounds. We empirically validate our approach across a spectrum of tasks, from pretraining 7B and 1.4B parameter LLMs to smaller-scale language models and linear regression problems, demonstrating that our loss-based reweighting approach can lead to faster convergence and significantly improved performance.
△ Less
Submitted 10 February, 2025;
originally announced February 2025.
-
MESS+: Energy-Optimal Inferencing in Language Model Zoos with Service Level Guarantees
Authors:
Ryan Zhang,
Herbert Woisetschläger,
Shiqiang Wang,
Hans Arno Jacobsen
Abstract:
Open-weight large language model (LLM) zoos allow users to quickly integrate state-of-the-art models into systems. Despite increasing availability, selecting the most appropriate model for a given task still largely relies on public benchmark leaderboards and educated guesses. This can be unsatisfactory for both inference service providers and end users, where the providers usually prioritize cost…
▽ More
Open-weight large language model (LLM) zoos allow users to quickly integrate state-of-the-art models into systems. Despite increasing availability, selecting the most appropriate model for a given task still largely relies on public benchmark leaderboards and educated guesses. This can be unsatisfactory for both inference service providers and end users, where the providers usually prioritize cost efficiency, while the end users usually prioritize model output quality for their inference requests. In commercial settings, these two priorities are often brought together in Service Level Agreements (SLA). We present MESS+, an online stochastic optimization algorithm for energy-optimal model selection from a model zoo, which works on a per-inference-request basis. For a given SLA that requires high accuracy, we are up to 2.5x more energy efficient with MESS+ than with randomly selecting an LLM from the zoo while maintaining SLA quality constraints.
△ Less
Submitted 31 October, 2024;
originally announced November 2024.
-
Can Graph Reordering Speed Up Graph Neural Network Training? An Experimental Study
Authors:
Nikolai Merkel,
Pierre Toussing,
Ruben Mayer,
Hans-Arno Jacobsen
Abstract:
Graph neural networks (GNNs) are a type of neural network capable of learning on graph-structured data. However, training GNNs on large-scale graphs is challenging due to iterative aggregations of high-dimensional features from neighboring vertices within sparse graph structures combined with neural network operations. The sparsity of graphs frequently results in suboptimal memory access patterns…
▽ More
Graph neural networks (GNNs) are a type of neural network capable of learning on graph-structured data. However, training GNNs on large-scale graphs is challenging due to iterative aggregations of high-dimensional features from neighboring vertices within sparse graph structures combined with neural network operations. The sparsity of graphs frequently results in suboptimal memory access patterns and longer training time. Graph reordering is an optimization strategy aiming to improve the graph data layout. It has shown to be effective to speed up graph analytics workloads, but its effect on the performance of GNN training has not been investigated yet. The generalization of reordering to GNN performance is nontrivial, as multiple aspects must be considered: GNN hyper-parameters such as the number of layers, the number of hidden dimensions, and the feature size used in the GNN model, neural network operations, large intermediate vertex states, and GPU acceleration.
In our work, we close this gap by performing an empirical evaluation of 12 reordering strategies in two state-of-the-art GNN systems, PyTorch Geometric and Deep Graph Library. Our results show that graph reordering is effective in reducing training time for CPU- and GPU-based training, respectively. Further, we find that GNN hyper-parameters influence the effectiveness of reordering, that reordering metrics play an important role in selecting a reordering strategy, that lightweight reordering performs better for GPU-based than for CPU-based training, and that invested reordering time can in many cases be amortized.
△ Less
Submitted 17 September, 2024;
originally announced September 2024.
-
Federated Learning and AI Regulation in the European Union: Who is Responsible? -- An Interdisciplinary Analysis
Authors:
Herbert Woisetschläger,
Simon Mertel,
Christoph Krönke,
Ruben Mayer,
Hans-Arno Jacobsen
Abstract:
The European Union Artificial Intelligence Act mandates clear stakeholder responsibilities in developing and deploying machine learning applications to avoid substantial fines, prioritizing private and secure data processing with data remaining at its origin. Federated Learning (FL) enables the training of generative AI Models across data siloes, sharing only model parameters while improving data…
▽ More
The European Union Artificial Intelligence Act mandates clear stakeholder responsibilities in developing and deploying machine learning applications to avoid substantial fines, prioritizing private and secure data processing with data remaining at its origin. Federated Learning (FL) enables the training of generative AI Models across data siloes, sharing only model parameters while improving data security. Since FL is a cooperative learning paradigm, clients and servers naturally share legal responsibility in the FL pipeline. Our work contributes to clarifying the roles of both parties, explains strategies for shifting responsibilities to the server operator, and points out open technical challenges that we must solve to improve FL's practical applicability under the EU AI Act.
△ Less
Submitted 12 July, 2024; v1 submitted 10 July, 2024;
originally announced July 2024.
-
Multimodal Physiological Signals Representation Learning via Multiscale Contrasting for Depression Recognition
Authors:
Kai Shao,
Rui Wang,
Yixue Hao,
Long Hu,
Min Chen,
Hans Arno Jacobsen
Abstract:
Depression recognition based on physiological signals such as functional near-infrared spectroscopy (fNIRS) and electroencephalogram (EEG) has made considerable progress. However, most existing studies ignore the complementarity and semantic consistency of multimodal physiological signals under the same stimulation task in complex spatio-temporal patterns. In this paper, we introduce a multimodal…
▽ More
Depression recognition based on physiological signals such as functional near-infrared spectroscopy (fNIRS) and electroencephalogram (EEG) has made considerable progress. However, most existing studies ignore the complementarity and semantic consistency of multimodal physiological signals under the same stimulation task in complex spatio-temporal patterns. In this paper, we introduce a multimodal physiological signals representation learning framework using Siamese architecture via multiscale contrasting for depression recognition (MRLMC). First, fNIRS and EEG are transformed into different but correlated data based on a time-domain data augmentation strategy. Then, we design a spatio-temporal contrasting module to learn the representation of fNIRS and EEG through weight-sharing multiscale spatio-temporal convolution. Furthermore, to enhance the learning of semantic representation associated with stimulation tasks, a semantic consistency contrast module is proposed, aiming to maximize the semantic similarity of fNIRS and EEG. Extensive experiments on publicly available and self-collected multimodal physiological signals datasets indicate that MRLMC outperforms the state-of-the-art models. Moreover, our proposed framework is capable of transferring to multimodal time series downstream tasks.
△ Less
Submitted 25 June, 2024; v1 submitted 22 June, 2024;
originally announced June 2024.
-
Should my Blockchain Learn to Drive? A Study of Hyperledger Fabric
Authors:
Jeeta Ann Chacko,
Ruben Mayer,
Hans-Arno Jacobsen
Abstract:
Similar to other transaction processing frameworks, blockchain systems need to be dynamically reconfigured to adapt to varying workloads and changes in network conditions. However, achieving optimal reconfiguration is particularly challenging due to the complexity of the blockchain stack, which has diverse configurable parameters. This paper explores the concept of self-driving blockchains, which…
▽ More
Similar to other transaction processing frameworks, blockchain systems need to be dynamically reconfigured to adapt to varying workloads and changes in network conditions. However, achieving optimal reconfiguration is particularly challenging due to the complexity of the blockchain stack, which has diverse configurable parameters. This paper explores the concept of self-driving blockchains, which have the potential to predict workload changes and reconfigure themselves for optimal performance without human intervention. We compare and contrast our discussions with existing research on databases and highlight aspects unique to blockchains. We identify specific parameters and components in Hyperledger Fabric, a popular permissioned blockchain system, that are suitable for autonomous adaptation and offer potential solutions for the challenges involved. Further, we implement three demonstrative locally autonomous systems, each targeting a different layer of the blockchain stack, and conduct experiments to understand the feasibility of our findings. Our experiments indicate up to 11% improvement in success throughput and a 30% decrease in latency, making this a significant step towards implementing a fully autonomous blockchain system in the future.
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
Federated Computing -- Survey on Building Blocks, Extensions and Systems
Authors:
René Schwermer,
Ruben Mayer,
Hans-Arno Jacobsen
Abstract:
In response to the increasing volume and sensitivity of data, traditional centralized computing models face challenges, such as data security breaches and regulatory hurdles. Federated Computing (FC) addresses these concerns by enabling collaborative processing without compromising individual data privacy. This is achieved through a decentralized network of devices, each retaining control over its…
▽ More
In response to the increasing volume and sensitivity of data, traditional centralized computing models face challenges, such as data security breaches and regulatory hurdles. Federated Computing (FC) addresses these concerns by enabling collaborative processing without compromising individual data privacy. This is achieved through a decentralized network of devices, each retaining control over its data, while participating in collective computations. The motivation behind FC extends beyond technical considerations to encompass societal implications. As the need for responsible AI and ethical data practices intensifies, FC aligns with the principles of user empowerment and data sovereignty. FC comprises of Federated Learning (FL) and Federated Analytics (FA). FC systems became more complex over time and they currently lack a clear definition and taxonomy describing its moving pieces. Current surveys capture domain-specific FL use cases, describe individual components in an FC pipeline individually or decoupled from each other, or provide a quantitative overview of the number of published papers. This work surveys more than 150 papers to distill the underlying structure of FC systems with their basic building blocks, extensions, architecture, environment, and motivation. We capture FL and FA systems individually and point out unique difference between those two.
△ Less
Submitted 3 April, 2024;
originally announced April 2024.
-
Federated Learning Priorities Under the European Union Artificial Intelligence Act
Authors:
Herbert Woisetschläger,
Alexander Erben,
Bill Marino,
Shiqiang Wang,
Nicholas D. Lane,
Ruben Mayer,
Hans-Arno Jacobsen
Abstract:
The age of AI regulation is upon us, with the European Union Artificial Intelligence Act (AI Act) leading the way. Our key inquiry is how this will affect Federated Learning (FL), whose starting point of prioritizing data privacy while performing ML fundamentally differs from that of centralized learning. We believe the AI Act and future regulations could be the missing catalyst that pushes FL tow…
▽ More
The age of AI regulation is upon us, with the European Union Artificial Intelligence Act (AI Act) leading the way. Our key inquiry is how this will affect Federated Learning (FL), whose starting point of prioritizing data privacy while performing ML fundamentally differs from that of centralized learning. We believe the AI Act and future regulations could be the missing catalyst that pushes FL toward mainstream adoption. However, this can only occur if the FL community reprioritizes its research focus. In our position paper, we perform a first-of-its-kind interdisciplinary analysis (legal and ML) of the impact the AI Act may have on FL and make a series of observations supporting our primary position through quantitative and qualitative analysis. We explore data governance issues and the concern for privacy. We establish new challenges regarding performance and energy efficiency within lifecycle monitoring. Taken together, our analysis suggests there is a sizable opportunity for FL to become a crucial component of AI Act-compliant ML systems and for the new regulation to drive the adoption of FL techniques in general. Most noteworthy are the opportunities to defend against data bias and enhance private and secure computation
△ Less
Submitted 5 February, 2024;
originally announced February 2024.
-
Choosing a Classical Planner with Graph Neural Networks
Authors:
Jana Vatter,
Ruben Mayer,
Hans-Arno Jacobsen,
Horst Samulowitz,
Michael Katz
Abstract:
Online planner selection is the task of choosing a solver out of a predefined set for a given planning problem. As planning is computationally hard, the performance of solvers varies greatly on planning problems. Thus, the ability to predict their performance on a given problem is of great importance. While a variety of learning methods have been employed, for classical cost-optimal planning the p…
▽ More
Online planner selection is the task of choosing a solver out of a predefined set for a given planning problem. As planning is computationally hard, the performance of solvers varies greatly on planning problems. Thus, the ability to predict their performance on a given problem is of great importance. While a variety of learning methods have been employed, for classical cost-optimal planning the prevailing approach uses Graph Neural Networks (GNNs). In this work, we continue the line of work on using GNNs for online planner selection. We perform a thorough investigation of the impact of the chosen GNN model, graph representation and node features, as well as prediction task. Going further, we propose using the graph representation obtained by a GNN as an input to the Extreme Gradient Boosting (XGBoost) model, resulting in a more resource-efficient yet accurate approach. We show the effectiveness of a variety of GNN-based online planner selection methods, opening up new exciting avenues for research on online planner selection.
△ Less
Submitted 25 January, 2024;
originally announced February 2024.
-
A Survey on Efficient Federated Learning Methods for Foundation Model Training
Authors:
Herbert Woisetschläger,
Alexander Isenko,
Shiqiang Wang,
Ruben Mayer,
Hans-Arno Jacobsen
Abstract:
Federated Learning (FL) has become an established technique to facilitate privacy-preserving collaborative training across a multitude of clients. However, new approaches to FL often discuss their contributions involving small deep-learning models only and focus on training full models on clients. In the wake of Foundation Models (FM), the reality is different for many deep learning applications.…
▽ More
Federated Learning (FL) has become an established technique to facilitate privacy-preserving collaborative training across a multitude of clients. However, new approaches to FL often discuss their contributions involving small deep-learning models only and focus on training full models on clients. In the wake of Foundation Models (FM), the reality is different for many deep learning applications. Typically, FMs have already been pre-trained across a wide variety of tasks and can be fine-tuned to specific downstream tasks over significantly smaller datasets than required for full model training. However, access to such datasets is often challenging. By its design, FL can help to open data silos. With this survey, we introduce a novel taxonomy focused on computational and communication efficiency, the vital elements to make use of FMs in FL systems. We discuss the benefits and drawbacks of parameter-efficient fine-tuning (PEFT) for FL applications, elaborate on the readiness of FL frameworks to work with FMs, and provide future research opportunities on how to evaluate generative models in FL as well as the interplay of privacy and PEFT.
△ Less
Submitted 5 September, 2024; v1 submitted 9 January, 2024;
originally announced January 2024.
-
How Does Stake Distribution Influence Consensus? Analyzing Blockchain Decentralization
Authors:
Shashank Motepalli,
Hans-Arno Jacobsen
Abstract:
In the PoS blockchain landscape, the challenge of achieving full decentralization is often hindered by a disproportionate concentration of staked tokens among a few validators. This study analyses this challenge by first formalizing decentralization metrics for weighted consensus mechanisms. An empirical analysis across ten permissionless blockchains uncovers significant weight concentration among…
▽ More
In the PoS blockchain landscape, the challenge of achieving full decentralization is often hindered by a disproportionate concentration of staked tokens among a few validators. This study analyses this challenge by first formalizing decentralization metrics for weighted consensus mechanisms. An empirical analysis across ten permissionless blockchains uncovers significant weight concentration among validators, underscoring the need for an equitable approach. To counter this, we introduce the Square Root Stake Weight (SRSW) model, which effectively recalibrates staking weight distribution. Our examination of the SRSW model demonstrates notable improvements in the decentralization metrics: the Gini index improves by 37.16% on average, while Nakamoto coefficients for liveness and safety see mean enhancements of 101.04% and 80.09%, respectively. This research is a pivotal step toward a more fair and equitable distribution of staking weight, advancing the decentralization in blockchain consensus mechanisms.
△ Less
Submitted 20 May, 2024; v1 submitted 21 December, 2023;
originally announced December 2023.
-
An End-to-End Performance Comparison of Seven Permissioned Blockchain Systems
Authors:
Frank Christian Geyer,
Hans-Arno Jacobsen,
Ruben Mayer,
Peter Mandl
Abstract:
The emergence of more and more blockchain solutions with innovative approaches to optimising performance, scalability, privacy and governance complicates performance analysis. Reasons for the difficulty of benchmarking blockchains include, for example, the high number of system parameters to configure and the effort to deploy a blockchain network. In addition, performance data, which mostly comes…
▽ More
The emergence of more and more blockchain solutions with innovative approaches to optimising performance, scalability, privacy and governance complicates performance analysis. Reasons for the difficulty of benchmarking blockchains include, for example, the high number of system parameters to configure and the effort to deploy a blockchain network. In addition, performance data, which mostly comes from system vendors, is often intransparent. We investigate and evaluate the performance of seven permissioned blockchain systems using different parameter settings in a reproducible manner. We employ an end-to-end approach, where the clients sending the transactions are fully involved in the data collection approach. Our results highlight the peculiarities and limitations of the systems under investigation. Due to the insights given, our work forms the basis for continued research to optimise the performance of blockchain systems.
△ Less
Submitted 26 November, 2023;
originally announced November 2023.
-
FabricCRDT: A Conflict-Free Replicated Datatypes Approach to Permissioned Blockchains
Authors:
Pezhman Nasirifard,
Ruben Mayer,
Hans-Arno Jacobsen
Abstract:
With the increased adaption of blockchain technologies, permissioned blockchains such as Hyperledger Fabric provide a robust ecosystem for developing production-grade decentralized applications. However, the additional latency between executing and committing transactions, due to Fabric's three-phase transaction lifecycle of Execute-Order-Validate (EOV), is a potential scalability bottleneck. The…
▽ More
With the increased adaption of blockchain technologies, permissioned blockchains such as Hyperledger Fabric provide a robust ecosystem for developing production-grade decentralized applications. However, the additional latency between executing and committing transactions, due to Fabric's three-phase transaction lifecycle of Execute-Order-Validate (EOV), is a potential scalability bottleneck. The added latency increases the probability of concurrent updates on the same keys by different transactions, leading to transaction failures caused by Fabric's concurrency control mechanism. The transaction failures increase the application development complexity and decrease Fabric's throughput. Conflict-free Replicated Datatypes (CRDTs) provide a solution for merging and resolving conflicts in the presence of concurrent updates. In this work, we introduce FabricCRDT, an approach for integrating CRDTs to Fabric. Our evaluations show that in general, FabricCRDT offers higher throughput of successful transactions than Fabric, while successfully committing and merging all conflicting transactions without any failures.
△ Less
Submitted 24 October, 2023;
originally announced October 2023.
-
Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly
Authors:
Herbert Woisetschläger,
Alexander Isenko,
Shiqiang Wang,
Ruben Mayer,
Hans-Arno Jacobsen
Abstract:
Large Language Models (LLM) and foundation models are popular as they offer new opportunities for individuals and businesses to improve natural language processing, interact with data, and retrieve information faster. However, training or fine-tuning LLMs requires a vast amount of data, which can be challenging to access due to legal or technical restrictions and may require private computing reso…
▽ More
Large Language Models (LLM) and foundation models are popular as they offer new opportunities for individuals and businesses to improve natural language processing, interact with data, and retrieve information faster. However, training or fine-tuning LLMs requires a vast amount of data, which can be challenging to access due to legal or technical restrictions and may require private computing resources. Federated Learning (FL) is a solution designed to overcome these challenges and expand data access for deep learning applications.
This paper takes a hardware-centric approach to explore how LLMs can be brought to modern edge computing systems. Our study fine-tunes the FLAN-T5 model family, ranging from 80M to 3B parameters, using FL for a text summarization task. We provide a micro-level hardware benchmark, compare the model FLOP utilization to a state-of-the-art data center GPU, and study the network utilization in realistic conditions. Our contribution is twofold: First, we evaluate the current capabilities of edge computing systems and their potential for LLM FL workloads. Second, by comparing these systems with a data-center GPU, we demonstrate the potential for improvement and the next steps toward achieving greater computational efficiency at the edge.
△ Less
Submitted 2 May, 2024; v1 submitted 4 October, 2023;
originally announced October 2023.
-
Lifting the Fog of Uncertainties: Dynamic Resource Orchestration for the Containerized Cloud
Authors:
Yuqiu Zhang,
Tongkun Zhang,
Gengrui Zhang,
Hans-Arno Jacobsen
Abstract:
The advances in virtualization technologies have sparked a growing transition from virtual machine (VM)-based to container-based infrastructure for cloud computing. From the resource orchestration perspective, containers' lightweight and highly configurable nature not only enables opportunities for more optimized strategies, but also poses greater challenges due to additional uncertainties and a l…
▽ More
The advances in virtualization technologies have sparked a growing transition from virtual machine (VM)-based to container-based infrastructure for cloud computing. From the resource orchestration perspective, containers' lightweight and highly configurable nature not only enables opportunities for more optimized strategies, but also poses greater challenges due to additional uncertainties and a larger configuration parameter search space. Towards this end, we propose Drone, a resource orchestration framework that adaptively configures resource parameters to improve application performance and reduce operational cost in the presence of cloud uncertainties. Built on Contextual Bandit techniques, Drone is able to achieve a balance between performance and resource cost on public clouds, and optimize performance on private clouds where a hard resource constraint is present. We show that our algorithms can achieve sub-linear growth in cumulative regret, a theoretically sound convergence guarantee, and our extensive experiments show that Drone achieves an up to 45% performance improvement and a 20% resource footprint reduction across batch processing jobs and microservice workloads.
△ Less
Submitted 29 September, 2023;
originally announced September 2023.
-
An Experimental Comparison of Partitioning Strategies for Distributed Graph Neural Network Training
Authors:
Nikolai Merkel,
Daniel Stoll,
Ruben Mayer,
Hans-Arno Jacobsen
Abstract:
Recently, graph neural networks (GNNs) have gained much attention as a growing area of deep learning capable of learning on graph-structured data. However, the computational and memory requirements for training GNNs on large-scale graphs make it necessary to distribute the training. A prerequisite for distributed GNN training is to partition the input graph into smaller parts that are distributed…
▽ More
Recently, graph neural networks (GNNs) have gained much attention as a growing area of deep learning capable of learning on graph-structured data. However, the computational and memory requirements for training GNNs on large-scale graphs make it necessary to distribute the training. A prerequisite for distributed GNN training is to partition the input graph into smaller parts that are distributed among multiple machines of a compute cluster. Although graph partitioning has been studied with regard to graph analytics and graph databases, its effect on GNN training performance is largely unexplored. As a consequence, it is unclear whether investing computational efforts into high-quality graph partitioning would pay off in GNN training scenarios.
In this paper, we study the effectiveness of graph partitioning for distributed GNN training. Our study aims to understand how different factors such as GNN parameters, mini-batch size, graph type, features size, and scale-out factor influence the effectiveness of graph partitioning. We conduct experiments with two different GNN systems using vertex and edge partitioning. We found that high-quality graph partitioning is a very effective optimization to speed up GNN training and to reduce memory consumption. Furthermore, our results show that invested partitioning time can quickly be amortized by reduced GNN training time, making it a relevant optimization for most GNN scenarios. Compared to research on distributed graph processing, our study reveals that graph partitioning plays an even more significant role in distributed GNN training, which motivates further research on the graph partitioning problem.
△ Less
Submitted 12 August, 2024; v1 submitted 29 August, 2023;
originally announced August 2023.
-
PrestigeBFT: Revolutionizing View Changes in BFT Consensus Algorithms with Reputation Mechanisms
Authors:
Gengrui Zhang,
Fei Pan,
Sofia Tijanic,
Hans-Arno Jacobsen
Abstract:
This paper proposes PrestigeBFT, a novel leader-based BFT consensus algorithm that addresses the weaknesses of passive view-change protocols. Passive protocols blindly rotate leadership among servers on a predefined schedule, potentially selecting unavailable or slow servers as leaders. PrestigeBFT proposes an active view-change protocol using reputation mechanisms that calculate a server's potent…
▽ More
This paper proposes PrestigeBFT, a novel leader-based BFT consensus algorithm that addresses the weaknesses of passive view-change protocols. Passive protocols blindly rotate leadership among servers on a predefined schedule, potentially selecting unavailable or slow servers as leaders. PrestigeBFT proposes an active view-change protocol using reputation mechanisms that calculate a server's potential correctness based on historic behavior. The active protocol enables servers to campaign for leadership by performing reputation-associated work. As such, up-to-date and correct servers with good reputations are more likely to be elected as leaders as they perform less work, whereas faulty servers with bad reputations are suppressed from becoming leaders by being required to perform more work. Under normal operation, PrestigeBFT achieves 5X higher throughput than the baseline that uses passive view-change protocols. In addition, PrestigeBFT remains unaffected under benign faults and experiences only a 24% drop in throughput under a variety of Byzantine faults, while the baseline throughput drops by 62% and 69%, respectively.
△ Less
Submitted 16 July, 2023;
originally announced July 2023.
-
FLEdge: Benchmarking Federated Machine Learning Applications in Edge Computing Systems
Authors:
Herbert Woisetschläger,
Alexander Erben,
Ruben Mayer,
Shiqiang Wang,
Hans-Arno Jacobsen
Abstract:
Federated Learning (FL) has become a viable technique for realizing privacy-enhancing distributed deep learning on the network edge. Heterogeneous hardware, unreliable client devices, and energy constraints often characterize edge computing systems. In this paper, we propose FLEdge, which complements existing FL benchmarks by enabling a systematic evaluation of client capabilities. We focus on com…
▽ More
Federated Learning (FL) has become a viable technique for realizing privacy-enhancing distributed deep learning on the network edge. Heterogeneous hardware, unreliable client devices, and energy constraints often characterize edge computing systems. In this paper, we propose FLEdge, which complements existing FL benchmarks by enabling a systematic evaluation of client capabilities. We focus on computational and communication bottlenecks, client behavior, and data security implications. Our experiments with models varying from 14K to 80M trainable parameters are carried out on dedicated hardware with emulated network characteristics and client behavior. We find that state-of-the-art embedded hardware has significant memory bottlenecks, leading to 4x longer processing times than on modern data center GPUs.
△ Less
Submitted 4 November, 2024; v1 submitted 8 June, 2023;
originally announced June 2023.
-
How Can We Train Deep Learning Models Across Clouds and Continents? An Experimental Study
Authors:
Alexander Erben,
Ruben Mayer,
Hans-Arno Jacobsen
Abstract:
This paper aims to answer the question: Can deep learning models be cost-efficiently trained on a global market of spot VMs spanning different data centers and cloud providers? To provide guidance, we extensively evaluate the cost and throughput implications of training in different zones, continents, and clouds for representative CV, NLP, and ASR models. To expand the current training options fur…
▽ More
This paper aims to answer the question: Can deep learning models be cost-efficiently trained on a global market of spot VMs spanning different data centers and cloud providers? To provide guidance, we extensively evaluate the cost and throughput implications of training in different zones, continents, and clouds for representative CV, NLP, and ASR models. To expand the current training options further, we compare the scalability potential for hybrid-cloud scenarios by adding cloud resources to on-premise hardware to improve training throughput. Finally, we show how leveraging spot instance pricing enables a new cost-efficient way to train models with multiple cheap VMs, trumping both more centralized and powerful hardware and even on-demand cloud offerings at competitive prices.
△ Less
Submitted 2 June, 2024; v1 submitted 5 June, 2023;
originally announced June 2023.
-
Analyzing Geospatial Distribution in Blockchains
Authors:
Shashank Motepalli,
Hans-Arno Jacobsen
Abstract:
Blockchains are decentralized; are they genuinely? We analyze blockchain decentralization's often-overlooked but quantifiable dimension: geospatial distribution of transaction processing. Blockchains bring with them the potential for geospatially distributed transaction processing. They enable validators from geospatially distant locations to partake in consensus protocols; we refer to them as min…
▽ More
Blockchains are decentralized; are they genuinely? We analyze blockchain decentralization's often-overlooked but quantifiable dimension: geospatial distribution of transaction processing. Blockchains bring with them the potential for geospatially distributed transaction processing. They enable validators from geospatially distant locations to partake in consensus protocols; we refer to them as minority validators. Based on our observations, in practice, most validators are often geographically concentrated in close proximity. Furthermore, we observed that minority validators tend not to meet the performance requirements, often misidentified as crash failures. Consequently, they are subject to punishment by jailing (removal from the validator set) and/or slashing (penalty in native tokens). Our emulations, under controlled conditions, demonstrate the same results, raising serious concerns about the potential for the geospatial centralization of validators. To address this, we developed a solution that easily integrates with consensus protocols, and we demonstrated its effectiveness.
△ Less
Submitted 28 May, 2023;
originally announced May 2023.
-
The Evolution of Distributed Systems for Graph Neural Networks and their Origin in Graph Processing and Deep Learning: A Survey
Authors:
Jana Vatter,
Ruben Mayer,
Hans-Arno Jacobsen
Abstract:
Graph Neural Networks (GNNs) are an emerging research field. This specialized Deep Neural Network (DNN) architecture is capable of processing graph structured data and bridges the gap between graph processing and Deep Learning (DL). As graphs are everywhere, GNNs can be applied to various domains including recommendation systems, computer vision, natural language processing, biology and chemistry.…
▽ More
Graph Neural Networks (GNNs) are an emerging research field. This specialized Deep Neural Network (DNN) architecture is capable of processing graph structured data and bridges the gap between graph processing and Deep Learning (DL). As graphs are everywhere, GNNs can be applied to various domains including recommendation systems, computer vision, natural language processing, biology and chemistry. With the rapid growing size of real world graphs, the need for efficient and scalable GNN training solutions has come. Consequently, many works proposing GNN systems have emerged throughout the past few years. However, there is an acute lack of overview, categorization and comparison of such systems. We aim to fill this gap by summarizing and categorizing important methods and techniques for large-scale GNN solutions. In addition, we establish connections between GNN systems, graph processing systems and DL systems.
△ Less
Submitted 23 May, 2023;
originally announced May 2023.
-
Partitioner Selection with EASE to Optimize Distributed Graph Processing
Authors:
Nikolai Merkel,
Ruben Mayer,
Tawkir Ahmed Fakir,
Hans-Arno Jacobsen
Abstract:
For distributed graph processing on massive graphs, a graph is partitioned into multiple equally-sized parts which are distributed among machines in a compute cluster. In the last decade, many partitioning algorithms have been developed which differ from each other with respect to the partitioning quality, the run-time of the partitioning and the type of graph for which they work best. The plethor…
▽ More
For distributed graph processing on massive graphs, a graph is partitioned into multiple equally-sized parts which are distributed among machines in a compute cluster. In the last decade, many partitioning algorithms have been developed which differ from each other with respect to the partitioning quality, the run-time of the partitioning and the type of graph for which they work best. The plethora of graph partitioning algorithms makes it a challenging task to select a partitioner for a given scenario. Different studies exist that provide qualitative insights into the characteristics of graph partitioning algorithms that support a selection. However, in order to enable automatic selection, a quantitative prediction of the partitioning quality, the partitioning run-time and the run-time of subsequent graph processing jobs is needed. In this paper, we propose a machine learning-based approach to provide such a quantitative prediction for different types of edge partitioning algorithms and graph processing workloads. We show that training based on generated graphs achieves high accuracy, which can be further improved when using real-world data. Based on the predictions, the automatic selection reduces the end-to-end run-time on average by 11.1% compared to a random selection, by 17.4% compared to selecting the partitioner that yields the lowest cut size, and by 29.1% compared to the worst strategy, respectively. Furthermore, in 35.7% of the cases, the best strategy was selected.
△ Less
Submitted 11 April, 2023;
originally announced April 2023.
-
Diba: A Re-configurable Stream Processor
Authors:
Mohammadreza Najafi,
Thamir M. Qadah,
Mohammad Sadoghi,
Hans-Arno Jacobsen
Abstract:
Stream processing acceleration is driven by the continuously increasing volume and velocity of data generated on the Web and the limitations of storage, computation, and power consumption. Hardware solutions provide better performance and power consumption, but they are hindered by the high research and development costs and the long time to market. In this work, we propose our re-configurable str…
▽ More
Stream processing acceleration is driven by the continuously increasing volume and velocity of data generated on the Web and the limitations of storage, computation, and power consumption. Hardware solutions provide better performance and power consumption, but they are hindered by the high research and development costs and the long time to market. In this work, we propose our re-configurable stream processor (Diba), a complete rethinking of a previously proposed customized and flexible query processor that targets real-time stream processing. Diba uses a unidirectional dataflow not dedicated to any specific type of query (operator) on streams, allowing a straightforward placement of processing components on a general data path that facilitates query mapping. In Diba, the concepts of the distribution network and processing components are implemented as two separate entities connected using generic interfaces. This approach allows the adoption of a versatile architecture for a family of queries rather than forcing a rigid chain of processing components to implement such queries. Our experimental evaluations of representative queries from TPC-H yielded processing times of 300, 1220, and 3520 milliseconds for data streams with scale factor sizes of one, four, and ten gigabytes, respectively.
△ Less
Submitted 27 August, 2024; v1 submitted 4 April, 2023;
originally announced April 2023.
-
V-Guard: An Efficient Permissioned Blockchain for Achieving Consensus under Dynamic Memberships in V2X
Authors:
Gengrui Zhang,
Yunhao Mao,
Shiquan Zhang,
Shashank Motepalli,
Fei Pan,
Hans-Arno Jacobsen
Abstract:
This paper presents V-Guard, a new permissioned blockchain that achieves consensus for vehicular data under changing memberships, targeting the problem in V2X networks where vehicles are often intermittently connected on the roads. To achieve this goal, V-Guard integrates membership management into the consensus process for agreeing on data entries. It binds a data entry with a membership configur…
▽ More
This paper presents V-Guard, a new permissioned blockchain that achieves consensus for vehicular data under changing memberships, targeting the problem in V2X networks where vehicles are often intermittently connected on the roads. To achieve this goal, V-Guard integrates membership management into the consensus process for agreeing on data entries. It binds a data entry with a membership configuration profile that describes responsible vehicles for achieving consensus for the data entry. As such, V-Guard produces chained consensus results of both data entries and their residing membership profiles, which enables consensus to be achieved seamlessly under changing memberships. In addition, V-Guard separates the ordering of transactions from consensus, allowing concurrent ordering instances and periodic consensus instances to order and commit data entries. These features make V-Guard efficient for achieving consensus under dynamic memberships with high throughput and latency performance.
△ Less
Submitted 3 April, 2023; v1 submitted 15 January, 2023;
originally announced January 2023.
-
How To Optimize My Blockchain? A Multi-Level Recommendation Approach
Authors:
Jeeta Ann Chacko,
Ruben Mayer,
Hans-Arno Jacobsen
Abstract:
Aside from the conception of new blockchain architectures, existing blockchain optimizations in the literature primarily focus on system or data-oriented optimizations within prevailing blockchains. However, since blockchains handle multiple aspects ranging from organizational governance to smart contract design, a holistic approach that encompasses all the different layers of a given blockchain s…
▽ More
Aside from the conception of new blockchain architectures, existing blockchain optimizations in the literature primarily focus on system or data-oriented optimizations within prevailing blockchains. However, since blockchains handle multiple aspects ranging from organizational governance to smart contract design, a holistic approach that encompasses all the different layers of a given blockchain system is required to ensure that all optimization opportunities are taken into consideration. In this vein, we define a multi-level optimization recommendation approach that identifies optimization opportunities within a blockchain at the system, data, and user level. Multiple metrics and attributes are derived from a blockchain log and nine optimization recommendations are formalized. We implement an automated optimization recommendation tool, BlockOptR, based on these concepts. The system is extensively evaluated with a wide range of workloads covering multiple real-world scenarios. After implementing the recommended optimizations, we observe an average of 20% improvement in the success rate of transactions and an average of 40% improvement in latency.
△ Less
Submitted 11 January, 2023;
originally announced January 2023.
-
A Serverless Publish/Subscribe System
Authors:
Pezhman Nasirifard,
Hans-Arno Jacobsen
Abstract:
Operating a scalable and reliable server application, such as publish/subscribe (pub/sub) systems, requires tremendous development efforts and resources. The emerging serverless paradigm simplifies the development and deployment of highly available applications by delegating most operational concerns to the cloud providers. The serverless paradigm describes a programming model where the developers…
▽ More
Operating a scalable and reliable server application, such as publish/subscribe (pub/sub) systems, requires tremendous development efforts and resources. The emerging serverless paradigm simplifies the development and deployment of highly available applications by delegating most operational concerns to the cloud providers. The serverless paradigm describes a programming model where the developers break the application downs into smaller microservices that run on the cloud in response to events. This paper proposes designing a serverless pub/sub system based on the IBM Bluemix cloud platform. Our pub/sub system performs topic-based, content-based, and function-based matchings. The function-based matching is a novel matching approach where the subscribers can define a highly customizable subscription function that the broker applies to the publications in the cloud. The evaluations of the designed system verify the practicality of the designed system. However, the vendor-specific constraints of the IBM Bluemix resources are a bottleneck to the scalability of the broker.
△ Less
Submitted 14 October, 2022;
originally announced October 2022.
-
i13DR: A Real-Time Demand Response Infrastructure for Integrating Renewable Energy Resources
Authors:
Pezhman Nasirifard,
Hans-Arno Jacobsen
Abstract:
With the ongoing integration of Renewable Energy Sources (RES), the complexity of power grids is increasing. Due to the fluctuating nature of RES, ensuring the reliability of power grids can be challenging. One possible approach for addressing these challenges is Demand Response (DR) which is described as matching the demand for electrical energy according to the changes and the availability of su…
▽ More
With the ongoing integration of Renewable Energy Sources (RES), the complexity of power grids is increasing. Due to the fluctuating nature of RES, ensuring the reliability of power grids can be challenging. One possible approach for addressing these challenges is Demand Response (DR) which is described as matching the demand for electrical energy according to the changes and the availability of supply. However, implementing a DR system to monitor and control a broad set of electrical appliances in real-time introduces several new complications, including ensuring the reliability and financial feasibility of the system. In this work, we address these issues by designing and implementing a distributed real-time DR infrastructure for laptops, which estimates and controls the power consumption of a network of connected laptops in response to the fast, irregular changes of RES. Furthermore, since our approach is entirely software-based, we dramatically reduce the initial costs of the demand side participants. The result of our field experiments confirms that our system successfully schedules and executes rapid and effective DR events. However, the accuracy of the estimated power consumption of all participating laptops is relatively low, directly caused by our software-based approach.
△ Less
Submitted 31 October, 2022; v1 submitted 14 October, 2022;
originally announced October 2022.
-
OrderlessChain: Do Permissioned Blockchains Need Total Global Order of Transactions?
Authors:
Pezhman Nasirifard,
Ruben Mayer,
Hans-Arno Jacobsen
Abstract:
Existing permissioned blockchains often rely on coordination-based consensus protocols to ensure the safe execution of applications in a Byzantine environment. Furthermore, these protocols serialize the transactions by ordering them into a total global order. The serializability preserves the correctness of the application's state stored on the blockchain. However, using coordination-based protoco…
▽ More
Existing permissioned blockchains often rely on coordination-based consensus protocols to ensure the safe execution of applications in a Byzantine environment. Furthermore, these protocols serialize the transactions by ordering them into a total global order. The serializability preserves the correctness of the application's state stored on the blockchain. However, using coordination-based protocols to attain the global order of transactions can limit the throughput and induce high latency. In contrast, application-level correctness requirements exist that are not dependent on the order of transactions, known as invariant-confluence (I-confluence). The I-confluent applications can execute in a coordination-free manner benefiting from the improved performance compared to the coordination-based approaches. The safety and liveness of I-confluent applications are studied in non-Byzantine environments, but the correct execution of such applications remains a challenge in Byzantine coordination-free environments. This work introduces OrderlessChain, a coordination-free permissioned blockchain for the safe and live execution of I-confluent applications in a Byzantine environment. We implemented a prototype of our system, and our evaluation results demonstrate that our coordination-free approach performs better than coordination-based blockchains.
△ Less
Submitted 24 October, 2023; v1 submitted 4 October, 2022;
originally announced October 2022.
-
Representation Learning for Appliance Recognition: A Comparison to Classical Machine Learning
Authors:
Matthias Kahl,
Daniel Jorde,
Hans-Arno Jacobsen
Abstract:
Non-intrusive load monitoring (NILM) aims at energy consumption and appliance state information retrieval from aggregated consumption measurements, with the help of signal processing and machine learning algorithms. Representation learning with deep neural networks is successfully applied to several related disciplines. The main advantage of representation learning lies in replacing an expert-driv…
▽ More
Non-intrusive load monitoring (NILM) aims at energy consumption and appliance state information retrieval from aggregated consumption measurements, with the help of signal processing and machine learning algorithms. Representation learning with deep neural networks is successfully applied to several related disciplines. The main advantage of representation learning lies in replacing an expert-driven, hand-crafted feature extraction with hierarchical learning from many representations in raw data format. In this paper, we show how the NILM processing-chain can be improved, reduced in complexity and alternatively designed with recent deep learning algorithms. On the basis of an event-based appliance recognition approach, we evaluate seven different classification models: a classical machine learning approach that is based on a hand-crafted feature extraction, three different deep neural network architectures for automated feature extraction on raw waveform data, as well as three baseline approaches for raw data processing. We evaluate all approaches on two large-scale energy consumption datasets with more than 50,000 events of 44 appliances. We show that with the use of deep learning, we are able to reach and surpass the performance of the state-of-the-art classical machine learning approach for appliance recognition with an F-Score of 0.75 and 0.86 compared to 0.69 and 0.87 of the classical approach.
△ Less
Submitted 26 August, 2022;
originally announced September 2022.
-
The DEBS 2022 Grand Challenge: Detecting Trading Trends in Financial Tick Data
Authors:
Sebastian Frischbier,
Jawad Tahir,
Christoph Doblander,
Arne Hormann,
Ruben Mayer,
Hans-Arno Jacobsen
Abstract:
The DEBS Grand Challenge (GC) is an annual programming competition open to practitioners from both academia and industry. The GC 2022 edition focuses on real-time complex event processing of high-volume tick data provided by Infront Financial Technology GmbH. The goal of the challenge is to efficiently compute specific trend indicators and detect patterns in these indicators like those used by rea…
▽ More
The DEBS Grand Challenge (GC) is an annual programming competition open to practitioners from both academia and industry. The GC 2022 edition focuses on real-time complex event processing of high-volume tick data provided by Infront Financial Technology GmbH. The goal of the challenge is to efficiently compute specific trend indicators and detect patterns in these indicators like those used by real-life traders to decide on buying or selling in financial markets. The data set Trading Data used for benchmarking contains 289 million tick events from approximately 5500+ financial instruments that had been traded on the three major exchanges Amsterdam (NL), Paris (FR), and Frankfurt am Main (GER) over the course of a full week in 2021. The data set is made publicly available. In addition to correctness and performance, submissions must explicitly focus on reusability and practicability. Hence, participants must address specific nonfunctional requirements and are asked to build upon open-source platforms. This paper describes the required scenario and the data set Trading Data, defines the queries of the problem statement, and explains the enhancements made to the evaluation platform Challenger that handles data distribution, dynamic subscriptions, and remote evaluation of the submissions.
△ Less
Submitted 23 June, 2022;
originally announced June 2022.
-
Towards Data-Driven Precision Agriculture using Open Data and Open Source Software
Authors:
Jacob Høxbroe Jeppesen,
Rune Hylsberg Jacobsen,
Rasmus Nyholm Jørgensen,
Thomas Skjødeberg Toftegaard
Abstract:
Information and communications technology (ICT) within the agricultural sector is characterized by a widespread use of proprietary data formats, a strong lack of interoperability standards, and a tight connection to specific hardware implementations resulting from vendor lock-in. This partly explains why ICT has not yet had its full impact within the domain. By utilizing the vast amount of publicl…
▽ More
Information and communications technology (ICT) within the agricultural sector is characterized by a widespread use of proprietary data formats, a strong lack of interoperability standards, and a tight connection to specific hardware implementations resulting from vendor lock-in. This partly explains why ICT has not yet had its full impact within the domain. By utilizing the vast amount of publicly available open data, ranging from topographic maps to multispectral satellite images, the economically and environmentally optimal farming practices can be advanced beyond state of the art. This paper addresses the potential of applying publicly available information sources to improve crop production, with emphasis on yield optimization. This potential is evaluated based on free public data for the growth season 2016 by examining winter wheat production for a selected region in Denmark. Data aggregation is performed by promoting opensource software tools as a foundation for decision support. That allows the farmer, or another domain expert, to query a certain crop type, merge this information with other data sets, and perform analysis on data ranging from sub-field analysis to statistics on national/regional scale. The registration of field polygons and sowed crop types for fields in Denmark, alongside with detailed geographic data and free satellite images, enable us to exploit publicly available data of high quality, which can be applied to perform further analysis.
△ Less
Submitted 12 April, 2022;
originally announced April 2022.
-
Reaching Consensus in the Byzantine Empire: A Comprehensive Review of BFT Consensus Algorithms
Authors:
Gengrui Zhang,
Fei Pan,
Yunhao Mao,
Sofia Tijanic,
Michael Dang'ana,
Shashank Motepalli,
Shiquan Zhang,
Hans-Arno Jacobsen
Abstract:
Byzantine fault-tolerant (BFT) consensus algorithms are at the core of providing safety and liveness guarantees for distributed systems that must operate in the presence of arbitrary failures. Recently, numerous new BFT algorithms have been proposed, not least due to the traction blockchain technologies have garnered in the search for consensus solutions that offer high throughput, low latency, an…
▽ More
Byzantine fault-tolerant (BFT) consensus algorithms are at the core of providing safety and liveness guarantees for distributed systems that must operate in the presence of arbitrary failures. Recently, numerous new BFT algorithms have been proposed, not least due to the traction blockchain technologies have garnered in the search for consensus solutions that offer high throughput, low latency, and robust system designs. In this paper, we conduct a systematic survey of selected and distinguished BFT algorithms that have received extensive attention in academia and industry alike. We perform a qualitative comparison among all algorithms we review considering message and time complexities. Furthermore, we decompose each consensus algorithm into its constituent subprotocols for replication and view change backed by intuitive figures to illustrate the message-passing pattern. We also elaborate on the strengths and weaknesses of each algorithm as compared to the state-of-the-art approaches.
△ Less
Submitted 5 December, 2023; v1 submitted 6 April, 2022;
originally announced April 2022.
-
Out-of-Core Edge Partitioning at Linear Run-Time
Authors:
Ruben Mayer,
Kamil Orujzade,
Hans-Arno Jacobsen
Abstract:
Graph edge partitioning is an important preprocessing step to optimize distributed computing jobs on graph-structured data. The edge set of a given graph is split into $k$ equally-sized partitions, such that the replication of vertices across partitions is minimized. Out-of-core edge partitioning algorithms are able to tackle the problem with low memory overhead. Exsisting out-of-core algorithms m…
▽ More
Graph edge partitioning is an important preprocessing step to optimize distributed computing jobs on graph-structured data. The edge set of a given graph is split into $k$ equally-sized partitions, such that the replication of vertices across partitions is minimized. Out-of-core edge partitioning algorithms are able to tackle the problem with low memory overhead. Exsisting out-of-core algorithms mainly work in a streaming manner and can be grouped into two types. While \emph{stateless} streaming edge partitioning is fast and yields low partitioning quality, stateful streaming edge partitioning yields better quality, but is expensive, as it requires a scoring function to be evaluated for every edge on every partition, leading to a time complexity of $\mathcal{O}(|E|*k)$. In this paper, we propose 2PS-L, a novel out-of-core edge partitioning algorithm that builds upon the stateful streaming model, but achieves linear run-time (i.e., $\mathcal{O}(|E|)$). 2PS-L consists of two phases. In the first phase, vertices are separated into clusters by a lightweight streaming clustering algorithm. In the second phase, the graph is re-streamed and vertex clustering from the first phase is exploited to reduce the search space of graph partitioning to only two target partitions for every edge. Our evaluations show that 2PS-L can achieve better partitioning quality than existing stateful streaming edge partitioners while having a much lower run-time. As a consequence, the total run-time of partitioning and subsequent distributed graph processing can be significantly reduced.
△ Less
Submitted 23 March, 2022;
originally announced March 2022.
-
Decentralizing Permissioned Blockchain with Delay Towers
Authors:
Shashank Motepalli,
Hans-Arno Jacobsen
Abstract:
Growing excitement around permissionless blockchains is uncovering its latent scalability concerns. Permissioned blockchains offer high transactional throughput and low latencies while compromising decentralization. In the quest for a decentralized, scalable blockchain fabric, i.e., to offer the scalability of permissioned blockchain in a permissionless setting, we present L4L to encourage decentr…
▽ More
Growing excitement around permissionless blockchains is uncovering its latent scalability concerns. Permissioned blockchains offer high transactional throughput and low latencies while compromising decentralization. In the quest for a decentralized, scalable blockchain fabric, i.e., to offer the scalability of permissioned blockchain in a permissionless setting, we present L4L to encourage decentralization over the permissioned Libra network without compromising its sustainability. L4L employs delay towers, -- puzzle towers that leverage verifiable delay functions -- for establishing identity in a permissionless setting. Delay towers cannot be parallelized due to their sequential execution, making them an eco-friendly alternative. We also discuss methodologies to replace validators participating in consensus to promote compliant behavior. Our evaluations found that the cost of enabling decentralization over permissioned networks is almost negligible. Furthermore, delay towers offer an alternative to existing permissionless consensus mechanisms without requiring airdrops or pre-sale of tokens.
△ Less
Submitted 17 March, 2022;
originally announced March 2022.
-
ESCAPE to Precaution against Leader Failures
Authors:
Gengrui Zhang,
Hans-Arno Jacobsen
Abstract:
Leader-based consensus protocols must undergo a view-change phase to elect a new leader when the current leader fails. The new leader is often decided upon a candidate server that collects votes from a quorum of servers. However, voting-based election mechanisms intrinsically cause competition in leadership candidacy when each candidate collects only partial votes. This split-vote scenario can res…
▽ More
Leader-based consensus protocols must undergo a view-change phase to elect a new leader when the current leader fails. The new leader is often decided upon a candidate server that collects votes from a quorum of servers. However, voting-based election mechanisms intrinsically cause competition in leadership candidacy when each candidate collects only partial votes. This split-vote scenario can result in no leadership winner and prolong the undesired view-change period. In this paper, we investigate a case study of Raft's leader election mechanism and propose a new leader election protocol, called ESCAPE, that fundamentally solves split votes by prioritizing servers based on their log responsiveness. ESCAPE dynamically assigns servers with a configuration that offers different priorities through Raft's periodic heartbeat. In each assignment, ESCAPE keeps track of server log responsiveness and assigns configurations that are inclined to win an election to more up-to-date servers, thereby preparing a pool of prioritized candidates. Consequently, when the next election takes place, the candidate with the highest priority will defeat its counterparts and becomes the next leader without competition. The evaluation results show that ESCAPE progressively reduces the leader election time when the cluster scales up, and the improvement becomes more significant under message loss.
△ Less
Submitted 18 February, 2022;
originally announced February 2022.
-
Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines
Authors:
Alexander Isenko,
Ruben Mayer,
Jeffrey Jedele,
Hans-Arno Jacobsen
Abstract:
Preprocessing pipelines in deep learning aim to provide sufficient data throughput to keep the training processes busy. Maximizing resource utilization is becoming more challenging as the throughput of training processes increases with hardware innovations (e.g., faster GPUs, TPUs, and inter-connects) and advanced parallelization techniques that yield better scalability. At the same time, the amou…
▽ More
Preprocessing pipelines in deep learning aim to provide sufficient data throughput to keep the training processes busy. Maximizing resource utilization is becoming more challenging as the throughput of training processes increases with hardware innovations (e.g., faster GPUs, TPUs, and inter-connects) and advanced parallelization techniques that yield better scalability. At the same time, the amount of training data needed in order to train increasingly complex models is growing. As a consequence of this development, data preprocessing and provisioning are becoming a severe bottleneck in end-to-end deep learning pipelines.
In this paper, we provide an in-depth analysis of data preprocessing pipelines from four different machine learning domains. We introduce a new perspective on efficiently preparing datasets for end-to-end deep learning pipelines and extract individual trade-offs to optimize throughput, preprocessing time, and storage consumption. Additionally, we provide an open-source profiling library that can automatically decide on a suitable preprocessing strategy to maximize throughput. By applying our generated insights to real-world use-cases, we obtain an increased throughput of 3x to 13x compared to an untuned system while keeping the pipeline functionally identical. These findings show the enormous potential of data pipeline tuning.
△ Less
Submitted 25 March, 2022; v1 submitted 17 February, 2022;
originally announced February 2022.
-
A Review on Communication Protocols for Autonomous Unmanned Aerial Vehicles for Inspection Application
Authors:
Liping Shi,
Néstor J. Hernández Marcano,
Rune Hylsberg Jacobsen
Abstract:
The communication system is a critical part of the system design for the autonomous UAV. It has to address different considerations, including efficiency, reliability and mobility of the UAV. In addition, a multi-UAV system requires a communication system to assist information sharing, task allocation and collaboration in a team of UAVs. In this paper, we review communication solutions for support…
▽ More
The communication system is a critical part of the system design for the autonomous UAV. It has to address different considerations, including efficiency, reliability and mobility of the UAV. In addition, a multi-UAV system requires a communication system to assist information sharing, task allocation and collaboration in a team of UAVs. In this paper, we review communication solutions for supporting a team of UAVs while considering an application in the power line inspection industry. We provide a review of candidate wireless communication technologies {for supporting communication in UAV applications. Performance measurements and UAV-related channel modeling of those candidate technologies are reviewed. A discussion of current technologies for building UAV mesh networks is presented. We then analyze the structure, interface and performance of robotic communication middleware, ROS and ROS2. Based on our review, the features and dependencies of candidate solutions in each layer of the communication system are presented.
△ Less
Submitted 12 November, 2021;
originally announced November 2021.
-
Reward Mechanism for Blockchains Using Evolutionary Game Theory
Authors:
Shashank Motepalli,
Hans-Arno Jacobsen
Abstract:
Blockchains have witnessed widespread adoption in the past decade in various fields. The growing demand makes their scalability and sustainability challenges more evident than ever. As a result, more and more blockchains have begun to adopt proof-of-stake (PoS) consensus protocols to address those challenges. One of the fundamental characteristics of any blockchain technology is its crypto-economi…
▽ More
Blockchains have witnessed widespread adoption in the past decade in various fields. The growing demand makes their scalability and sustainability challenges more evident than ever. As a result, more and more blockchains have begun to adopt proof-of-stake (PoS) consensus protocols to address those challenges. One of the fundamental characteristics of any blockchain technology is its crypto-economics and incentives. Lately, each PoS blockchain has designed a unique reward mechanism, yet, many of them are prone to free-rider and nothing-at-stake problems. To better understand the ad-hoc design of reward mechanisms, in this paper, we develop a reward mechanism framework that could apply to many PoS blockchains. We formulate the block validation game wherein the rewards are distributed for validating the blocks correctly. Using evolutionary game theory, we analyze how the participants' behaviour could potentially evolve with the reward mechanism. Also, penalties are found to play a central role in maintaining the integrity of blockchains.
△ Less
Submitted 8 July, 2021; v1 submitted 12 April, 2021;
originally announced April 2021.
-
Hybrid Edge Partitioner: Partitioning Large Power-Law Graphs under Memory Constraints
Authors:
Ruben Mayer,
Hans-Arno Jacobsen
Abstract:
Distributed systems that manage and process graph-structured data internally solve a graph partitioning problem to minimize their communication overhead and query run-time. Besides computational complexity -- optimal graph partitioning is NP-hard -- another important consideration is the memory overhead. Real-world graphs often have an immense size, such that loading the complete graph into memory…
▽ More
Distributed systems that manage and process graph-structured data internally solve a graph partitioning problem to minimize their communication overhead and query run-time. Besides computational complexity -- optimal graph partitioning is NP-hard -- another important consideration is the memory overhead. Real-world graphs often have an immense size, such that loading the complete graph into memory for partitioning is not economical or feasible. Currently, the common approach to reduce memory overhead is to rely on streaming partitioning algorithms. While the latest streaming algorithms lead to reasonable partitioning quality on some graphs, they are still not completely competitive to in-memory partitioners. In this paper, we propose a new system, Hybrid Edge Partitioner (HEP), that can partition graphs that fit partly into memory while yielding a high partitioning quality. HEP can flexibly adapt its memory overhead by separating the edge set of the graph into two sub-sets. One sub-set is partitioned by NE++, a novel, efficient in-memory algorithm, while the other sub-set is partitioned by a streaming approach. Our evaluations on large real-world graphs show that in many cases, HEP outperforms both in-memory partitioning and streaming partitioning at the same time. Hence, HEP is an attractive alternative to existing solutions that cannot fine-tune their memory overheads. Finally, we show that using HEP, we achieve a significant speedup of distributed graph processing jobs on Spark/GraphX compared to state-of-the-art partitioning algorithms.
△ Less
Submitted 23 March, 2021;
originally announced March 2021.
-
Why Do My Blockchain Transactions Fail? A Study of Hyperledger Fabric (Extended version)*
Authors:
Jeeta Ann Chacko,
Ruben Mayer,
Hans-Arno Jacobsen
Abstract:
Permissioned blockchain systems promise to provide both decentralized trust and privacy. Hyperledger Fabric is currently one of the most wide-spread permissioned blockchain systems and is heavily promoted both in industry and academia. Due to its optimistic concurrency model, the transaction failure rates in Fabric can become a bottleneck. While there is active research to reduce failures, there i…
▽ More
Permissioned blockchain systems promise to provide both decentralized trust and privacy. Hyperledger Fabric is currently one of the most wide-spread permissioned blockchain systems and is heavily promoted both in industry and academia. Due to its optimistic concurrency model, the transaction failure rates in Fabric can become a bottleneck. While there is active research to reduce failures, there is a lack of understanding on their root cause and, consequently, a lack of guidelines on how to configure Fabric optimally for different scenarios. To close this gap, in this paper, we first introduce a formal definition of the different types of transaction failures in Fabric. Then, we develop a comprehensive testbed and benchmarking system, HyperLedgerLab, along with four different chaincodes that represent realistic use cases and a chaincode/workload generator. Using HyperLedgerLab, we conduct exhaustive experiments to analyze the impact of different parameters of Fabric such as block size, endorsement policies, and others, on transaction failures. We further analyze three recently proposed optimizations from the literature, Fabric++, Streamchain and FabricSharp, and evaluate under which conditions they reduce the failure rates. Finally, based on our results, we provide recommendations for Fabric practitioners on how to configure the system and also propose new research directions.
△ Less
Submitted 8 March, 2021;
originally announced March 2021.
-
2PS: High-Quality Edge Partitioning with Two-Phase Streaming
Authors:
Ruben Mayer,
Kamil Orujzade,
Hans-Arno Jacobsen
Abstract:
Graph partitioning is an important preprocessing step to distributed graph processing. In edge partitioning, the edge set of a given graph is split into $k$ equally-sized partitions, such that the replication of vertices across partitions is minimized. Streaming is a viable approach to partition graphs that exceed the memory capacities of a single server. The graph is ingested as a stream of edges…
▽ More
Graph partitioning is an important preprocessing step to distributed graph processing. In edge partitioning, the edge set of a given graph is split into $k$ equally-sized partitions, such that the replication of vertices across partitions is minimized. Streaming is a viable approach to partition graphs that exceed the memory capacities of a single server. The graph is ingested as a stream of edges, and one edge at a time is immediately and irrevocably assigned to a partition based on a scoring function. However, streaming partitioning suffers from the uninformed assignment problem: At the time of partitioning early edges in the stream, there is no information available about the rest of the edges. As a consequence, edge assignments are often driven by balancing considerations, and the achieved replication factor is comparably high. In this paper, we propose 2PS, a novel two-phase streaming algorithm for high-quality edge partitioning. In the first phase, vertices are separated into clusters by a lightweight streaming clustering algorithm. In the second phase, the graph is re-streamed and edge partitioning is performed while taking into account the clustering of the vertices from the first phase. Our evaluations show that 2PS can achieve a replication factor that is comparable to heavy-weight random access partitioners while inducing orders of magnitude lower memory overhead.
△ Less
Submitted 20 January, 2020;
originally announced January 2020.