-
NoLoCo: No-all-reduce Low Communication Training Method for Large Models
Authors:
Jari Kolehmainen,
Nikolay Blagoev,
John Donaghy,
Oğuzhan Ersoy,
Christopher Nies
Abstract:
Training large language models is generally done via optimization methods on clusters containing tens of thousands of accelerators, communicating over a high-bandwidth interconnect. Scaling up these clusters is expensive and can become impractical, imposing limits on the size of models that can be trained. Several recent studies have proposed training methods that are less communication intensive, avoiding the need for a highly connected compute cluster. These state-of-the-art low communication training methods still employ a synchronization step for model parameters, which, when performed over all model replicas, can become costly on a low-bandwidth network.
In this work, we propose a novel optimization method, NoLoCo, that does not explicitly synchronize all model parameters during training and, as a result, does not require any collective communication. NoLoCo implicitly synchronizes model weights via a novel variant of the Nesterov momentum optimizer by partially averaging each replica's weights with those of a randomly selected other replica. We provide both a theoretical convergence analysis for our proposed optimizer and empirical results from language model training.
We benchmark NoLoCo on a wide range of accelerator counts and model sizes, from 125M to 6.8B parameters. Our method requires significantly less communication overhead than fully sharded data parallel training or even the widely used low-communication training method DiLoCo. The synchronization step itself is estimated to be one order of magnitude faster than the all-reduce used in DiLoCo for a few hundred accelerators training over the internet. NoLoCo also has no global blocking communication, which reduces accelerator idle time. Compared to DiLoCo, we also observe up to $4\%$ faster convergence across a wide range of model sizes and accelerator counts.
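The idea of synchronizing replicas by partially averaging weights with a randomly chosen peer, rather than through an all-reduce, can be illustrated with a minimal gossip-averaging sketch. This is not the NoLoCo optimizer: it omits the Nesterov momentum variant entirely, and the function names and the mixing factor `alpha` are assumptions made for illustration.

```python
import random


def pairwise_partial_average(replicas, alpha=0.5, rng=None):
    """One gossip round: pair replicas at random and pull each pair toward its mean.

    replicas: list of weight vectors (lists of floats), one per replica.
    alpha:    hypothetical mixing factor; 0 leaves the weights unchanged,
              1 replaces both members of a pair with the pair mean.
    """
    rng = rng or random.Random()
    order = list(range(len(replicas)))
    rng.shuffle(order)
    updated = [list(w) for w in replicas]
    # Pair consecutive indices of the shuffled order; with an odd count,
    # the last replica sits this round out.
    for i, j in zip(order[0::2], order[1::2]):
        for k in range(len(replicas[i])):
            mean = 0.5 * (replicas[i][k] + replicas[j][k])
            updated[i][k] = replicas[i][k] + alpha * (mean - replicas[i][k])
            updated[j][k] = replicas[j][k] + alpha * (mean - replicas[j][k])
    return updated


def spread(replicas):
    """Largest per-coordinate gap between any two replicas."""
    return max(max(col) - min(col) for col in zip(*replicas))
```

Each round only involves point-to-point exchanges between paired replicas, so no step waits on all replicas at once. Repeated rounds shrink `spread` while leaving the average of the weights unchanged, which is the sense in which such schemes synchronize implicitly.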
Submitted 12 June, 2025;
originally announced June 2025.
-
SkipPipe: Partial and Reordered Pipelining Framework for Training LLMs in Heterogeneous Networks
Authors:
Nikolay Blagoev,
Lydia Yiyu Chen,
Oğuzhan Ersoy
Abstract:
Data and pipeline parallelism are ubiquitous for training Large Language Models (LLMs) on distributed nodes. Driven by the need for cost-effective training, recent work explores efficient communication arrangements for end-to-end training. Motivated by LLMs' resistance to layer skipping and layer reordering, in this paper, we explore stage (several consecutive layers) skipping in pipeline training, and challenge the conventional practice of sequential pipeline execution. We derive convergence and throughput constraints (guidelines) for pipelining with skipping and swapping pipeline stages. Based on these constraints, we propose SkipPipe, the first partial pipeline framework to reduce the end-to-end training time for LLMs while preserving convergence. The core of SkipPipe is a path scheduling algorithm that optimizes the paths for individual microbatches and reduces idle time (due to microbatch collisions) on the distributed nodes, complying with the given stage skipping ratio. We extensively evaluate SkipPipe on LLaMa models from 500M to 8B parameters on up to 20 nodes. Our results show that SkipPipe reduces training iteration time by up to $55\%$ compared to full pipeline. Our partial pipeline training also improves resistance to layer omission during inference, experiencing a drop in perplexity of only $7\%$ when running only half the model. Our code is available at https://github.com/gensyn-ai/skippipe.
Submitted 27 February, 2025;
originally announced February 2025.
-
Verde: Verification via Refereed Delegation for Machine Learning Programs
Authors:
Arasu Arun,
Adam St. Arnaud,
Alexey Titov,
Brian Wilcox,
Viktor Kolobaric,
Marc Brinkmann,
Oguzhan Ersoy,
Ben Fielding,
Joseph Bonneau
Abstract:
Machine learning programs, such as those performing inference, fine-tuning, and training of LLMs, are commonly delegated to untrusted compute providers. To provide correctness guarantees for the client, we propose adapting the cryptographic notion of refereed delegation to the machine learning setting. This approach enables a computationally limited client to delegate a program to multiple untrusted compute providers, with a guarantee of obtaining the correct result if at least one of them is honest. Refereed delegation of ML programs poses two technical hurdles: (1) an arbitration protocol to resolve disputes when compute providers disagree on the output, and (2) the ability to bitwise reproduce ML programs across different hardware setups. For (1), we design Verde, a dispute arbitration protocol that efficiently handles the large scale and graph-based computational model of modern ML programs. For (2), we build RepOps (Reproducible Operators), a library that eliminates hardware "non-determinism" by controlling the order of floating point operations performed on all hardware. Our implementation shows that refereed delegation achieves both strong guarantees for clients and practical overheads for compute providers.
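Why the order of floating point operations matters for bitwise reproducibility can be seen in a few lines: floating point addition is not associative, so two devices that reduce the same values in a different order can produce different bits. The sketch below is a plain Python illustration of that fact, not the RepOps library; `sum_in_fixed_order` is a hypothetical helper.

```python
def sum_in_fixed_order(values):
    """Accumulate strictly left to right, so every run performs the
    same sequence of additions and returns the same bits."""
    total = 0.0
    for v in values:
        total += v
    return total


# The same three numbers, grouped two different ways.
left = (0.1 + 0.2) + 0.3    # 0.6000000000000001
right = 0.1 + (0.2 + 0.3)   # 0.6

# A parallel reduction is free to pick either grouping; fixing the
# order removes that freedom and with it the bitwise disagreement.
fixed = sum_in_fixed_order([0.1, 0.2, 0.3])
```

Here `fixed` always matches `left`, because the helper commits to one evaluation order regardless of how the hardware would prefer to schedule the additions.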
Submitted 26 February, 2025;
originally announced February 2025.
-
HDEE: Heterogeneous Domain Expert Ensemble
Authors:
Oğuzhan Ersoy,
Jari Kolehmainen,
Gabriel Passamani Andrade
Abstract:
Training dense LLMs requires enormous amounts of data and centralized compute, which introduces fundamental bottlenecks and ever-growing costs for large models. Several studies aim to reduce this dependency on centralization by reducing the communication overhead of training dense models. Taking this idea of reducing communication overhead to a natural extreme, by training embarrassingly parallelizable ensembles of small independent experts, has been shown to outperform large dense models trained in traditional centralized settings. However, existing studies do not take into account underlying differences amongst data domains and treat them as monolithic, regardless of their underlying complexity, size, or distribution. In this paper, we explore the effects of introducing heterogeneity to these ensembles of domain expert models. Specifically, by allowing models within the ensemble to vary in size--as well as the number of training steps taken depending on the training data's domain--we study the effect heterogeneity has on these ensembles when evaluated against domains included in, and excluded from, the training set. We use the same compute budget to train heterogeneous ensembles and homogeneous baselines for comparison. We show that the heterogeneous ensembles achieve the lowest perplexity scores in $20$ out of the $21$ data domains used in the evaluation. Our code is available at https://github.com/gensyn-ai/hdee.
Submitted 26 February, 2025;
originally announced February 2025.
-
Compressive Sensing Imaging Using Caustic Lens Mask Generated by Periodic Perturbation in a Ripple Tank
Authors:
Doğan Tunca Arık,
Asaf Behzat Şahin,
Özgün Ersoy
Abstract:
Terahertz imaging shows significant potential across diverse fields, yet the cost of multi-pixel imaging equipment remains an obstacle for many researchers. To tackle this issue, single-pixel imaging arises as a lower-cost option; however, the data collection process necessary for reconstructing images is time-consuming. Compressive Sensing offers a promising solution by enabling image generation with fewer measurements than required by Nyquist's theorem, yet long processing times remain an issue, especially for large images. Our proposed solution to this issue involves using the caustic lens effect induced by perturbations in a ripple tank as a sampling mask. The dynamic characteristics of the ripple tank introduce randomness into the sampling process, thereby reducing measurement time through exploitation of the inherent sparsity of THz band signals. In this study, a Convolutional Neural Network was used to conduct target classification, based on the distinctive signal patterns obtained via the caustic lens mask. The suggested classifier obtained a 95.16% accuracy rate in differentiating targets resembling Latin letters.
Submitted 1 May, 2024;
originally announced May 2024.
-
Privacy-Preserving Aggregation for Decentralized Learning with Byzantine-Robustness
Authors:
Ali Reza Ghavamipour,
Benjamin Zi Hao Zhao,
Oguzhan Ersoy,
Fatih Turkmen
Abstract:
Decentralized machine learning (DL) has recently received increasing interest because it eliminates the single point of failure present in the federated learning setting. Yet, it is threatened by Byzantine clients who intentionally disrupt the learning process by broadcasting arbitrary model updates to other clients, seeking to degrade the performance of the global model. In response, robust aggregation schemes have emerged as promising solutions to defend against such Byzantine clients, thereby enhancing the robustness of decentralized learning. Defenses against Byzantine adversaries, however, typically require access to the updates of other clients, a counterproductive privacy trade-off that in turn increases the risk of inference attacks on those same model updates.
In this paper, we introduce SecureDL, a novel DL protocol designed to enhance the security and privacy of DL against Byzantine threats. SecureDL facilitates a collaborative defense, while protecting the privacy of clients' model updates through secure multiparty computation. The protocol employs efficient computation of cosine similarity and normalization of updates to robustly detect and exclude model updates detrimental to model convergence. Using the MNIST, Fashion-MNIST, SVHN and CIFAR-10 datasets, we evaluated SecureDL against various Byzantine attacks and compared its effectiveness with four existing defense mechanisms. Our experiments show that SecureDL is effective even in the case of attacks by a malicious majority (e.g., 80% Byzantine clients) while preserving high training accuracy.
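The combination of cosine similarity and update normalization can be sketched in the clear, without any secure multiparty computation. The following is an illustration under stated assumptions, not the SecureDL protocol: comparing each received update against the client's own local update, the `threshold` parameter, and all function names are hypothetical choices.

```python
import math


def l2_norm(u):
    return math.sqrt(sum(a * a for a in u))


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is zero."""
    nu, nv = l2_norm(u), l2_norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def rescale_to(u, target_norm):
    """Rescale u to the given L2 norm so that no update dominates by magnitude."""
    n = l2_norm(u)
    if n == 0.0:
        return list(u)
    return [a * target_norm / n for a in u]


def robust_aggregate(local_update, received, threshold=0.0):
    """Average the local update with the received updates that point in a
    similar direction (cosine similarity strictly above `threshold`),
    after rescaling each accepted update to the local update's norm."""
    ref_norm = l2_norm(local_update)
    kept = [list(local_update)]
    for u in received:
        if cosine_similarity(local_update, u) > threshold:
            kept.append(rescale_to(u, ref_norm))
    return [sum(col) / len(kept) for col in zip(*kept)]
```

The direction check rejects updates that oppose the reference, and the rescaling limits the damage an accepted update can do by being very large. In a privacy-preserving protocol these quantities would be computed on protected values rather than on plaintext vectors as here.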
Submitted 27 April, 2024;
originally announced April 2024.
-
UAV-based Maritime Communications: Relaying to Enhance the Link Quality
Authors:
Abdullah Taha Çağan,
Görkem Berkay Koç,
Handan Yakın,
Berk Çiloğlu,
Muhammad Zeeshan Ashgar,
Özgün Ersoy,
Jyri Hämäläinen,
Metin Öztürk
Abstract:
Providing stable connectivity in maritime communications is of utmost importance to unleash the full potential of smart ports. Nonetheless, due to the crowded nature of harbor environments, it is likely that some ships are shadowed by others, resulting in reduced received power that diminishes their data rates and can even threaten basic connectivity requirements. Given that uncrewed aerial vehicles (UAVs) have been regarded as an integral part of future generations of wireless communication networks, they can be employed in maritime communications as well. In this paper, we investigate the use of UAV-mounted relays to help mitigate the reduced data rates of blocked links in maritime communications. Various communication architectures are considered based on the positioning mechanism of the UAV; in this regard, fixed, k-means algorithm-based, and landing spot-based positioning approaches are examined. Additionally, since UAVs are predominantly battery-operated, the energy consumption of these approaches is also measured. Results reveal that the landing spot-based UAV relay positioning approach finds the best trade-off between data rate and energy consumption.
Submitted 6 June, 2024; v1 submitted 17 April, 2023;
originally announced April 2023.
-
On Feasibility of Server-side Backdoor Attacks on Split Learning
Authors:
Behrad Tajalli,
Oguzhan Ersoy,
Stjepan Picek
Abstract:
Split learning is a collaborative learning design that allows several participants (clients) to train a shared model while keeping their datasets private. Recent studies demonstrate that collaborative learning models, specifically federated learning, are vulnerable to security and privacy attacks such as model inference and backdoor attacks. Backdoor attacks are a group of poisoning attacks in which the attacker tries to control the model output by manipulating the model's training process. While there have been studies regarding inference attacks on split learning, it has not yet been tested for backdoor attacks. This paper performs a novel backdoor attack on split learning and studies its effectiveness. Unlike traditional backdoor attacks, which are carried out on the client side, we inject the backdoor trigger from the server side. For this purpose, we provide two attack methods: one using a surrogate client and another using an autoencoder to poison the model via incoming smashed data and its outgoing gradient toward the innocent participants. We ran our experiments using three model architectures and three publicly available datasets in the image domain, for a total of 761 experiments to evaluate our attack methods. The results show that despite using strong patterns and injection methods, split learning is highly robust and resistant to such poisoning attacks. While we obtain an attack success rate of 100% as our best result for the MNIST dataset, in most of the other cases, our attack shows little success when increasing the cut layer.
Submitted 26 May, 2023; v1 submitted 19 February, 2023;
originally announced February 2023.
-
Sneaky Spikes: Uncovering Stealthy Backdoor Attacks in Spiking Neural Networks with Neuromorphic Data
Authors:
Gorka Abad,
Oguzhan Ersoy,
Stjepan Picek,
Aitor Urbieta
Abstract:
Deep neural networks (DNNs) have demonstrated remarkable performance across various tasks, including image and speech recognition. However, maximizing the effectiveness of DNNs requires meticulous optimization of numerous hyperparameters and network parameters through training. Moreover, high-performance DNNs entail many parameters, which consume significant energy during training. In order to overcome these challenges, researchers have turned to spiking neural networks (SNNs), which offer enhanced energy efficiency and biologically plausible data processing capabilities, rendering them highly suitable for sensory data tasks, particularly in neuromorphic data. Despite their advantages, SNNs, like DNNs, are susceptible to various threats, including adversarial examples and backdoor attacks. Yet, the field of SNNs still needs to be explored in terms of understanding and countering these attacks.
This paper delves into backdoor attacks in SNNs using neuromorphic datasets and diverse triggers. Specifically, we explore backdoor triggers within neuromorphic data that can manipulate their position and color, providing a broader scope of possibilities than conventional triggers in domains like images. We present various attack strategies, achieving an attack success rate of up to 100% while maintaining a negligible impact on clean accuracy. Furthermore, we assess these attacks' stealthiness, revealing that our most potent attacks possess significant stealth capabilities. Lastly, we adapt several state-of-the-art defenses from the image domain, evaluating their efficacy on neuromorphic data and uncovering instances where they fall short, leading to compromised performance.
Submitted 5 February, 2024; v1 submitted 13 February, 2023;
originally announced February 2023.
-
Universal Soldier: Using Universal Adversarial Perturbations for Detecting Backdoor Attacks
Authors:
Xiaoyun Xu,
Oguzhan Ersoy,
Stjepan Picek
Abstract:
Deep learning models achieve excellent performance in numerous machine learning tasks. Yet, they suffer from security-related issues such as adversarial examples and poisoning (backdoor) attacks. A deep learning model may be poisoned by training with backdoored data or by modifying inner network parameters. Then, a backdoored model performs as expected when receiving a clean input, but it misclassifies when receiving a backdoored input stamped with a pre-designed pattern called "trigger". Unfortunately, it is difficult to distinguish between clean and backdoored models without prior knowledge of the trigger. This paper proposes a backdoor detection method by utilizing a special type of adversarial attack, universal adversarial perturbation (UAP), and its similarities with a backdoor trigger. We observe an intuitive phenomenon: UAPs generated from backdoored models need fewer perturbations to mislead the model than UAPs from clean models. UAPs of backdoored models tend to exploit the shortcut from all classes to the target class, built by the backdoor trigger. We propose a novel method called Universal Soldier for Backdoor detection (USB) and reverse engineering potential backdoor triggers via UAPs. Experiments on 345 models trained on several datasets show that USB effectively detects the injected backdoor and provides comparable or better results than state-of-the-art methods.
△ Less
Submitted 24 August, 2023; v1 submitted 1 February, 2023;
originally announced February 2023.
-
SyncPCN/PSyncPCN: Payment Channel Networks without Blockchain Synchrony
Authors:
Oğuzhan Ersoy,
Jérémie Decouchant,
Satwik Prabhu Kimble,
Stefanie Roos
Abstract:
Payment channel networks (PCNs) enhance the scalability of blockchains by allowing parties to conduct transactions off-chain, i.e., without broadcasting every transaction to all blockchain participants. To conduct transactions, a sender and a receiver can either establish a direct payment channel with a funding blockchain transaction or leverage existing channels in a multi-hop payment. The security of PCNs usually relies on the synchrony of the underlying blockchain, i.e., evidence of misbehavior needs to be published on the blockchain within a time limit. Alternative payment channel proposals that do not require blockchain synchrony rely on quorum certificates and use a committee to register the transactions of a channel. However, these proposals do not support multi-hop payments, a limitation we aim to overcome. In this paper, we demonstrate that it is in fact impossible to design a multi-hop payment protocol with both network asynchrony and faulty channels, i.e., channels that may not correctly follow the protocol. We then detail two committee-based multi-hop payment protocols that respectively assume synchronous communications and possibly faulty channels, or asynchronous communication and correct channels. The first protocol relies on possibly faulty committees instead of the blockchain to resolve channel disputes, and enforces privacy properties within a synchronous network. The second one relies on committees that contain at most f faulty members out of 3f+1 and successively delegate to each other the role of eventually completing a multi-hop payment. We show that both protocols satisfy the security requirements of a multi-hop payment and compare their communication complexity and latency.
Submitted 4 August, 2022; v1 submitted 23 July, 2022;
originally announced July 2022.
-
Sniper Backdoor: Single Client Targeted Backdoor Attack in Federated Learning
Authors:
Gorka Abad,
Servio Paguada,
Oguzhan Ersoy,
Stjepan Picek,
Víctor Julio Ramírez-Durán,
Aitor Urbieta
Abstract:
Federated Learning (FL) enables collaborative training of Deep Learning (DL) models where the data is retained locally. Like DL, FL has severe security weaknesses that the attackers can exploit, e.g., model inversion and backdoor attacks. Model inversion attacks reconstruct the data from the training datasets, whereas backdoors misclassify only classes containing specific properties, e.g., a pixel pattern. Backdoors are prominent in FL and aim to poison every client model, while model inversion attacks can target even a single client.
This paper introduces a novel technique to allow backdoor attacks to be client-targeted, compromising a single client while the rest remain unchanged. The attack takes advantage of state-of-the-art model inversion and backdoor attacks. Precisely, we leverage a Generative Adversarial Network to perform the model inversion. Afterward, we shadow-train the FL network, in which, using a Siamese Neural Network, we can identify, target, and backdoor the victim's model. Our attack has been validated using the MNIST, F-MNIST, EMNIST, and CIFAR-100 datasets under different settings -- achieving up to 99% accuracy on both source (clean) and target (backdoor) classes and against state-of-the-art defenses, e.g., Neural Cleanse, opening a novel threat model to be considered in the future.
Submitted 28 February, 2023; v1 submitted 16 March, 2022;
originally announced March 2022.
-
Watermarking Graph Neural Networks based on Backdoor Attacks
Authors:
Jing Xu,
Stefanos Koffas,
Oguzhan Ersoy,
Stjepan Picek
Abstract:
Graph Neural Networks (GNNs) have achieved promising performance in various real-world applications. Building a powerful GNN model is not a trivial task, as it requires a large amount of training data, powerful computing resources, and human expertise in fine-tuning the model. Moreover, with the development of adversarial attacks, e.g., model stealing attacks, GNNs raise challenges to model authentication. To avoid copyright infringement on GNNs, verifying the ownership of the GNN models is necessary.
This paper presents a watermarking framework for GNNs for both graph and node classification tasks. We 1) design two strategies to generate watermarked data for the graph classification task and one for the node classification task, 2) embed the watermark into the host model through training to obtain the watermarked GNN model, and 3) verify the ownership of the suspicious model in a black-box setting. The experiments show that our framework can verify the ownership of GNN models with a very high probability (up to $99\%$) for both tasks. Finally, we experimentally show that our watermarking approach is robust against a state-of-the-art model extraction technique and four state-of-the-art defenses against backdoor attacks.
Submitted 13 November, 2022; v1 submitted 21 October, 2021;
originally announced October 2021.
-
Parallel Implementation of Distributed Global Optimization (DGO)
Authors:
Homayoun Valafar,
Okan K. Ersoy,
Farmaraz Valafar
Abstract:
Parallel implementations of distributed global optimization (DGO) [13] on MP-1 and NCUBE parallel computers revealed an approximate O(n) increase in the performance of this algorithm. Therefore, implementing DGO on parallel processors can remedy the algorithm's only drawback, which is its $O(n^2)$ execution time as the number of dimensions increases. The speedup factor of the parallel implementations of DGO is measured with respect to the sequential execution time of the identical problem on a SPARC IV computer. The best speedup was achieved by the SIMD implementation of the algorithm on the MP-1, with a total speedup of 126 for an optimization problem with n = 9. This optimization problem was distributed across 128 PEs of the Mas-Par.
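For context, the reported figures can be turned into a parallel efficiency using the standard definition $E = S/p$, where $S$ is the speedup and $p$ the number of processing elements. This number is derived here and is not stated in the abstract; because the sequential baseline ran on a different machine (the SPARC IV), it should be read as indicative only.

```latex
E = \frac{S}{p} = \frac{126}{128} \approx 0.984
```

Under that definition, the MP-1 run corresponds to roughly 98% of ideal linear scaling relative to the stated baseline.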
Submitted 16 December, 2020;
originally announced December 2020.
-
Distributed Global Optimization (DGO)
Authors:
Homayoun Valafar,
Okan K. Ersoy,
Faramarz Valafar
Abstract:
A new technique of global optimization and its applications in particular to neural networks are presented. The algorithm is also compared to other global optimization algorithms such as Gradient descent (GD), Monte Carlo (MC), Genetic Algorithm (GA) and other commercial packages. This new optimization technique proved itself worthy of further study after observing its accuracy of convergence, speed of convergence and ease of use. Some of the advantages of this new optimization technique are listed below: 1. Optimizing function does not have to be continuous or differentiable. 2. No random mechanism is used, therefore this algorithm does not inherit the slow speed of random searches. 3. There are no fine-tuning parameters (such as the step rate of G.D. or temperature of S.A.) needed for this technique. 4. This algorithm can be implemented on parallel computers so that there is little increase in computation time (compared to linear increase) as the number of dimensions increases. The time complexity of O(n) is achieved.
Submitted 16 December, 2020;
originally announced December 2020.
-
Channel Attention Networks for Robust MR Fingerprinting Matching
Authors:
Refik Soyak,
Ebru Navruz,
Eda Ozgu Ersoy,
Gastao Cruz,
Claudia Prieto,
Andrew P. King,
Devrim Unay,
Ilkay Oksuz
Abstract:
Magnetic Resonance Fingerprinting (MRF) enables simultaneous mapping of multiple tissue parameters such as T1 and T2 relaxation times. The working principle of MRF relies on varying acquisition parameters pseudo-randomly, so that each tissue generates its unique signal evolution during scanning. Even though MRF provides faster scanning, it has disadvantages, such as erroneous and slow generation of the corresponding parametric maps, which need to be addressed. Moreover, there is a need for explainable architectures for understanding the guiding signals to generate accurate parametric maps. In this paper, we address both of these shortcomings by proposing a novel neural network architecture consisting of a channel-wise attention module and a fully convolutional network. The proposed approach, evaluated over 3 simulated MRF signals, reduces error in the reconstruction of tissue parameters by 8.88% for T1 and 75.44% for T2 with respect to state-of-the-art methods. Another contribution of this study is a new channel selection method: attention-based channel selection. Furthermore, the effects of patch size and temporal frames of the MRF signal on channel reduction are analyzed by employing channel-wise attention.
Submitted 2 December, 2020;
originally announced December 2020.
-
Parallel, Self Organizing, Consensus Neural Networks
Authors:
Homayoun Valafar,
Faramarz Valafar,
Okan Ersoy
Abstract:
A new neural network architecture (PSCNN) is developed to improve performance and speed of such networks. The architecture has all the advantages of the previous models such as self-organization and possesses some other superior characteristics such as input parallelism and decision making based on consensus. Due to the properties of this network, it was studied with respect to implementation on a Parallel Processor (Ncube Machine) as well as a regular sequential machine. The architecture self organizes its own modules in a way to maximize performance. Since it is completely parallel, both recall and learning procedures are very fast. The performance of the network was compared to the Backpropagation networks in problems of language perception, remote sensing and binary logic (Exclusive-Or). PSCNN showed superior performance in all cases studied.
Submitted 30 July, 2020;
originally announced August 2020.
-
Probabilistic Diagnostic Tests for Degradation Problems in Supervised Learning
Authors:
Gustavo A. Valencia-Zapata,
Carolina Gonzalez-Canas,
Michael G. Zentner,
Okan Ersoy,
Gerhard Klimeck
Abstract:
Several studies point out different causes of performance degradation in supervised machine learning. Problems such as class imbalance, overlapping, small-disjuncts, noisy labels, and sparseness limit accuracy in classification algorithms. Even though a number of approaches either in the form of a methodology or an algorithm try to minimize performance degradation, they have been isolated efforts with limited scope. Most of these approaches focus on remediation of one among many problems, with experimental results coming from few datasets and classification algorithms, insufficient measures of prediction power, and lack of statistical validation for testing the real benefit of the proposed approach. This paper consists of two main parts: In the first part, a novel probabilistic diagnostic model based on identifying signs and symptoms of each problem is presented. Thereby, early and correct diagnosis of these problems is to be achieved in order to select not only the most convenient remediation treatment but also unbiased performance metrics. Secondly, the behavior and performance of several supervised algorithms are studied when training sets have such problems. Therefore, prediction of success for treatments can be estimated across classifiers.
Submitted 15 April, 2020; v1 submitted 6 April, 2020;
originally announced April 2020.
-
How to profit from payments channels
Authors:
Oguzhan Ersoy,
Stefanie Roos,
Zekeriya Erkin
Abstract:
Payment channel networks like Bitcoin's Lightning network are an auspicious approach for realizing high transaction throughput and almost-instant confirmations in blockchain networks. However, the ability to successfully make payments in such networks relies on the willingness of participants to lock collateral in the network. In Lightning, the key financial incentive to lock collateral is the small fees earned for routing payments for other participants. While users can choose these fees, currently, they mainly stick to the default fees. By providing insights on beneficial choices for fees, we aim to incentivize users to lock more collateral and improve the effectiveness of the network.
In this paper, we consider a node $\mathbf{A}$ that, given the network topology and the channel details, selects where to establish channels and how much fee to charge such that its financial gain is maximized. We formalize the optimization problem and show that it is NP-hard. We design a greedy algorithm to approximate the optimal solution. In each step, our greedy algorithm selects a node which maximizes the total reward concerning the number of shortest paths passing through $\mathbf{A}$ and channel fees. Our simulation study leverages a real-world data set to quantify the impact of our gain optimization and indicates that our strategy is at least a factor of two better than other strategies.
Submitted 25 November, 2019; v1 submitted 20 November, 2019;
originally announced November 2019.
-
Ladder Networks for Semi-Supervised Hyperspectral Image Classification
Authors:
Julian Büchel,
Okan Ersoy
Abstract:
We used the Ladder Network [Rasmus et al. (2015)] to perform Hyperspectral Image Classification in a semi-supervised setting. The Ladder Network distinguishes itself from other semi-supervised methods by jointly optimizing a supervised and unsupervised cost. In many settings this has proven to be more successful than other semi-supervised techniques, such as pretraining using unlabeled data. We furthermore show that the convolutional Ladder Network outperforms most of the current techniques used in hyperspectral image classification and achieves new state-of-the-art performance on the Pavia University dataset given only 5 labeled data points per class.
Submitted 4 December, 2018;
originally announced December 2018.
-
Transaction Propagation on Permissionless Blockchains: Incentive and Routing Mechanisms
Authors:
Oguzhan Ersoy,
Zhijie Ren,
Zekeriya Erkin,
Reginald L. Lagendijk
Abstract:
Existing permissionless blockchain solutions rely on peer-to-peer propagation mechanisms, where nodes in a network transfer the transactions they receive to their neighbors. Unfortunately, there is no explicit incentive for such transaction propagation. Therefore, existing propagation mechanisms will not be sustainable in a fully decentralized blockchain with rational nodes. In this work, we formally define the problem of incentivizing nodes for transaction propagation. We propose an incentive mechanism where each node involved in the propagation of a transaction receives a share of the transaction fee. We also show that our proposal is Sybil-proof. Furthermore, we combine the incentive mechanism with smart routing to reduce the communication and storage costs at the same time. The proposed routing mechanism reduces the redundant transaction propagation from the size of the network to a factor of average shortest path length. The routing mechanism is built upon a specific type of consensus protocol where the round leader who creates the transaction block is known in advance. Note that our routing mechanism is a generic one and can be adopted independently from the incentive mechanism.
Submitted 14 June, 2018; v1 submitted 20 December, 2017;
originally announced December 2017.
-
A Statistical Approach to Increase Classification Accuracy in Supervised Learning Algorithms
Authors:
Gustavo A Valencia-Zapata,
Daniel Mejia,
Gerhard Klimeck,
Michael Zentner,
Okan Ersoy
Abstract:
Probabilistic mixture models have been widely used for different machine learning and pattern recognition tasks such as clustering, dimensionality reduction, and classification. In this paper, we focus on trying to solve the most common challenges related to supervised learning algorithms by using mixture probability distribution functions. With this modeling strategy, we identify sub-labels and generate synthetic data in order to reach better classification accuracy. It means we focus on increasing the training data synthetically to increase the classification accuracy.
Submitted 5 September, 2017;
originally announced September 2017.
-
A collocation method based on extended cubic B-splines for numerical solutions of the Klein-Gordon equation
Authors:
Alper Korkmaz,
Ozlem Ersoy,
Idiris Dag
Abstract:
A generalization of classical cubic B-spline functions with a parameter is used as basis in the collocation method. Some initial boundary value problems constructed on the nonlinear Klein-Gordon equation are solved by the proposed method for various extension parameters. The coupled system derived as a result of the reduction of the time order of the equation is integrated in time by the Crank-Nicolson method. After linearizing the nonlinear term, the collocation procedure is implemented. Adapting the initial conditions provides a linear iteration system for the full integration of the equation. The validity of the method is investigated by measuring the maximum errors between the analytical and numerical solutions. The absolute relative changes of the conservation laws describing the energy and the momentum are computed for both problems.
Submitted 15 October, 2016;
originally announced November 2016.
-
Numerical investigation of the solutions of Schrodinger equation with exponential cubic B-spline finite element method
Authors:
Ozlem Ersoy,
Idris Dag,
Ali Sahin
Abstract:
In this paper, we investigate the numerical solutions of the cubic nonlinear Schrodinger equation via the exponential B-spline collocation method. Crank-Nicolson formulas are used for time discretization of the target equation. A linearization technique is also employed for the numerical purpose. Four numerical examples related to single soliton, collision of two solitons that move in opposite directions, the birth of standing and mobile solitons and bound state solution are considered as the test problems. The accuracy and the efficiency of the proposed method are measured by max error norm and conserved constants. The obtained results are compared with the possible analytical values and those in some earlier studies.
Submitted 1 July, 2016;
originally announced July 2016.
-
Motion of Patterns Modeled by the Gray-Scott Autocatalysis System in One Dimension
Authors:
Alper Korkmaz,
Ozlem Ersoy,
Idiris Dag
Abstract:
Occupation of an interval by self-replicating initial pulses is studied numerically. Two different approximates in different categories are proposed for the numerical solutions of some initial-boundary value problems. The sinc differential quadrature combined with third-fourth order implicit Rosenbrock and exponential B-spline collocation methods are set up to obtain the numerical solutions of the mentioned problems. The numerical simulations containing occupation of a single initial pulse, a non- or slow-occupation model and covering the domain with two initial pulses are demonstrated by using both proposed methods.
Submitted 16 May, 2016;
originally announced May 2016.
-
Multilevel Threshold Secret and Function Sharing based on the Chinese Remainder Theorem
Authors:
Oguzhan Ersoy,
Kamer Kaya,
Kerem Kaskaloglu
Abstract:
A recent work of Harn and Fuyou presents the first multilevel (disjunctive) threshold secret sharing scheme based on the Chinese Remainder Theorem. In this work, we first show that the proposed method is not secure and also fails to work with a certain natural setting of the threshold values on compartments. We then propose a secure scheme that works for all threshold settings. In this scheme, we employ a refined version of Asmuth-Bloom secret sharing with a special and generic Asmuth-Bloom sequence called the anchor sequence. Based on this idea, we also propose the first multilevel conjunctive threshold secret sharing scheme based on the Chinese Remainder Theorem. Lastly, we discuss how the proposed schemes can be used for multilevel threshold function sharing by employing it in a threshold RSA cryptosystem as an example.
Submitted 25 May, 2016;
originally announced May 2016.
-
Solitary wave simulations of the Boussinesq Systems
Authors:
Ozlem Ersoy,
Idiris Dag,
Alper Korkmaz
Abstract:
In the study, the collocation method based on exponential cubic B-spline functions is proposed to solve one dimensional Boussinesq systems numerically. Two initial boundary value problems for Regularized and Classical Boussinesq systems modeling motion of traveling waves are considered. The accuracy of the method is validated by measuring the error between the numerical and analytical solutions. The numerical solutions obtained for various values of the free parameter $p$ are compared with some solutions in the literature.
Submitted 16 May, 2016;
originally announced May 2016.
-
The Numerical Approach to the Fisher's Equation via Trigonometric Cubic B-spline Collocation Method
Authors:
Ozlem Ersoy,
Idris Dag
Abstract:
In this study, we set up a numerical technique to get approximate solutions of Fisher's equation, which is one of the most important model equations in population biology. We integrate the equation fully by using a combination of the trigonometric cubic B-spline functions for the space variable and Crank-Nicolson for the time integration. Numerical results have been presented to show the accuracy of the current algorithm. We have seen that the proposed technique is a good alternative to some existing techniques for getting solutions of Fisher's equation.
Submitted 23 April, 2016;
originally announced April 2016.
-
A Trigonometric Cubic B-spline Finite Element Method for Solving the Nonlinear Coupled Burger Equation
Authors:
Ozlem Ersoy,
Idris Dag
Abstract:
The coupled Burgers equation is solved by way of the trigonometric B-spline collocation method. The unknown of the coupled Burgers equation is integrated in time with the aid of the Crank-Nicolson method. The resulting time-integrated coupled Burgers equation is discretized using the trigonometric cubic B-spline collocation method. The fully integrated coupled Burgers equation, which is a system of nonlinear algebraic equations, is solved with a variant of the Thomas algorithm. Three model test problems are studied to illustrate the accuracy of the suggested method.
Submitted 15 April, 2016;
originally announced April 2016.
-
The Exponential Cubic B-spline Algorithm for Burgers's Equation
Authors:
Ozlem Ersoy,
Idris Dag,
Nihat Adar
Abstract:
The exponential cubic B-spline functions are used to set up the collocation method for finding solutions of the Burgers's equation. The effect of the exponential cubic B-splines in the collocation method is sought by studying four test problems.
Submitted 15 April, 2016;
originally announced April 2016.
-
Spectroscopic super-resolution fluorescence cell imaging using ultra-small Ge quantum dots
Authors:
Mingying Song,
Ali Karatutlu,
Osman Ersoy,
Yun Zhou,
Yongxin Yang,
Yuanpeng Zhang,
William R. Little,
Ann P. Wheeler,
Andrei V. Sapelkin
Abstract:
In single molecule localisation super-resolution microscopy the need for repeated image capture limits the imaging speed, while the size of fluorescence probes limits the possible theoretical localisation resolution. Here, we demonstrated a spectral imaging based super-resolution approach by separating the overlapping diffraction spots into several detectors during a single scanning period and taking advantage of the size-dependent emission wavelength in nanoparticles. This approach has been tested using off-the-shelf quantum dots (Qdot) and in-house novel ultra-small (~3 nm) Ge QDs. Furthermore, we developed a method-specific Gaussian fitting and maximum likelihood estimation based on a Matlab algorithm for fast QDs localisation. We demonstrate that this methodology results in ~40 nm localisation resolution using commercial QDs and ~12 nm localisation resolution using Ge QDs. Using a standard scanning confocal microscope we achieved a data acquisition rate of 1.6 seconds/frame. However, we show that this approach has the potential to deliver data acquisition rates on the ms scale, thus providing super-resolution in live systems.
Submitted 9 April, 2015; v1 submitted 31 March, 2015;
originally announced March 2015.
-
An Exponential Cubic B-spline Finite Element Method for Solving the Nonlinear Coupled Burger Equation
Authors:
Ozlem Ersoy,
Idiris Dag
Abstract:
The exponential cubic B-spline functions together with Crank-Nicolson are used to solve numerically the nonlinear coupled Burgers' equation using the collocation method. This method has been tested on three different problems. The proposed scheme is compared with some existing methods. We have noticed that the proposed scheme produces highly accurate results.
Submitted 2 March, 2015;
originally announced March 2015.
-
The Trigonometric Cubic B-spline Algorithm for Burgers' Equation
Authors:
I. Dag,
O. Ersoy,
O. Kacmaz
Abstract:
The cubic trigonometric B-spline (CTB) functions are used to set up the collocation method for finding solutions of the Burgers' equation. The effect of the CTB in the collocation method is sought by studying two test problems. The Burgers' equation is fully discretized using the Crank-Nicolson method for the time discretization and the CTB functions for the discretization of the spatial variable. Numerical examples are performed to show the convenience of the method for solutions of the Burgers' equation.
Submitted 21 July, 2014;
originally announced July 2014.
-
Nonlinear Dynamic Field Embedding: On Hyperspectral Scene Visualization
Authors:
Dalton Lunga,
Okan Ersoy
Abstract:
Graph embedding techniques are useful to characterize spectral signature relations for hyperspectral images. However, such images consist of disjoint classes due to spatial details that are often ignored by existing graph computing tools. Robust parameter estimation is a challenge for kernel functions that compute such graphs. Finding a corresponding high quality coordinate system to map signature relations remains an open research question. We address these challenges by first proposing a kernel function of spatial and spectral information in computing neighborhood graphs. Secondly, the study exploits the force field interpretation from mechanics and devises a unifying nonlinear graph embedding framework. The generalized framework leads to novel unsupervised multidimensional artificial field embedding techniques that rely on the simple additive assumption of pair-dependent attraction and repulsion functions. The formulations capture long range and short range distance related effects often associated with living organisms and help to establish algorithmic properties that mimic mutual behavior for the purpose of dimensionality reduction. The main benefits of the proposed models include the ability to preserve the local topology of data and produce quality visualizations, i.e. maintaining disjoint meaningful neighborhoods. As part of the evaluation, visualization, gradient field trajectories, and semisupervised classification experiments are conducted for image scenes acquired by multiple sensors at various spatial resolutions over different types of objects. The results demonstrate the superiority of the proposed embedding framework over various widely used methods.
Submitted 28 November, 2012;
originally announced November 2012.