Search | arXiv e-print repository

Model-Distributed Inference for Large Language Models at the Edge

Authors: Davide Macario, Hulya Seferoglu, Erdem Koyuncu

Abstract: We introduce Model-Distributed Inference for Large-Language Models (MDI-LLM), a novel framework designed to facilitate the deployment of state-of-the-art large-language models (LLMs) across low-power devices at the edge. This is accomplished by dividing the model into multiple partitions, which are then assigned to different devices/nodes within the network. These nodes exchange intermediate activation vectors via device-to-device links, enabling collaborative computation. To enhance the efficiency of this process, we propose the "recurrent pipeline parallelism" technique, which reduces idle time on each device and facilitates parallel inference during the generation of multiple text sequences. By leveraging the combined computational resources of multiple edge devices, MDI-LLM enables the deployment of LLMs that exceed the memory capacity of individual devices, making it possible to perform inference on low-cost hardware. Furthermore, as the number of participating devices increases, MDI-LLM boosts token generation throughput and reduces memory consumption per device.

Submitted 13 May, 2025; originally announced May 2025.

arXiv:2410.06106 [pdf, other]

Distributed Tomographic Reconstruction with Quantization

Authors: Runxuan Miao, Selin Aslan, Erdem Koyuncu, Doğa Gürsoy

Abstract: Conventional tomographic reconstruction typically depends on centralized servers for both data storage and computation, leading to concerns about memory limitations and data privacy. Distributed reconstruction algorithms mitigate these issues by partitioning data across multiple nodes, reducing server load and enhancing privacy. However, these algorithms often encounter challenges related to memory constraints and communication overhead between nodes. In this paper, we introduce a decentralized Alternating Directions Method of Multipliers (ADMM) with configurable quantization. By distributing local objectives across nodes, our approach is highly scalable and can efficiently reconstruct images while adapting to available resources. To overcome communication bottlenecks, we propose two quantization techniques based on K-means clustering and JPEG compression. Numerical experiments with benchmark images illustrate the tradeoffs between communication efficiency, memory use, and reconstruction accuracy.

Submitted 8 October, 2024; originally announced October 2024.

Comments: 26 pages, 8 figures

MSC Class: 68W15; 65R32

arXiv:2408.05247 [pdf, other]

Early-Exit meets Model-Distributed Inference at Edge Networks

Authors: Marco Colocrese, Erdem Koyuncu, Hulya Seferoglu

Abstract: Distributed inference techniques can be broadly classified into data-distributed and model-distributed schemes. In data-distributed inference (DDI), each worker carries the entire deep neural network (DNN) model but processes only a subset of the data. However, feeding the data to workers results in high communication costs, especially when the data is large. An emerging paradigm is model-distributed inference (MDI), where each worker carries only a subset of DNN layers. In MDI, a source device that has data processes a few layers of DNN and sends the output to a neighboring device, i.e., offloads the rest of the layers. This process ends when all layers are processed in a distributed manner. In this paper, we investigate the design and development of MDI with early-exit, which advocates that there is no need to process all the layers of a model for some data to reach the desired accuracy, i.e., we can exit the model without processing all the layers if target accuracy is reached. We design a framework MDI-Exit that adaptively determines early-exit and offloading policies as well as data admission at the source. Experimental results on a real-life testbed of NVIDIA Nano edge devices show that MDI-Exit processes more data when accuracy is fixed and results in higher accuracy for the fixed data rate.

Submitted 8 August, 2024; originally announced August 2024.
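
The early-exit idea described in the abstract above can be illustrated with a minimal sketch. This is not the MDI-Exit framework: the function `early_exit_inference`, the `(layer_fn, exit_fn)` stage layout, the toy stages, and the use of a confidence threshold as a stand-in for "target accuracy is reached" are all assumptions made for illustration.

```python
def early_exit_inference(x, stages, threshold):
    """Run stages in order and stop once an exit is confident enough.

    `stages` is a list of (layer_fn, exit_fn) pairs (hypothetical layout);
    exit_fn returns (label, confidence). Returns (label, confidence, depth),
    where depth is the number of stages that were actually processed.
    """
    label, confidence = None, 0.0
    for depth, (layer_fn, exit_fn) in enumerate(stages, start=1):
        x = layer_fn(x)                  # process the next group of layers
        label, confidence = exit_fn(x)   # query the exit attached to it
        if confidence >= threshold:      # confident enough: skip the rest
            return label, confidence, depth
    return label, confidence, len(stages)


# Toy stand-ins for real layers and exit classifiers (assumptions).
def double(v):
    return v * 2


def toy_exit(v):
    return ("big" if v > 5 else "small", min(1.0, v / 10))


stages = [(double, toy_exit)] * 4
result = early_exit_inference(1, stages, threshold=0.75)
print(result)  # exits after 3 of the 4 stages
```

In a model-distributed setting, each stage could live on a different device, so an early exit would also save the transfer of activations to the remaining devices.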

arXiv:2402.17470 [pdf, other]

Bit Distribution Study and Implementation of Spatial Quality Map in the JPEG-AI Standardization

Authors: Panqi Jia, Jue Mao, Esin Koyuncu, A. Burakhan Koyuncu, Timofey Solovyev, Alexander Karabutov, Yin Zhao, Elena Alshina, Andre Kaup

Abstract: Currently, there is a high demand for neural network-based image compression codecs. These codecs employ non-linear transforms to create compact bit representations and facilitate faster coding speeds on devices compared to the hand-crafted transforms used in classical frameworks. The scientific and industrial communities are highly interested in these properties, leading to the standardization effort of JPEG-AI. The JPEG-AI verification model has been released and is currently under development for standardization. Utilizing neural networks, it can outperform the classic codec VVC intra by over 10% BD-rate operating at base operation point. Researchers attribute this success to the flexible bit distribution in the spatial domain, in contrast to VVC intra's anchor that is generated with a constant quality point. However, our study reveals that VVC intra displays a more adaptable bit distribution structure through the implementation of various block sizes. As a result of our observations, we have proposed a spatial bit allocation method to optimize the JPEG-AI verification model's bit distribution and enhance the visual quality. Furthermore, by applying the VVC bit distribution strategy, the objective performance of the JPEG-AI verification model can be further improved, resulting in a maximum gain of 0.45 dB in PSNR-Y.

Submitted 27 February, 2024; originally announced February 2024.

Comments: 5 pages, 3 figures, 4 tables

arXiv:2303.11247 [pdf, ps, other]

Memorization Capacity of Neural Networks with Conditional Computation

Authors: Erdem Koyuncu

Abstract: Many empirical studies have demonstrated the performance benefits of conditional computation in neural networks, including reduced inference time and power consumption. We study the fundamental limits of neural conditional computation from the perspective of memorization capacity. For Rectified Linear Unit (ReLU) networks without conditional computation, it is known that memorizing a collection of $n$ input-output relationships can be accomplished via a neural network with $O(\sqrt{n})$ neurons. Calculating the output of this neural network can be accomplished using $O(\sqrt{n})$ elementary arithmetic operations of additions, multiplications and comparisons for each input. Using a conditional ReLU network, we show that the same task can be accomplished using only $O(\log n)$ operations per input. This represents an almost exponential improvement as compared to networks without conditional computation. We also show that the $\Theta(\log n)$ rate is the best possible. Our achievability result utilizes a general methodology to synthesize a conditional network out of an unconditional network in a computationally-efficient manner, bridging the gap between unconditional and conditional architectures.

Submitted 20 March, 2023; originally announced March 2023.

Comments: To be presented at International Conference on Learning Representations (ICLR), 2023

arXiv:2212.01330 [pdf, other]

Device Interoperability for Learned Image Compression with Weights and Activations Quantization

Authors: Esin Koyuncu, Timofey Solovyev, Elena Alshina, André Kaup

Abstract: Learning-based image compression has improved to a level where it can outperform traditional image codecs such as HEVC and VVC in terms of coding performance. In addition to good compression performance, device interoperability is essential for a compression codec to be deployed, i.e., encoding and decoding on different CPUs or GPUs should be error-free and with negligible performance reduction. In this paper, we present a method to solve the device interoperability problem of a state-of-the-art image compression network. We implement quantization to entropy networks which output entropy parameters. We suggest a simple method which can ensure cross-platform encoding and decoding, and can be implemented quickly with minor performance deviation, of 0.3% BD-rate, from floating point model results.

Submitted 2 December, 2022; originally announced December 2022.

Comments: 5 pages, 5 figures, Picture Coding Symposium (PCS) 2022

arXiv:2210.15621 [pdf, other]

Class Based Thresholding in Early Exit Semantic Segmentation Networks

Authors: Alperen Görmez, Erdem Koyuncu

Abstract: We propose Class Based Thresholding (CBT) to reduce the computational cost of early exit semantic segmentation models while preserving the mean intersection over union (mIoU) performance. A key idea of CBT is to exploit the naturally-occurring neural collapse phenomenon. Specifically, by calculating the mean prediction probabilities of each class in the training set, CBT assigns different masking threshold values to each class, so that the computation can be terminated sooner for pixels belonging to easy-to-predict classes. We show the effectiveness of CBT on Cityscapes and ADE20K datasets. CBT can reduce the computational cost by $23\%$ compared to the previous state-of-the-art early exit models.

Submitted 27 October, 2022; originally announced October 2022.

Comments: 5 pages, 3 figures, 2 tables

arXiv:2210.07282 [pdf, other]

Harfang3D Dog-Fight Sandbox: A Reinforcement Learning Research Platform for the Customized Control Tasks of Fighter Aircrafts

Authors: Muhammed Murat Özbek, Süleyman Yıldırım, Muhammet Aksoy, Eric Kernin, Emre Koyuncu

Abstract: The advent of deep learning (DL) gave rise to significant breakthroughs in Reinforcement Learning (RL) research. Deep Reinforcement Learning (DRL) algorithms have reached super-human level skills when applied to vision-based control problems such as Atari 2600 games, where environment states were extracted from pixel information. Unfortunately, these environments are far from being applicable to highly dynamic and complex real-world tasks, such as autonomous control of a fighter aircraft, since these environments only involve 2D representation of a visual world. Here, we present a semi-realistic flight simulation environment, Harfang3D Dog-Fight Sandbox, for fighter aircraft. It is intended to be a flexible toolbox for the investigation of main challenges in aviation studies using Reinforcement Learning. The program provides easy access to the flight dynamics model, environment states, and aerodynamics of the plane, enabling the user to customize any specific task in order to build intelligent decision making (control) systems via RL. The software also allows deployment of bot aircraft and development of multi-agent tasks. This way, multiple groups of aircraft can be configured to be competitive or cooperative agents to perform complicated tasks including Dog Fight. During the experiments, we carried out training for two different scenarios: navigating to a designated location and within visual range (WVR) combat, in short Dog Fight. Using Deep Reinforcement Learning techniques for both scenarios, we were able to train competent agents that exhibit human-like behaviours. Based on these results, it is confirmed that Harfang3D Dog-Fight Sandbox can be utilized as a 3D realistic RL research platform.

Submitted 13 October, 2022; originally announced October 2022.

Comments: 18 pages, 18 figures, 4 tables

arXiv:2207.03644 [pdf, other]

Pruning Early Exit Networks

Authors: Alperen Görmez, Erdem Koyuncu

Abstract: Deep learning models that perform well often have high computational costs. In this paper, we combine two approaches that try to reduce the computational cost while keeping the model performance high: pruning and early exit networks. We evaluate two approaches of pruning early exit networks: (1) pruning the entire network at once, (2) pruning the base network and additional linear classifiers in an ordered fashion. Experimental results show that pruning the entire network at once is a better strategy in general. However, at high accuracy rates, the two approaches have a similar performance, which implies that the processes of pruning and early exit can be separated without loss of optimality.

Submitted 7 July, 2022; originally announced July 2022.

Comments: 5 pages, 3 figures, Sparsity in Neural Networks Workshop 2022

arXiv:2206.05093 [pdf, other]

Federated Momentum Contrastive Clustering

Authors: Runxuan Miao, Erdem Koyuncu

Abstract: We present federated momentum contrastive clustering (FedMCC), a learning framework that can not only extract discriminative representations over distributed local data but also perform data clustering. In FedMCC, a transformed data pair passes through both the online and target networks, resulting in four representations over which the losses are determined. The resulting high-quality representations generated by FedMCC can outperform several existing self-supervised learning methods for linear evaluation and semi-supervised learning tasks. FedMCC can easily be adapted to ordinary centralized clustering through what we call momentum contrastive clustering (MCC). We show that MCC achieves state-of-the-art clustering accuracy results in certain datasets such as STL-10 and ImageNet-10. We also present a method to reduce the memory footprint of our clustering schemes.

Submitted 10 June, 2022; originally announced June 2022.

Comments: Originally submitted March 2022

arXiv:2201.05528 [pdf]

Reinforcement Learning based Air Combat Maneuver Generation

Authors: Muhammed Murat Ozbek, Emre Koyuncu

Abstract: The advent of artificial intelligence technology has paved the way for much research in the air combat sector. Academics and many other researchers have studied a prominent research direction called autonomous maneuver decision of UAVs. This research has produced some outcomes, but decision methods that include Reinforcement Learning (RL) have turned out to be more efficient. Many studies and experiments have aimed to make an agent reach its target in an optimal way; the most prominent techniques used are Genetic Algorithm (GA), A star, RRT, and various other optimization techniques. But Reinforcement Learning is the one well known for its success. In the DARPA Alpha Dogfight Trials, reinforcement learning prevailed against a real veteran F-16 human pilot who was trained by Boeing. This successful model was developed by Heron Systems. After this accomplishment, reinforcement learning drew tremendous attention. In this research, we aim for our UAV, which has Dubins vehicle dynamics, to move to the target in two-dimensional space along an optimal path using Twin Delayed Deep Deterministic Policy Gradients (TD3), with Hindsight Experience Replay (HER) used for experience replay. We ran tests in two different environments using simulations.

Submitted 14 January, 2022; originally announced January 2022.

arXiv:2110.12065 [pdf, other]

Multiplication-Avoiding Variant of Power Iteration with Applications

Authors: Hongyi Pan, Diaa Badawi, Runxuan Miao, Erdem Koyuncu, Ahmet Enis Cetin

Abstract: Power iteration is a fundamental algorithm in data analysis. It extracts the eigenvector corresponding to the largest eigenvalue of a given matrix. Applications include ranking algorithms, recommendation systems, principal component analysis (PCA), among many others. In this paper, we introduce multiplication-avoiding power iteration (MAPI), which replaces the standard $\ell_2$-inner products that appear at the regular power iteration (RPI) with multiplication-free vector products which are Mercer-type kernel operations related with the $\ell_1$ norm. Precisely, for an $n\times n$ matrix, MAPI requires $n$ multiplications, while RPI needs $n^2$ multiplications per iteration. Therefore, MAPI provides a significant reduction of the number of multiplication operations, which are known to be costly in terms of energy consumption. We provide applications of MAPI to PCA-based image reconstruction as well as to graph-based ranking algorithms. When compared to RPI, MAPI not only typically converges much faster, but also provides superior performance.

Submitted 31 January, 2022; v1 submitted 22 October, 2021; originally announced October 2021.

Comments: This is the technical report for the paper "MULTIPLICATION-AVOIDING VARIANT OF POWER ITERATION WITH APPLICATIONS", which has been accepted by ICASSP 2022
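
For reference, the regular power iteration (RPI) baseline mentioned in the abstract above can be sketched as follows. This is only the standard textbook algorithm, not the multiplication-avoiding MAPI variant, whose kernel is not specified in this listing. The function name `power_iteration`, the all-ones start vector, the iteration count, and the example matrix are assumptions chosen for illustration.

```python
import math


def power_iteration(matrix, num_iters=200):
    """Regular power iteration: estimate the dominant eigenpair of a square matrix."""
    n = len(matrix)
    vec = [1.0] * n  # arbitrary nonzero start vector (an assumption)
    for _ in range(num_iters):
        # Matrix-vector product: this step costs n * n multiplications.
        prod = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in prod))
        vec = [x / norm for x in prod]  # renormalize to unit length
    # Rayleigh quotient of the unit vector gives the eigenvalue estimate.
    av = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v * a for v, a in zip(vec, av))
    return eigenvalue, vec


# Small symmetric example matrix (an assumption, not from the paper).
A = [[2.0, 1.0], [1.0, 3.0]]
eigenvalue, eigenvector = power_iteration(A)
print(eigenvalue, eigenvector)
```

The matrix-vector product inside the loop is where the $n^2$ multiplications per iteration quoted in the abstract arise.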

arXiv:2105.11634 [pdf, other]

Robust Principal Component Analysis Using a Novel Kernel Related with the L1-Norm

Authors: Hongyi Pan, Diaa Badawi, Erdem Koyuncu, A. Enis Cetin

Abstract: We consider a family of vector dot products that can be implemented using sign changes and addition operations only. The dot products are energy-efficient as they avoid the multiplication operation entirely. Moreover, the dot products induce the $\ell_1$-norm, thus providing robustness to impulsive noise. First, we analytically prove that the dot products yield symmetric, positive semi-definite generalized covariance matrices, thus enabling principal component analysis (PCA). Moreover, the generalized covariance matrices can be constructed in an Energy Efficient (EEF) manner due to the multiplication-free property of the underlying vector products. We present image reconstruction examples in which our EEF PCA method results in the highest peak signal-to-noise ratios compared to the ordinary $\ell_2$-PCA and the recursive $\ell_1$-PCA.

Submitted 24 May, 2021; originally announced May 2021.

Comments: 6 pages, 3 tables and one figure

arXiv:2103.01148 [pdf, ps, other]

doi 10.1109/IJCNN55064.2022.9891952

E$^2$CM: Early Exit via Class Means for Efficient Supervised and Unsupervised Learning

Authors: Alperen Görmez, Venkat R. Dasari, Erdem Koyuncu

Abstract: State-of-the-art neural networks with early exit mechanisms often need a considerable amount of training and fine tuning to achieve good performance with low computational cost. We propose a novel early exit technique, Early Exit Class Means (E$^2$CM), based on class means of samples. Unlike most existing schemes, E$^2$CM does not require gradient-based training of internal classifiers and it does not modify the base network by any means. This makes it particularly useful for neural network training in low-power devices, as in wireless edge networks. We evaluate the performance and overheads of E$^2$CM over various base neural networks such as MobileNetV3, EfficientNet, ResNet, and datasets such as CIFAR-100, ImageNet, and KMNIST. Our results show that, given a fixed training time budget, E$^2$CM achieves higher accuracy as compared to existing early exit mechanisms. Moreover, if there are no limitations on the training time budget, E$^2$CM can be combined with an existing early exit scheme to boost the latter's performance, achieving a better trade-off between computational cost and network accuracy. We also show that E$^2$CM can be used to decrease the computational cost in unsupervised learning tasks.

Submitted 11 July, 2022; v1 submitted 1 March, 2021; originally announced March 2021.

Comments: 8 pages, 4 figures, 2 tables. Accepted to IJCNN 2022 (WCCI2022)

arXiv:2010.12546 [pdf, other]

Quantizing Multiple Sources to a Common Cluster Center: An Asymptotic Analysis

Authors: Erdem Koyuncu

Abstract: We consider quantizing an $Ld$-dimensional sample, which is obtained by concatenating $L$ vectors from datasets of $d$-dimensional vectors, to a $d$-dimensional cluster center. The distortion measure is the weighted sum of $r$th powers of the distances between the cluster center and the samples. For $L=1$, one recovers the ordinary center based clustering formulation. The general case $L>1$ appears when one wishes to cluster a dataset through $L$ noisy observations of each of its members. We find a formula for the average distortion performance in the asymptotic regime where the number of cluster centers is large. We also provide an algorithm to numerically optimize the cluster centers and verify our analytical results on real and artificial datasets. In terms of faithfulness to the original (noiseless) dataset, our clustering approach outperforms the naive approach that relies on quantizing the $Ld$-dimensional noisy observation vectors to $Ld$-dimensional centers.

Submitted 23 October, 2020; originally announced October 2020.

arXiv:1910.14096 [pdf, other]

Robust and Computationally-Efficient Anomaly Detection using Powers-of-Two Networks

Authors: Usama Muneeb, Erdem Koyuncu, Yasaman Keshtkarjahromi, Hulya Seferoglu, Mehmet Fatih Erden, Ahmet Enis Cetin

Abstract: Robust and computationally efficient anomaly detection in videos is a problem in video surveillance systems. We propose a technique to increase robustness and reduce computational complexity in a Convolutional Neural Network (CNN) based anomaly detector that utilizes the optical flow information of video data. We reduce the complexity of the network by denoising the intermediate layer outputs of the CNN and by using powers-of-two weights, which replaces the computationally expensive multiplication operations with bit-shift operations. The denoising operation during inference forces small valued intermediate layer outputs to zero. Because the number of zeros in the network significantly increases as a result of denoising, we can implement the CNN about 10% faster than a comparable network while detecting all the anomalies in the testing set. It turns out that the denoising operation also provides robustness because the contribution of small intermediate values to the final result is negligible. During training we also generate motion vector images by a Generative Adversarial Network (GAN) to improve the robustness of the overall system. We experimentally observe that the resulting system is robust to background motion.

Submitted 30 October, 2019; originally announced October 2019.
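
The abstract above states that powers-of-two weights replace multiplications with bit shifts. A minimal sketch of that general idea follows; it is not the paper's implementation. The helper names `nearest_power_of_two_exponent` and `shift_multiply`, the choice to round the exponent in the log domain, and the integer activations are all assumptions.

```python
import math


def nearest_power_of_two_exponent(weight):
    """Return (sign, exponent) such that sign * 2**exponent approximates weight.

    Rounding is done on log2(|weight|), which is one possible choice (an assumption).
    """
    if weight == 0:
        raise ValueError("a zero weight has no power-of-two representation")
    sign = -1 if weight < 0 else 1
    exponent = round(math.log2(abs(weight)))
    return sign, exponent


def shift_multiply(x, sign, exponent):
    """Multiply the integer x by sign * 2**exponent using only shifts and negation."""
    # A left shift multiplies by 2**exponent; a right shift divides, rounding down.
    shifted = x << exponent if exponent >= 0 else x >> -exponent
    return -shifted if sign < 0 else shifted


sign, exponent = nearest_power_of_two_exponent(3.7)  # quantized to +2**2
print(shift_multiply(5, sign, exponent))             # same as 5 * 4
```

Note that a right shift discards low-order bits, so fractional weights introduce rounding error on top of the quantization of the weight itself.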

arXiv:1910.13511 [pdf, other]

A Generalization of Principal Component Analysis

Authors: Samuele Battaglino, Erdem Koyuncu

Abstract: Conventional principal component analysis (PCA) finds a principal vector that maximizes the sum of second powers of principal components. We consider a generalized PCA that aims at maximizing the sum of an arbitrary convex function of principal components. We present a gradient ascent algorithm to solve the problem. For the kernel version of generalized PCA, we show that the solutions can be obtained as fixed points of a simple single-layer recurrent neural network. We also evaluate our algorithms on different datasets.

Submitted 15 November, 2019; v1 submitted 29 October, 2019; originally announced October 2019.

arXiv:1910.13502 [pdf, other]

The Tradeoff Between Coverage and Computation in Wireless Networks

Authors: Erdem Koyuncu

Abstract: We consider a distributed edge computing scenario consisting of several wireless nodes that are located over an area of interest. Specifically, some of the "master" nodes are tasked to sense the environment (e.g., by acquiring images or videos via cameras) and process the corresponding sensory data, while the other nodes are assigned as "workers" to help the computationally-intensive processing tasks of the masters. A new tradeoff that has not been previously explored in the existing literature arises in such a formulation: On one hand, one wishes to allocate as many master nodes as possible to cover a large area for accurate monitoring. On the other hand, one also wishes to allocate as many worker nodes as possible to maximize the computation rate of the sensed data. It is in the context of this tradeoff that this work is presented. By utilizing the basic physical layer principles of wireless communication systems, we formulate and analyze the tradeoff between the coverage and computation performance of spatial networks. We also present an algorithm to find the optimal tradeoff and demonstrate its performance through numerical simulations.

Submitted 29 October, 2019; originally announced October 2019.

arXiv:1904.07368 [pdf, ps, other]

Optimal Placement of UAVs for Minimum Outage Probability

Authors: Maryam Shabanighazikelayeh, Erdem Koyuncu

Abstract: We consider multiple unmanned aerial vehicles (UAVs) serving a density of ground terminals (GTs) as base stations. The objective is to minimize the outage probability of GT-to-UAV transmissions. Optimal placement of UAVs under different UAV altitude constraints and GT densities is studied. First, using a random deployment argument, a general upper bound on the optimal outage probability is found for any density of GTs and any number of UAVs. A matching lower bound is also derived to show that the optimal outage probability decays exponentially with the number of UAVs. Next, the structure of optimal deployments is studied when the common altitude constraint is large. For a wide class of GT densities, it is shown that all UAVs should be placed at the same location in an optimal deployment. A design implication is that one can use a single multi-antenna UAV as opposed to multiple single-antenna UAVs without loss of optimality. This result is also extended to a practical variant of the Rician fading model recently developed by Azari et al. for UAV communications. Numerical deployment of UAVs in the centralized and practical distributed settings is carried out using the particle swarm optimization and modified gradient descent algorithms, respectively.

Submitted 14 August, 2020; v1 submitted 15 April, 2019; originally announced April 2019.

arXiv:1811.11331 [pdf, other]

Asynchronous Local Construction of Bounded-Degree Network Topologies Using Only Neighborhood Information

Authors: Erdem Koyuncu, Hamid Jafarkhani

Abstract: We consider ad-hoc networks consisting of $n$ wireless nodes that are located on the plane. Any two given nodes are called neighbors if they are located within a certain distance (communication range) from one another. A given node can be directly connected to any one of its neighbors and picks its connections according to a unique topology control algorithm that is available at every node. Given that each node knows only the indices (unique identification numbers) of its one- and two-hop neighbors, we identify an algorithm that preserves connectivity and can operate without the need of any synchronization among nodes. Moreover, the algorithm results in a sparse graph with at most $5n$ edges and a maximum node degree of $10$. Existing algorithms with the same promises further require neighbor distance and/or direction information at each node. We also evaluate the performance of our algorithm for random networks. In this case, our algorithm provides an asymptotically connected network with $n(1+o(1))$ edges with a degree less than or equal to $6$ for $1-o(1)$ fraction of the nodes. We also introduce another asynchronous connectivity-preserving algorithm that can provide an upper bound as well as a lower bound on node degrees.

Submitted 27 November, 2018; originally announced November 2018.

Comments: To appear in IEEE Transactions on Communications

arXiv:1808.05410 [pdf, ps, other]

Interleaving Channel Estimation and Limited Feedback for Point-to-Point Systems with a Large Number of Transmit Antennas

Authors: Erdem Koyuncu, Xun Zou, Hamid Jafarkhani

Abstract: We introduce and investigate the opportunities of multi-antenna communication schemes whose training and feedback stages are interleaved and mutually interacting. Specifically, unlike the traditional schemes where the transmitter first trains all of its antennas at once and then receives a single feedback message, we consider a scenario where the transmitter instead trains its antennas one by one and receives feedback information immediately after training each one of its antennas. The feedback message may ask the transmitter to train another antenna; or, it may terminate the feedback/training phase and provide the quantized codeword (e.g., a beamforming vector) to be utilized for data transmission. As a specific application, we consider a multiple-input single-output system with $t$ transmit antennas, a short-term power constraint $P$, and target data rate $\rho$. We show that for any $t$, the same outage probability as a system with perfect transmitter and receiver channel state information can be achieved with a feedback rate of $R_1$ bits per channel state and via training $R_2$ transmit antennas on average, where $R_1$ and $R_2$ are independent of $t$, and depend only on $\rho$ and $P$. In addition, we design variable-rate quantizers for channel coefficients to further minimize the feedback rate of our scheme.

Submitted 16 August, 2018; originally announced August 2018.

Comments: To appear in IEEE Transactions on Wireless Communications

arXiv:1803.04315 [pdf, other]

Power-Efficient Deployment of UAVs as Relays

Authors: Erdem Koyuncu

Abstract: Optimal deployment of unmanned aerial vehicles (UAVs) as communication relays is studied for fixed-rate variable-power systems. The considered setup is a set of ground transmitters (GTs) wishing to communicate with a set of ground receivers (GRs) through the UAVs. Each GT-GR pair communicates through only one selected UAV and has no direct link. Two different UAV selection scenarios are studied: In centralized selection, a decision center assigns an optimal UAV depending on the locations of all terminals. In distributed selection, a GT selects its relaying UAV using only the local knowledge of its distances to the UAVs. For both selection scenarios, the optimal tradeoff between the UAV and GT power consumptions is determined using tools from quantization theory. Specifically, the two extremal regimes of one UAV and a very large number of UAVs are analyzed for a path loss exponent of $2$. Numerical optimization of UAV locations is also discussed. Simulations are provided to confirm the analytical findings.

Submitted 27 November, 2018; v1 submitted 12 March, 2018; originally announced March 2018.

Comments: This work has been presented at the IEEE Signal Processing Advances in Wireless Communications (SPAWC), June 2018

arXiv:1708.08832 [pdf, other]

Deployment and Trajectory Optimization of UAVs: A Quantization Theory Approach

Authors: Erdem Koyuncu, Maryam Shabanighazikelayeh, Hulya Seferoglu

Abstract: Optimal deployment and movement of multiple unmanned aerial vehicles (UAVs) is studied. The considered scenario consists of several ground terminals (GTs) communicating with the UAVs using variable transmission power and fixed data rate. First, the static case of a fixed geographical GT density is analyzed. Using high resolution quantization theory, the corresponding best achievable performance (measured in terms of the average GT transmission power) is determined in the asymptotic regime of a large number of UAVs. Next, the dynamic case where the GT density is allowed to vary periodically through time is considered. For one-dimensional networks, an accurate formula for the total UAV movement that guarantees the best time-averaged performance is determined. In general, the tradeoff between the total UAV movement and the achievable performance is obtained through a Lagrangian approach. A corresponding trajectory optimization algorithm is introduced and shown to guarantee a convergent Lagrangian. Numerical simulations confirm the analytical findings. Extensions to different system models and performance measures are also discussed.

Submitted 27 November, 2018; v1 submitted 29 August, 2017; originally announced August 2017.

Comments: To appear in IEEE Transactions on Wireless Communications. Part of this work will be presented at IEEE WCNC 2018

arXiv:1708.06400 [pdf, other]

Performance Gains of Optimal Antenna Deployment for Massive MIMO Systems

Authors: Erdem Koyuncu

Abstract: We consider the uplink of a single-cell multi-user multiple-input multiple-output (MIMO) system with several single-antenna transmitters/users and one base station with $N$ antennas in the $N\rightarrow\infty$ regime. The base station antennas are evenly distributed to $n$ admissible locations throughout the cell. First, we show that a reliable (per-user) rate of $O(\log n)$ is achievable through optimal locational optimization of base station antennas. We also prove that an $O(\log n)$ rate is the best possible. Therefore, in contrast to a centralized or circular deployment, where the achievable rate is at most a constant, the rate with a general deployment can grow logarithmically with $n$, resulting in a certain form of "macromultiplexing gain." Second, using tools from high-resolution quantization theory, we derive an accurate formula for the best achievable rate given any $n$ and any user density function. According to our formula, the dependence of the optimal rate on the user density function $f$ is curiously only through the differential entropy of $f$. In fact, the optimal rate decreases linearly with the differential entropy, and the worst-case scenario is a uniform user density. Numerical simulations confirm our analytical findings.

Submitted 21 August, 2017; originally announced August 2017.

Comments: GLOBECOM 2017

arXiv:1610.03019 [pdf, other]

Energy Efficiency in Two-Tiered Wireless Sensor Networks

Authors: Jun Guo, Erdem Koyuncu, Hamid Jafarkhani

Abstract: We study a two-tiered wireless sensor network (WSN) consisting of $N$ access points (APs) and $M$ base stations (BSs). The sensing data, which is distributed on the sensing field according to a density function $f$, is first transmitted to the APs and then forwarded to the BSs. Our goal is to find an optimal deployment of APs and BSs to minimize the average weighted total, or Lagrangian, of sensor and AP powers. For $M=1$, we show that the optimal deployment of APs is simply a linear transformation of the optimal $N$-level quantizer for density $f$, and the sole BS should be located at the geometric centroid of the sensing field. Also, for a one-dimensional network and uniform $f$, we determine the optimal deployment of APs and BSs for any $N$ and $M$. Moreover, to numerically optimize node deployment for general scenarios, we propose one- and two-tiered Lloyd algorithms and analyze their convergence properties. Simulation results show that, when compared to random deployment, our algorithms can save up to 79% of the power on average.

Submitted 19 February, 2017; v1 submitted 10 October, 2016; originally announced October 2016.

Comments: 11 pages, 7 figures

arXiv:1601.07597 [pdf, ps, other]

Flow Control and Scheduling for Shared FIFO Queues over Wireless Networks

Authors: Shanyu Zhou, Hulya Seferoglu, Erdem Koyuncu

Abstract: We investigate the performance of First-In, First-Out (FIFO) queues over wireless networks. We characterize the stability region of a general scenario where an arbitrary number of FIFO queues, which are served by a wireless medium, are shared by an arbitrary number of flows. In general, the stability region of this system is non-convex. Thus, we develop a convex inner bound on the stability region, which is provably tight in certain cases. The convexity of the inner bound allows us to develop a resource allocation scheme, dFC. Based on the structure of dFC, we develop a stochastic flow control and scheduling algorithm, qFC. We show that qFC achieves the optimal operating point in the convex inner bound. Simulation results show that our algorithms significantly improve the throughput of wireless networks with FIFO queues, as compared to the well-known queue-based flow control and max-weight scheduling.

Submitted 27 January, 2016; originally announced January 2016.

arXiv:1403.7846 [pdf, ps, other]

Distributed Channel Quantization for Two-User Interference Networks

Authors: Xiaoyi Leo Liu, Erdem Koyuncu, Hamid Jafarkhani

Abstract: We introduce conferencing-based distributed channel quantizers for two-user interference networks where interference signals are treated as noise. Compared with the conventional distributed quantizers where each receiver quantizes its own channel independently, the proposed quantizers allow multiple rounds of feedback communication in the form of conferencing between receivers. We take the network outage probabilities of sum rate and minimum rate as performance measures and consider quantizer design in the transmission strategies of time sharing and interference transmission. First, we propose distributed quantizers that achieve the optimal network outage probability of sum rate for both time sharing and interference transmission strategies with an average feedback rate of only two bits per channel state. Then, for the time sharing strategy, we propose a distributed quantizer that achieves the optimal network outage probability of minimum rate with finite average feedback rate; conventional quantizers require infinite rate to achieve the same performance. For the interference transmission strategy, a distributed quantizer that can approach the optimal network outage probability of minimum rate closely is also proposed. Numerical simulations confirm that our distributed quantizers based on conferencing outperform the conventional ones.

Submitted 30 March, 2014; originally announced March 2014.

Comments: 30 pages, 4 figures

arXiv:1301.6398 [pdf, ps, other]

Variable-Length Channel Quantizers for Maximum Diversity and Array Gains

Authors: Erdem Koyuncu, Hamid Jafarkhani

Abstract: We consider a $t \times 1$ multiple-antenna fading channel with quantized channel state information at the transmitter (CSIT). Our goal is to maximize the diversity and array gains that are associated with the symbol error rate (SER) performance of the system. It is well-known that for both beamforming and precoding strategies, finite-rate fixed-length quantizers (FLQs) cannot achieve the full-CSIT diversity and array gains. In this work, for any function $f(P) \in \omega(1)$, we construct variable-length quantizers (VLQs) that can achieve these full-CSIT gains with rates $1+(f(P) \log P)/P$ and $1+f(P)/P^t$ for the beamforming and precoding strategies, respectively, where $P$ is the power constraint of the transmitter. We also show that these rates are the best possible up to $o(1)$ multipliers in their $P$-dependent terms. In particular, although the full-CSIT SER is not achievable at any (even infinite) feedback rate, the full-CSIT diversity and array gains can be achieved with a feedback rate of 1 bit per channel state asymptotically.

Submitted 27 January, 2013; originally announced January 2013.

arXiv:1210.8441 [pdf, ps, other]

Very Low-Rate Variable-Length Channel Quantization for Minimum Outage Probability

Authors: Erdem Koyuncu, Hamid Jafarkhani

Abstract: We identify a practical vector quantizer design problem where any fixed-length quantizer (FLQ) yields non-zero distortion at any finite rate, while there is a variable-length quantizer (VLQ) that can achieve zero distortion with arbitrarily low rate. The problem arises in a $t \times 1$ multiple-antenna fading channel where we would like to minimize the channel outage probability by employing beamforming via quantized channel state information at the transmitter (CSIT). It is well-known that in such a scenario, finite-rate FLQs cannot achieve the full-CSIT (zero distortion) outage performance. We construct VLQs that can achieve the full-CSIT performance with finite rate. In particular, with $P$ denoting the power constraint of the transmitter, we show that the necessary and sufficient VLQ rate that guarantees the full-CSIT performance is $\Theta(1/P)$. We also discuss several extensions (e.g., to precoding) of this result.

Submitted 31 October, 2012; originally announced October 2012.

arXiv:1011.5699 [pdf, ps, other]

The Necessity of Relay Selection

Authors: Erdem Koyuncu, Hamid Jafarkhani

Abstract: We determine necessary conditions on the structure of symbol error rate (SER) optimal quantizers for limited feedback beamforming in wireless networks with one transmitter-receiver pair and R parallel amplify-and-forward relays. We call a quantizer codebook "small" if its cardinality is less than R, and "large" otherwise. A "d-codebook" depends on the power constraints and can be optimized accordingly, while an "i-codebook" remains fixed. It was previously shown that any i-codebook that contains the single-relay selection (SRS) codebook achieves the full-diversity order, R. We prove the following: Every full-diversity i-codebook contains the SRS codebook, and thus is necessarily large. In general, as the power constraints grow to infinity, the limit of an optimal large d-codebook contains an SRS codebook, provided that it exists. For small codebooks, the maximal diversity is equal to the codebook cardinality. Every diversity-optimal small i-codebook is an orthogonal multiple-relay selection (OMRS) codebook. Moreover, the limit of an optimal small d-codebook is an OMRS codebook. We observe that SRS is nothing but a special case of OMRS for codebooks with cardinality equal to R. As a result, we call OMRS "the universal necessary condition" for codebook optimality. Finally, we confirm our analytical findings through simulations.

Submitted 25 November, 2010; originally announced November 2010.

Comments: 29 pages, 4 figures

arXiv:1007.5514 [pdf, ps, other]

Distributed Beamforming in Wireless Multiuser Relay-Interference Networks with Quantized Feedback

Authors: Erdem Koyuncu, Hamid Jafarkhani

Abstract: We study quantized beamforming in wireless amplify-and-forward relay-interference networks with any number of transmitters, relays, and receivers. We design the quantizer of the channel state information to minimize the probability that at least one receiver incorrectly decodes its desired symbol(s). Correspondingly, we introduce a generalized diversity measure that encapsulates the conventional one as the first-order diversity. Additionally, it incorporates the second-order diversity, which is concerned with the transmitter power dependent logarithmic terms that appear in the error rate expression. First, we show that, regardless of the quantizer and the amount of feedback that is used, the relay-interference network suffers a second-order diversity loss compared to interference-free networks. Then, two different quantization schemes are studied: First, using a global quantizer, we show that a simple relay selection scheme can achieve maximal diversity. Then, using the localization method, we construct both fixed-length and variable-length local (distributed) quantizers (fLQs and vLQs). Our fLQs achieve maximal first-order diversity, whereas our vLQs achieve maximal diversity. Moreover, we show that all the promised diversity and array gains can be obtained with arbitrarily low feedback rates when the transmitter powers are sufficiently large. Finally, we confirm our analytical findings through simulations.

Submitted 30 July, 2010; originally announced July 2010.

Comments: 41 pages, 14 figures, submitted to IEEE Transactions on Information Theory, July 2010. This work was presented in part at IEEE Global Communications Conference (GLOBECOM), Nov. 2009

Showing 1–31 of 31 results for author: Koyuncu, E