Search | arXiv e-print repository

OffRAC: Offloading Through Remote Accelerator Calls

Authors: Ziyi Yang, Krishnan B. Iyer, Yixi Chen, Ran Shu, Zsolt István, Marco Canini, Suhaib A. Fahmy

Abstract: Modern applications increasingly demand ultra-low latency for data processing, often facilitated by host-controlled accelerators like GPUs and FPGAs. However, significant delays result from host involvement in accessing accelerators. To address this limitation, we introduce a novel paradigm we call Offloading through Remote Accelerator Calls (OffRAC), which elevates accelerators to first-class compute resources. OffRAC enables direct calls to FPGA-based accelerators without host involvement. Utilizing the stateless function abstraction of serverless computing, with applications decomposed into simpler stateless functions, offloading promotes efficient acceleration and distribution of computational loads across the network. To realize this proposal, we present a prototype design and implementation of an OffRAC platform for FPGAs that assembles diverse requests from multiple clients into complete accelerator calls with multi-tenancy performance isolation. This design minimizes the implementation complexity for accelerator users while ensuring isolation and programmability. Results show that the OffRAC approach reduces the latency of network calls to accelerators down to approximately 10.5 us, as well as sustaining high application throughput up to 85Gbps, demonstrating scalability and efficiency, making it compelling for the next generation of low-latency applications.

Submitted 8 April, 2025; v1 submitted 6 April, 2025; originally announced April 2025.

Comments: 19 pages

arXiv:2411.19730 [pdf, other]

Ten Ways in which Virtual Reality Differs from Video Streaming

Authors: Gustavo de Veciana, Sonia Fahmy, George Kesidis, Voicu Popescu

Abstract: Virtual Reality (VR) applications have a number of unique characteristics that set them apart from traditional video streaming. These characteristics have major implications on the design of VR rendering, adaptation, prefetching, caching, and transport mechanisms. This paper contrasts VR to video streaming, stored 2D video streaming in particular, and discusses how to rethink system and network support for VR.

Submitted 29 November, 2024; originally announced November 2024.

arXiv:2409.02553 [pdf, other]

ResiLogic: Leveraging Composability and Diversity to Design Fault and Intrusion Resilient Chips

Authors: Ahmad T. Sheikh, Ali Shoker, Suhaib A. Fahmy, Paulo Esteves-Verissimo

Abstract: A long-standing challenge is the design of chips resilient to faults and glitches. Both fine-grained gate diversity and coarse-grained modular redundancy have been used in the past. However, these approaches have not been well-studied under other threat models where some stakeholders in the supply chain are untrusted. Increasing digital sovereignty tensions raise concerns regarding the use of foreign off-the-shelf tools and IPs, or off-sourcing fabrication, driving research into the design of resilient chips under this threat model. This paper addresses a threat model considering three pertinent attacks to resilience: distribution, zonal, and compound attacks. To mitigate these attacks, we introduce the \texttt{ResiLogic} framework that exploits \textit{Diversity by Composability}: constructing diverse circuits composed of smaller diverse ones by design. This gives designers the capability to create circuits at design time without requiring extra redundancy in space or cost. Using this approach at different levels of granularity is shown to improve the resilience of circuit design in \texttt{ResiLogic} against the three considered attacks by a factor of five. Additionally, we make a case to show how E-Graphs can be utilized to generate diverse circuits under given rewrite rules.

Submitted 15 April, 2025; v1 submitted 4 September, 2024; originally announced September 2024.

arXiv:2406.18117 [pdf, other]

Resilient and Secure Programmable System-on-Chip Accelerator Offload

Authors: Inês Pinto Gouveia, Ahmad T. Sheikh, Ali Shoker, Suhaib A. Fahmy, Paulo Esteves-Verissimo

Abstract: Computational offload to hardware accelerators is gaining traction due to increasing computational demands and efficiency challenges. Programmable hardware, like FPGAs, offers a promising platform in rapidly evolving application areas, with the benefits of hardware acceleration and software programmability. Unfortunately, such systems composed of multiple hardware components must consider integrity in the case of malicious components. In this work, we propose Samsara, the first secure and resilient platform that derives, from Byzantine Fault Tolerant (BFT), protocols to enhance the computing resilience of programmable hardware. Samsara uses a novel lightweight hardware-based BFT protocol for Systems-on-Chip, called H-Quorum, that implements the theoretical-minimum latency between applications and replicated compute nodes. To withstand malicious behaviors, Samsara supports hardware rejuvenation, which is used to replace, relocate, or diversify faulty compute nodes. Samsara's architecture ensures the security of the entire workflow while keeping the latency overhead, of both computation and rejuvenation, close to the non-replicated counterpart.

Submitted 26 June, 2024; originally announced June 2024.

Comments: To be published in The 43rd International Symposium on Reliable Distributed Systems (SRDS 2024)

arXiv:2311.17521 [pdf]

doi 10.1088/1742-6596/2128/1/012015

Spinal Muscle Atrophy Disease Modelling as Bayesian Network

Authors: Mohammed Ezzat Helal, Manal Ezzat Helal, Sherif Fadel Fahmy

Abstract: We investigate molecular gene expression studies and public databases for disease modelling using Probabilistic Graphical Models and Bayesian Inference. A case study on Spinal Muscle Atrophy Genome-Wide Association Study results is modelled and analyzed. The genes up- and down-regulated in two stages of the disease development are linked to prior knowledge published in the public domain, and a co-expression network is created and analyzed. The Molecular Pathways triggered by these genes are identified. The Bayesian inference posterior distributions are estimated using a variational analytical algorithm and a Markov chain Monte Carlo sampling algorithm. Assumptions, limitations and possible future work are discussed in the conclusion.

Submitted 29 November, 2023; originally announced November 2023.

ACM Class: I.2.4

Journal ref: Journal of Physics: Conference Series 2128 (2021) 012015

arXiv:2306.09831 [pdf]

INDCOR White Paper 5: Addressing Societal Issues in Interactive Digital Narratives

Authors: Claudia Silva, Juan Miguel Aguado, Dren Gerguri, Ledia Kazazi, Bjorn Berg Marklund, Rocio Zamora Medina, Shahira S. Fahmy, Jose Manuel Noguera Vivo, Eliane Bettocchi, Tao Papaioannou, Maite Gil, Lissa Holloway-Attaway, Hartmut Koenitz

Abstract: This white paper introduces Interactive Digital Narratives (IDN) as a powerful tool for tackling the complex challenges we face in today's society. In the scope of COST Action 18230 - Interactive Narrative Design for Complexity Representation (INDCOR), a group of researchers dedicated to studying media selected five case studies of IDNs, including educational games and news media, that confront and challenge the existing traditional media landscape. These case studies cover a wide range of important societal issues, such as racism, coloniality, cultural heritage, war, and disinformation. By exploring this broad range of examples, we aim to demonstrate how IDN can effectively address social complexity in an interactive, participatory, and engaging manner. We encourage you to examine these cases and discover for yourself how IDN can be used as a creative tool to address complex societal issues. This white paper might be inspiring for journalists, digital content creators, game designers, developers, educators using information and communication technologies in the classroom, or anyone interested in learning how to use IDN tools to tackle complex societal issues. In this sense, along with key scientific references, we offer key takeaways at the end of this white paper that might be helpful for media practitioners at large, in two main ways: 1) Designing IDNs to address complex societal issues and 2) Using IDNs to engage audiences with complex societal issues.

Submitted 27 May, 2024; v1 submitted 16 June, 2023; originally announced June 2023.

arXiv:2201.03950 [pdf, other]

High Throughput Multidimensional Tridiagonal Systems Solvers on FPGAs

Authors: Kamalavasan Kamalakkannan, Istvan Z. Reguly, Suhaib A. Fahmy, Gihan R. Mudalige

Abstract: We present a design space exploration for synthesizing optimized, high-throughput implementations of multiple multi-dimensional tridiagonal system solvers on FPGAs. Re-evaluating the characteristics of algorithms for the direct solution of tridiagonal systems, we develop a new tridiagonal solver library aimed at implementing high-performance computing applications on Xilinx FPGA hardware. Key new features of the library are (1) the unification of standard state-of-the-art techniques for implementing implicit numerical solvers with a number of novel high-gain optimizations such as vectorization and batching, motivated by multi-dimensional systems in real-world applications, (2) data-flow techniques that provide application specific optimizations for both 2D and 3D problems, including integration of explicit loops commonplace in real workloads, and (3) the development of an analytic model to explore the design space, and obtain rapid performance estimates. The new library provides an order of magnitude better performance for solving large batches of systems compared to Xilinx's current tridiagonal solver library. Two representative applications are implemented using the new solver on a Xilinx Alveo U280 FPGA, demonstrating over 85% predictive model accuracy. These are compared with a current state-of-the-art GPU library for solving multi-dimensional tridiagonal systems on an Nvidia V100 GPU, analyzing time to solution, bandwidth, and energy consumption. Results show the FPGAs achieving competitive or better runtime performance for a range of multi-dimensional problems compared to the V100 GPU. Additionally, the significant energy savings offered by FPGA implementations, over 30% for the most complex application, are quantified. We discuss the algorithmic trade-offs required to obtain good performance on FPGAs, giving insights into the feasibility and profitability of FPGA implementations.

Submitted 11 January, 2022; originally announced January 2022.

Comments: Under review

arXiv:2111.01108 [pdf, other]

doi 10.1145/3552326.3567485

Resource-Efficient Federated Learning

Authors: Ahmed M. Abdelmoniem, Atal Narayan Sahu, Marco Canini, Suhaib A. Fahmy

Abstract: Federated Learning (FL) enables distributed training by learners using local data, thereby enhancing privacy and reducing communication. However, it presents numerous challenges relating to the heterogeneity of the data distribution, device capabilities, and participant availability as deployments scale, which can impact both model convergence and bias. Existing FL schemes use random participant selection to improve fairness; however, this can result in inefficient use of resources and lower quality training. In this work, we systematically address the question of resource efficiency in FL, showing the benefits of intelligent participant selection, and incorporation of updates from straggling participants. We demonstrate how these factors enable resource efficiency while also improving trained model quality.

Submitted 4 November, 2022; v1 submitted 1 November, 2021; originally announced November 2021.

Comments: Accepted to appear in ACM EuroSys 2023

arXiv:2101.01177 [pdf, other]

High-Level FPGA Accelerator Design for Structured-Mesh-Based Explicit Numerical Solvers

Authors: Kamalavasan Kamalakkannan, Gihan R. Mudalige, Istvan Z. Reguly, Suhaib A. Fahmy

Abstract: This paper presents a workflow for synthesizing near-optimal FPGA implementations for structured-mesh based stencil applications for explicit solvers. It leverages key characteristics of the application class, its computation-communication pattern, and the architectural capabilities of the FPGA to accelerate solvers from the high-performance computing domain. Key new features of the workflow are (1) the unification of standard state-of-the-art techniques with a number of high-gain optimizations such as batching and spatial blocking/tiling, motivated by increasing throughput for real-world workloads and (2) the development and use of a predictive analytic model for exploring the design space, resource estimates and performance. Three representative applications are implemented using the design workflow on a Xilinx Alveo U280 FPGA, demonstrating near-optimal performance and over 85% predictive model accuracy. These are compared with equivalent highly-optimized implementations of the same applications on modern HPC-grade GPUs (Nvidia V100) analyzing time to solution, bandwidth and energy consumption. Performance results indicate equivalent runtime performance of the FPGA implementations to the V100 GPU, with over 2x energy savings, for the largest non-trivial application synthesized on the FPGA compared to the best performing GPU-based solution. Our investigation shows the considerable challenges in gaining high performance on current generation FPGAs compared to traditional architectures. We discuss determinants for a given stencil code to be amenable to FPGA implementation, providing insights into the feasibility and profitability of a design and its resulting performance.

Submitted 7 January, 2021; v1 submitted 4 January, 2021; originally announced January 2021.

Comments: Preprint - Accepted to the 35th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2021), May 2021, Portland, Oregon USA

arXiv:2002.04186 [pdf, other]

Infinity Learning: Learning Markov Chains from Aggregate Steady-State Observations

Authors: Jianfei Gao, Mohamed A. Zahran, Amit Sheoran, Sonia Fahmy, Bruno Ribeiro

Abstract: We consider the task of learning a parametric Continuous Time Markov Chain (CTMC) sequence model without examples of sequences, where the training data consists entirely of aggregate steady-state statistics. Making the problem harder, we assume that the states we wish to predict are unobserved in the training data. Specifically, given a parametric model over the transition rates of a CTMC and some known transition rates, we wish to extrapolate its steady state distribution to states that are unobserved. A technical roadblock to learn a CTMC from its steady state has been that the chain rule to compute gradients will not work over the arbitrarily long sequences necessary to reach steady state ---from where the aggregate statistics are sampled. To overcome this optimization challenge, we propose $\infty$-SGD, a principled stochastic gradient descent method that uses randomly-stopped estimators to avoid infinite sums required by the steady state computation, while learning even when only a subset of the CTMC states can be observed. We apply $\infty$-SGD to a real-world testbed and synthetic experiments showcasing its accuracy, ability to extrapolate the steady state distribution to unobserved states under unobserved conditions (heavy loads, when training under light loads), and succeeding in difficult scenarios where even a tailor-made extension of existing methods fails.

Submitted 10 February, 2020; originally announced February 2020.

Journal ref: Published as a conference paper at the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020)

arXiv:1710.05154 [pdf, other]

High Throughput 2D Spatial Image Filters on FPGAs

Authors: Abdullah Al-Dujaili, Suhaib A. Fahmy

Abstract: FPGAs are well established in the signal processing domain, where their fine-grained programmable nature allows the inherent parallelism in these applications to be exploited for enhanced performance. As architectures have evolved, FPGA vendors have added more heterogeneous resources to allow often-used functions to be implemented with higher performance, at lower power and using less area. DSP blocks, for example, have evolved from basic multipliers to support the multiply-accumulate operations that are the core of many signal processing tasks. While more features were added to DSP blocks, their structure and connectivity have been optimised primarily for one-dimensional signal processing. Basic operations in image processing are similar, but performed in a two-dimensional structure, and hence, many of the optimisations in newer DSP blocks are not exploited when mapping image processing algorithms to them. We present a detailed study of two-dimensional spatial filter implementation on FPGAs, showing how to maximise performance through exploitation of DSP block capabilities, while also presenting a lean border pixel management policy.

Submitted 17 October, 2017; v1 submitted 14 October, 2017; originally announced October 2017.

arXiv:1706.02358 [pdf, other]

Mind Your Credit: Assessing the Health of the Ripple Credit Network

Authors: Pedro Moreno-Sanchez, Navin Modi, Raghuvir Songhela, Aniket Kate, Sonia Fahmy

Abstract: The Ripple credit network has emerged as a payment backbone with key advantages for financial institutions and the remittance industry. Its path-based IOweYou (IOU) settlements across different (crypto)currencies conceptually distinguish the Ripple blockchain from cryptocurrencies, and make it highly suitable to an orthogonal yet vast set of applications in the remittance world for cross-border transactions and beyond. This work studies the structure and evolution of the Ripple network since its inception, and investigates its vulnerability to devilry attacks that affect the credit of linnet users' wallets. We find that about 13M USD are at risk in the current Ripple network due to inappropriate configuration of the rippling flag on credit links, facilitating undesired redistribution of credit across those links. Although the Ripple network has grown around a few highly connected hub (gateway) wallets that constitute the network's core and provide high liquidity to users, such a credit link distribution results in a user base of around 112,000 wallets that can be financially isolated by as few as 10 highly connected gateway wallets. Indeed, today about 4.9M USD cannot be withdrawn by their owners from the Ripple network due to PayRoutes, a gateway tagged as faulty by the Ripple community. Finally, we observe that stale exchange offers pose a real problem, and exchanges (market makers) have not always been vigilant about periodically updating their exchange offers according to current real-world exchange rates. For example, stale offers were used by 84 Ripple wallets to gain more than 4.5M USD from mid-July to mid-August 2017. Our findings should prompt the Ripple community to improve the health of the network by educating its users on increasing their connectivity, and by appropriately maintaining the credit limits, rippling flags, and exchange offers on their credit links.

Submitted 11 March, 2018; v1 submitted 7 June, 2017; originally announced June 2017.

arXiv:1705.02730 [pdf, other]

Resource-Aware Just-in-Time OpenCL Compiler for Coarse-Grained FPGA Overlays

Authors: Abhishek Kumar Jain, Douglas L. Maskell, Suhaib A. Fahmy

Abstract: FPGA vendors have recently started focusing on OpenCL for FPGAs because of its ability to leverage the parallelism inherent to heterogeneous computing platforms. OpenCL allows programs running on a host computer to launch accelerator kernels which can be compiled at run-time for a specific architecture, thus enabling portability. However, the prohibitive compilation times (specifically the FPGA place and route times) are a major stumbling block when using OpenCL tools from FPGA vendors. The long compilation times mean that the tools cannot effectively use just-in-time (JIT) compilation or runtime performance scaling. Coarse-grained overlays represent a possible solution by virtue of their coarse granularity and fast compilation. In this paper, we present a methodology for run-time compilation of OpenCL kernels to a DSP block based coarse-grained overlay, rather than directly to the fine-grained FPGA fabric. The proposed methodology allows JIT compilation and on-demand resource-aware kernel replication to better utilize available overlay resources, raising the abstraction level while reducing compile times significantly. We further demonstrate that this approach can even be used for run-time compilation of OpenCL kernels on the ARM processor of the embedded heterogeneous Zynq device.

Submitted 7 May, 2017; originally announced May 2017.

Comments: Presented at 3rd International Workshop on Overlay Architectures for FPGAs (OLAF 2017) arXiv:1704.08802

Report number: OLAF/2017/02

arXiv:1703.03652 [pdf, other]

doi 10.1145/2960407

Security in Automotive Networks: Lightweight Authentication and Authorization

Authors: Philipp Mundhenk, Andrew Paverd, Artur Mrowca, Sebastian Steinhorst, Martin Lukasiewycz, Suhaib A. Fahmy, Samarjit Chakraborty

Abstract: With the increasing amount of interconnections between vehicles, the attack surface of internal vehicle networks is rising steeply. Although these networks are shielded against external attacks, they often do not have any internal security to protect against malicious components or adversaries who breach the network perimeter. To secure the in-vehicle network, all communicating components must be authenticated, and only authorized components should be allowed to send and receive messages. This is achieved using an authentication framework. Cryptography is widely used to authenticate communicating parties and provide secure communication channels (e.g., Internet communication). However, the real-time performance requirements of in-vehicle networks restrict the types of cryptographic algorithms and protocols that may be used. In particular, asymmetric cryptography is computationally infeasible during vehicle operation. In this work, we address the challenges of designing authentication protocols for automotive systems. We present Lightweight Authentication for Secure Automotive Networks (LASAN), a full lifecycle authentication approach. We describe the core LASAN protocols and show how they protect the internal vehicle network while complying with the real-time constraints and low computational resources of this domain. Unlike previous work, we also explain how this framework can be integrated into all aspects of the automotive lifecycle, including manufacturing, vehicle maintenance, and software updates. We evaluate LASAN in two different ways: First, we analyze the security properties of the protocols using established protocol verification techniques based on formal methods. Second, we evaluate the timing requirements of LASAN and compare these to other frameworks using a new highly modular discrete event simulator for in-vehicle networks, which we have developed for this evaluation.

Submitted 15 March, 2017; v1 submitted 10 March, 2017; originally announced March 2017.

Comments: Authors' preprint of an article to appear in ACM Transactions on Design Automation of Electronic Systems (ACM TODAES) 2017

arXiv:1606.06460 [pdf]

An Area-Efficient FPGA Overlay using DSP Block based Time-multiplexed Functional Units

Authors: Xiangwei Li, Abhishek Jain, Douglas Maskell, Suhaib A. Fahmy

Abstract: Coarse grained overlay architectures improve FPGA design productivity by providing fast compilation and software-like programmability. Throughput oriented spatially configurable overlays typically suffer from area overheads due to the requirement of one functional unit for each compute kernel operation. Hence, these overlays have often been of limited size, supporting only relatively small compute kernels while consuming considerable FPGA resources. This paper examines the possibility of sharing the functional units among kernel operations for reducing area overheads. We propose a linear interconnected array of time-multiplexed FUs as an overlay architecture with reduced instruction storage and interconnect resource requirements, which uses a fully-pipelined, architecture-aware FU design supporting a fast context switching time. The results presented show a reduction of up to 85% in FPGA resource requirements compared to existing throughput oriented overlay architectures, with an operating frequency which approaches the theoretical limit for the FPGA device.

Submitted 21 June, 2016; originally announced June 2016.

Comments: Presented at 2nd International Workshop on Overlay Architectures for FPGAs (OLAF 2016) arXiv:1605.08149

Report number: OLAF/2016/02

arXiv:1603.08015 [pdf]

On Determining the Fair Bandwidth Share for ABR Connections in ATM Networks

Authors: Sonia Fahmy, Raj Jain, Shivkumar Kalyanaraman, Rohit Goyal, Bobby Vandalore

Abstract: In a multi-service network such as ATM, adaptive data services (such as ABR) share the bandwidth unused by higher priority services. The network indicates to the ABR sources the fair and efficient rates at which they should transmit to minimize their cell loss. In this paper, we propose a new method for determining the "effective" number of active connections, and the fair bandwidth share for each connection.

Submitted 25 March, 2016; originally announced March 2016.

Comments: Journal of High Speed Networking, 2002

arXiv:cs/9809078 [pdf]

Buffer Requirements For TCP/IP Over ABR

Authors: Shiv Kalyanaraman, Raj Jain, Sonia Fahmy, Rohit Goyal

Abstract: We study the buffering requirements for zero cell loss for TCP over ABR. We show that the maximum buffers required at the switch is proportional to the maximum round trip time (RTT) of all VCs through the link. The number of round-trips depends upon the switch algorithm used. With our ERICA [erica-final] switch algorithm, we find that the buffering required is independent of the number of TCP sources. We substantiate our arguments with simulation results.

Submitted 23 September, 1998; originally announced September 1998.

Comments: Proc. IEEE ATM'96 Workshop, San Francisco, August 23-24, 1996

ACM Class: C.2.1

arXiv:cs/9809077 [pdf]

doi 10.1109/35.544194

Source Behavior for ATM ABR Traffic Management: An Explanation

Authors: Raj Jain, Shiv Kalyanaraman, Sonia Fahmy, Rohit Goyal, S. Kim

Abstract: The Available Bit Rate (ABR) service has been developed to support data applications over Asynchronous Transfer Mode (ATM) networks. The network continuously monitors its traffic and provides feedback to the source end systems. This paper explains the rules that the sources have to follow to achieve a fair and efficient allocation of network resources.

Submitted 23 September, 1998; originally announced September 1998.

Comments: IEEE Communications Magazine, November 1, 1996, vol 34, no11, pp50-57

ACM Class: C.2.1

arXiv:cs/9809076 [pdf]

A Survey of Congestion Control Techniques and Data Link Protocols in Satellite Networks

Authors: Sonia Fahmy, Raj Jain, Fang Lu, Shivkumar Kalyanaraman

Abstract: Satellite communication systems are the means of realizing a global broadband integrated services digital network. Due to the statistical nature of the integrated services traffic, the resulting rate fluctuations and burstiness render congestion control a complicated, yet indispensable function. The long propagation delay of the earth-satellite link further imposes severe demands and constraints on the congestion control schemes, as well as the media access control techniques and retransmission protocols that can be employed in a satellite network. The problems in designing satellite network protocols, as well as some of the solutions proposed to tackle these problems, will be the primary focus of this survey.

Submitted 23 September, 1998; originally announced September 1998.

ACM Class: C.2.1

arXiv:cs/9809075 [pdf]

On Source Rules for ABR Service on ATM Networks with Satellite Links

Authors: Sonia Fahmy, Raj Jain, Shivkumar Kalyanaraman, Rohit Goyal, Fang Lu

Abstract: During the design of ABR traffic management at the ATM Forum, we performed several analyses to ensure that the ABR service will operate efficiently over satellite links. In the cases where the performance was unacceptable, we suggested modifications to the traffic management specifications. This paper describes one such issue related to the count of missing resource management cells (Crm) parameter of the ABR source behavior. The analysis presented here led to the changes which are now part of the ATM traffic management (TM 4.0) specification. In particular, the size of the transient buffer exposure (TBE) parameter was set to 24 bits, and no size was enforced for the Crm parameter. This simple change improved the throughput over OC-3 satellite links from 45 Mbps to 140 Mbps.

Submitted 23 September, 1998; originally announced September 1998.

Comments: Proceedings of the First International Workshop on Satellite-based Information Services, Rye, New York, November 1996, pp108-115

ACM Class: C.2.1

arXiv:cs/9809074 [pdf]

Performance of TCP/IP Using ATM ABR and UBR Services over Satellite Networks

Authors: Shiv Kalyanaraman, Raj Jain, Rohit Goyal, Sonia Fahmy, Seong-Cheol Kim

Abstract: We study the buffering requirements for zero cell loss for TCP/IP over satellite links using the available bit rate (ABR) and unspecified bit rate (UBR) services of asynchronous transfer mode (ATM) networks. For the ABR service, we explore the effect of feedback delay (a factor which depends upon the position of the bottleneck), the switch scheme used, and background variable bit rate (VBR) traffic. It is shown that the buffer requirement for TCP over ABR is independent of the number of TCP sources, but depends on the aforementioned factors. For the UBR service, we show that the buffer requirement is the sum of the TCP receiver window sizes. We substantiate our arguments with simulation results.

Submitted 23 September, 1998; originally announced September 1998.

Comments: IEEE Communication Society Workshop on Computer-Aided Modeling, Analysis and Design of Communication Links and Networks, Mclean, VA, October 20, 1996

ACM Class: C.2.1

arXiv:cs/9809073 [pdf]

doi 10.1109/35.685383

Performance and Buffering Requirements of Internet Protocols over ATM ABR and UBR Services

Authors: Shiv Kalyanaraman, Raj Jain, Sonia Fahmy, Rohit Goyal, Seong-Cheol Kim

Abstract: The Asynchronous Transfer Mode (ATM) networks are quickly being adopted as backbones over various parts of the Internet. This paper analyzes the performance of TCP/IP protocols over ATM network's Available Bit Rate (ABR) and Unspecified Bit Rate (UBR) services. It is shown that ABR pushes congestion to the edges of the ATM network while UBR leaves it inside the ATM portion.

Submitted 23 September, 1998; originally announced September 1998.

Comments: IEEE Communications Magazine, Vol 36, no 6, pp152-157

ACM Class: C.2.1

arXiv:cs/9809072 [pdf]

Performance of TCP over ABR on ATM backbone and with various VBR traffic patterns

Authors: Shiv Kalyanaraman, Raj Jain, Sonia Fahmy, Rohit Goyal, Jianping Jiang, Seong-Cheol Kim

Abstract: We extend our earlier studies of buffer requirements of TCP over ABR in two directions. First, we study the performance of TCP over ABR in an ATM backbone. On the backbone, the TCP queues are at the edge router and not inside the ATM network. The router requires buffer equal to the sum of the receiver window sizes of the participating TCP connections. Second, we introduce various patterns of VBR background traffic. The VBR background introduces variance in the ABR capacity and the TCP traffic introduces variance in the ABR demand. Some simple switch schemes are unable to keep up with the combined effect of highly varying demands and highly varying ABR capacity. We present our experiences with refining the ERICA+ switch scheme to handle these conditions.

Submitted 23 September, 1998; originally announced September 1998.

Comments: ICC'97, Montreal, June 1997

ACM Class: C.2.1

arXiv:cs/9809071 [pdf]

UBR+: Improving Performance of TCP over ATM-UBR service

Authors: Rohit Goyal, Raj Jain, Shiv Kalyanaraman, Sonia Fahmy, Seong-Cheol Kim

Abstract: ATM-UBR switches respond to congestion by dropping cells when their buffers become full. TCP connections running over UBR experience low throughput and high unfairness. For 100% TCP throughput each switch needs buffers equal to the sum of the window sizes of all the TCP connections. Intelligent drop policies can improve the performance of TCP over UBR with limited buffers. The UBR+ service proposes enhancements to UBR for intelligent drop. Early Packet Discard improves throughput but does not attempt to improve fairness. Selective packet drop based on per-connection buffer occupancy improves fairness. The Fair Buffer Allocation scheme further improves both throughput and fairness.

Submitted 23 September, 1998; originally announced September 1998.

Comments: ICC'97, Montreal, June 1997, pp1042-1048

ACM Class: C.2.1
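
The buffer-sizing rule stated in the abstract above (buffers equal to the sum of the TCP window sizes) can be illustrated with a few lines of arithmetic. This is only a sketch: the function names, the 48-byte payload conversion, and the example window sizes are assumptions made for illustration, and AAL5 framing overhead is ignored.

```python
# Illustrative arithmetic only; names and example values are assumptions,
# not taken from the paper.
CELL_PAYLOAD_BYTES = 48  # payload bytes carried by one 53-byte ATM cell


def ubr_buffer_bytes(window_sizes_bytes):
    """Buffer needed for full TCP throughput over UBR: the sum of all windows."""
    return sum(window_sizes_bytes)


def bytes_to_cells(n_bytes):
    """Round up to whole ATM cells (AAL5 trailer and padding are ignored)."""
    return -(-n_bytes // CELL_PAYLOAD_BYTES)


if __name__ == "__main__":
    windows = [65535] * 10  # ten hypothetical connections with 64 KB windows
    total = ubr_buffer_bytes(windows)
    print(total, "bytes, about", bytes_to_cells(total), "cells")
```

The point of the sketch is that the requirement grows linearly with the number of connections, which is why the abstract turns to drop policies for switches with limited buffers.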

arXiv:cs/9809070 [pdf]

Use-it or Lose-it Policies for the Available Bit Rate (ABR) Service in ATM Networks

Authors: Shivkumar Kalyanaraman, Raj Jain, Rohit Goyal, Sonia Fahmy, Seong-Cheol Kim

Abstract: The Available Bit Rate (ABR) service has been developed to support 21st century data applications over Asynchronous Transfer Mode (ATM). The ABR service uses a closed-loop rate-based traffic management framework where the network divides left-over bandwidth among contending sources. The ATM Forum traffic management group also incorporated open-loop control capabilities to make the ABR service robust to temporary network failures and source inactivity. An important problem addressed was whether rate allocations of sources should be taken away if sources do not use them. The proposed solutions, popularly known as the Use-It-or-Lose-It (UILI) policies, have had significant impact on the ABR service capabilities. In this paper we discuss the design, development, and the final shape of these policies and their impact on the ABR service. We compare the various alternatives through a performance evaluation.

Submitted 23 September, 1998; originally announced September 1998.

Comments: 25 pages

ACM Class: C.2.1

arXiv:cs/9809069 [pdf]

Design Considerations for the Virtual Source/Virtual Destination (VS/VD) Feature in the ABR Service of ATM Networks

Authors: Shiv Kalyanaraman, Raj Jain, Jianping Jiang, Rohit Goyal, Sonia Fahmy, Pradeep Samudra

Abstract: The Available Bit Rate (ABR) service in ATM networks has been specified to allow fair and efficient support of data applications over ATM utilizing capacity left over after servicing higher priority classes. One of the architectural features in the ABR specification [tm4] is the Virtual Source/Virtual Destination (VS/VD) option. This option allows a switch to divide an end-to-end ABR connection into separately controlled ABR segments by acting like a destination on one segment, and like a source on the other. The coupling in the VS/VD switch between the two ABR control segments is implementation specific. In this paper, we model a VS/VD ATM switch and study the issues in designing coupling between ABR segments. We identify a number of implementation options for the coupling. A good choice significantly improves the stability and transient performance of the system and reduces the buffer requirements at the switches.

Submitted 23 September, 1998; originally announced September 1998.

Comments: 25 pages

ACM Class: C.2.1

arXiv:cs/9809067 [pdf]

A Survey of Protocols and Open Issues in ATM Multipoint Communication

Authors: Sonia Fahmy, Raj Jain, Shivkumar Kalyanaraman, Rohit Goyal, Bobby Vandalore, Xiangrong Cai

Abstract: Asynchronous transfer mode (ATM) networks must define multicast capabilities in order to efficiently support numerous applications, such as video conferencing and distributed applications, in addition to LAN emulation (LANE) and Internet protocol (IP) multicasting. Several problems and issues arise in ATM multicasting, such as signaling, routing, connection admission control, and traffic management problems. IP integrated services over ATM poses further challenges to ATM multicasting. Scalability and simplicity are the two main concerns for ATM multicasting. This paper provides a survey of the current work on multicasting problems in general, and ATM multicasting in particular. A number of proposed schemes is examined, such as the schemes MARS, MCS, SEAM, SMART, RSVP, and various multipoint traffic management and transport-layer schemes. The paper also indicates a number of key open issues that remain unresolved.

Submitted 23 September, 1998; originally announced September 1998.

Comments: OSU Technical Report, August 21, 1997

ACM Class: C.2.1

arXiv:cs/9809066 [pdf]

TCP Selective Acknowledgments and UBR Drop Policies to Improve ATM-UBR Performance over Terrestrial and Satellite Networks

Authors: Rohit Goyal, Raj Jain, Shivkumar Kalyanaraman, Sonia Fahmy, Bobby Vandalore, Sastri Kota

Abstract: We study the performance of Selective Acknowledgments with TCP over the ATM-UBR service category. We examine various UBR drop policies, TCP mechanisms and network configurations to recommend optimal parameters for TCP over UBR. We discuss various TCP congestion control mechanisms and compare their performance for LAN and WAN networks. We describe the effect of satellite delays on TCP performance over UBR and present simulation results for LAN, WAN and satellite networks. SACK TCP improves the performance of TCP over UBR, especially for large delay networks. Intelligent drop policies at the switches are an important factor for good performance in local area networks.

Submitted 23 September, 1998; originally announced September 1998.

Comments: Proc. ICCCN97, Las Vegas, September 1997 pp17-27

ACM Class: C.2.1

arXiv:cs/9809065 [pdf]

doi 10.1109/INFCOM.1998.662910

Feedback Consolidation Algorithms for ABR Point-to-Multipoint Connections in ATM Networks

Authors: Sonia Fahmy, Raj Jain, Rohit Goyal, Bobby Vandalore, Shivkumar Kalyanaraman, Sastri Kota, Pradeep Samudra

Abstract: ABR traffic management for point-to-multipoint connections controls the source rate to the minimum rate supported by all the branches of the multicast tree. A number of algorithms have been developed for extending ABR congestion avoidance algorithms to perform feedback consolidation at the branch points. This paper discusses various design options and implementation alternatives for the consolidation algorithms, and proposes a number of new algorithms. The performance of the proposed algorithms and the previous algorithms is compared under a variety of conditions. Results indicate that the algorithms we propose eliminate the consolidation noise (caused if the feedback is returned before all branches respond), while exhibiting a fast transient response.

Submitted 23 September, 1998; originally announced September 1998.

Comments: Proceedings of IEEE INFOCOM 1998, March 1998, volume 3, pp. 1004-1013

ACM Class: C.2.1
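
The consolidation idea described in the abstract above (return the minimum rate over all branches, and avoid returning feedback before every branch has responded) might be sketched as follows. This is a generic "wait for all branches" illustration, not one of the algorithms proposed in the paper; the `BranchPoint` class, its method name, and the branch labels are hypothetical.

```python
# A generic wait-for-all consolidation sketch. The class and method names are
# hypothetical and do not correspond to any algorithm named in the paper.
class BranchPoint:
    def __init__(self, branches):
        self.branches = set(branches)
        self.pending = {}  # branch -> lowest rate reported in this round

    def on_backward_rm(self, branch, explicit_rate):
        """Record one branch's feedback.

        Returns the consolidated rate (the minimum over all branches) once
        every branch has reported in the current round, otherwise None.
        """
        if branch not in self.branches:
            raise KeyError(branch)
        previous = self.pending.get(branch)
        if previous is None or explicit_rate < previous:
            self.pending[branch] = explicit_rate
        if set(self.pending) != self.branches:
            return None  # holding back avoids consolidation noise
        consolidated = min(self.pending.values())
        self.pending.clear()  # start a new round
        return consolidated


if __name__ == "__main__":
    bp = BranchPoint(["a", "b", "c"])
    for branch, rate in [("a", 100.0), ("b", 40.0), ("c", 70.0)]:
        print(branch, bp.on_backward_rm(branch, rate))
```

Waiting for every branch removes the noise but delays the feedback by the slowest branch; the abstract reports that the proposed algorithms keep a fast transient response as well, which this simple sketch does not attempt.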

arXiv:cs/9809063 [pdf]

doi 10.1117/12.325884

Performance of Bursty World Wide Web (WWW) Sources over ABR

Authors: Bobby Vandalore, Shivkumar Kalyanaraman, Raj Jain, Rohit Goyal, Sonia Fahmy, Seong-Cheol Kim

Abstract: We model World Wide Web (WWW) servers and clients running over an ATM network using the ABR (available bit rate) service. The WWW servers are modeled using a variant of the SPECweb96 benchmark, while the WWW clients are based on a model by Mah. The traffic generated by this application is typically bursty, i.e., it has active and idle periods in transmission. A timeout occurs after a given amount of idle time. During an idle period, the underlying TCP congestion windows remain open until a timeout expires. These open windows may be used to send data in a burst when the application becomes active again. This raises the possibility of large switch queues if the source rates are not controlled by ABR. We study this problem and show that ABR scales well with a large number of bursty TCP sources in the system.

Submitted 23 September, 1998; originally announced September 1998.

Comments: Submitted to WebNet `97, Toronto, November 97

ACM Class: C.2.1

arXiv:cs/9809059 [pdf]

The ERICA Switch Algorithm for ABR Traffic Management in ATM Networks

Authors: Shivkumar Kalyanaraman, Raj Jain, Sonia Fahmy, Rohit Goyal, Bobby Vandalore

Abstract: We propose an explicit rate indication scheme for congestion avoidance in ATM networks. In this scheme, the network switches monitor their load on each link, determining a load factor, the available capacity, and the number of currently active virtual channels. This information is used to advise the sources about the rates at which they should transmit. The algorithm is designed to achieve efficiency, fairness, controlled queueing delays, and fast transient response. The algorithm is also robust to measurement errors caused by variation in ABR demand and capacity. We present performance analysis of the scheme using both analytical arguments and simulation results. The scheme is being implemented by several ATM switch manufacturers.

Submitted 23 September, 1998; originally announced September 1998.

ACM Class: C.2.1
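
The abstract above names three measured quantities (load factor, available capacity, number of active virtual channels) but does not give the rate formula. The sketch below assumes the commonly described basic rule, in which a source is advised the larger of an equal share of the capacity and its current rate scaled by the load factor; the function name, parameter names, and example numbers are all assumptions, and the full algorithm includes refinements not shown here.

```python
# Simplified explicit-rate computation. The formula is an ASSUMPTION based on
# the commonly described basic rule; it is not stated in the abstract.
def erica_explicit_rate(input_rate, target_capacity, n_active, ccr):
    """Advise a rate for one VC.

    input_rate      -- measured ABR input rate on the link
    target_capacity -- ABR capacity the switch aims to use
    n_active        -- number of currently active VCs
    ccr             -- the VC's current cell rate
    All rates must be in the same unit (for example Mbps).
    """
    if n_active <= 0 or target_capacity <= 0:
        raise ValueError("need a positive capacity and at least one active VC")
    load_factor = input_rate / target_capacity
    fair_share = target_capacity / n_active
    # With no measured input, fall back to offering the full capacity.
    vc_share = ccr / load_factor if load_factor > 0 else target_capacity
    return min(max(fair_share, vc_share), target_capacity)


if __name__ == "__main__":
    # Hypothetical overloaded link: 180 Mbps arriving against a 135 Mbps target.
    print(erica_explicit_rate(180.0, 135.0, 3, 90.0))
```

Taking the maximum of the two shares lets underloaded links ramp sources up quickly, while dividing by the load factor pulls the aggregate input back toward the target when the link is overloaded.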

arXiv:cs/9809057 [pdf]

doi 10.1109/ICC.1998.683072

On Determining the Fair Bandwidth Share for ABR Connections in ATM Networks

Authors: Sonia Fahmy, Raj Jain, Shivkumar Kalyanaraman, Rohit Goyal, Bobby Vandalore

Abstract: The ABR service is designed to fairly allocate the bandwidth unused by higher priority services. The network indicates to the ABR sources the rates at which they should transmit to minimize their cell loss. Switches must constantly measure the demand and available capacity, and divide the capacity fairly among the contending connections. In order to compute the fair and efficient allocation for each connection, a switch needs to determine the effective number of active connections. In this paper, we propose a method for determining the number of active connections and the fair bandwidth share for each. We prove the efficiency and fairness of the proposed method analytically, and simulate it by incorporating it into the ERICA switch algorithm.

Submitted 23 September, 1998; originally announced September 1998.

Comments: Proceedings of the IEEE International Conference on Communications (ICC) 1998, June 1998

ACM Class: C.2.1

arXiv:cs/9809055 [pdf]

Providing Rate Guarantees to TCP over the ATM GFR Service

Authors: Rohit Goyal, Raj Jain, Sonia Fahmy, Bobby Vandalore

Abstract: The ATM Guaranteed Frame Rate (GFR) service is intended for best effort traffic that can benefit from minimum throughput guarantees. Edge devices connecting LANs to an ATM network can use GFR to transport multiple TCP/IP connections over a single GFR VC. These devices would typically multiplex VCs into a single FIFO queue. It has been shown that in general, FIFO queuing is not sufficient to provide rate guarantees, and per-VC queuing with scheduling is needed. We show that under conditions of low buffer allocation, it is possible to control TCP rates with FIFO queuing and buffer management. We present analysis and simulation results on controlling TCP rates by buffer management. We present a buffer management policy that provides loose rate guarantees to SACK TCP sources when the total buffer allocation is low. We study the performance of this buffer management scheme by simulation.

Submitted 23 September, 1998; originally announced September 1998.

Comments: Submitted to LCN'98

ACM Class: C.2.1

arXiv:cs/9809054 [pdf]

Design Issues for providing Minimum Rate Guarantees to the ATM Unspecified Bit Rate Service

Authors: Rohit Goyal, Raj Jain, Sonia Fahmy, Bobby Vandalore, Shivkumar Kalyanaraman

Abstract: Recent enhancements have been proposed to the ATM Unspecified Bit Rate (UBR) service that guarantee a minimum rate at the frame level to the UBR VCs. These enhancements have been called Guaranteed Frame Rate (GFR). In this paper, we discuss the motivation, design and implementation issues for GFR. We present the design of buffer management and policing mechanisms to implement GFR. We study the effects of policing, per-VC buffer allocation, and per-VC queuing on providing GFR to TCP/IP traffic. We conclude that per-VC scheduling is necessary to provide minimum rate guarantees to TCP traffic. We examine the role of frame tagging in the presence of scheduling and buffer management for providing minimum rate guarantees. The use of GFR to support the Internet Controlled Load Service is also discussed.

Submitted 23 September, 1998; originally announced September 1998.

Comments: Proceedings of ATM98, Fairfax, May 1998

ACM Class: C.2.1

arXiv:cs/9809053 [pdf]

Improving the Performance of TCP over the ATM-UBR service

Authors: Rohit Goyal, Raj Jain, Shiv Kalyanaraman, Sonia Fahmy, Bobby Vandalore

Abstract: In this paper we study the design issues in improving TCP performance over the ATM UBR service. ATM-UBR switches respond to congestion by dropping cells when their buffers become full. TCP connections running over UBR can experience low throughput and high unfairness. Intelligent switch drop policies and end-system policies can improve the performance of TCP over UBR with limited buffers. We describe the various design options available to the network as well as to the end systems to improve TCP performance over UBR. We study the effects of Early Packet Discard, and two per-VC accounting based buffer management policies. We also study the effects of various TCP end system congestion control policies including slow start and congestion avoidance, fast retransmit and recovery and selective acknowledgments. We present simulation results for various small and large latency configurations with varying buffer sizes and number of sources.

Submitted 23 September, 1998; originally announced September 1998.

ACM Class: C.2.1

arXiv:cs/9809052 [pdf]

Analysis and Simulation of Delay and Buffer Requirements of satellite-ATM Networks for TCP/IP Traffic

Authors: Rohit Goyal, Sastri Kota, Raj Jain, Sonia Fahmy, Bobby Vandalore, Jerry Kallaus

Abstract: In this paper we present a model to study the end-to-end delay performance of a satellite-ATM network. We describe a satellite-ATM network architecture. The architecture presents a trade-off between the on-board switching/processing features and the complexity of the satellite communication systems. The end-to-end delay of a connection passing through a satellite constellation consists of the transmission delay, the uplink and downlink ground terminal-satellite propagation delay, the inter-satellite link delays, the on-board switching, processing and buffering delays. In a broadband satellite network, the propagation and the buffering delays have the most impact on the overall delay. We present an analysis of the propagation and buffering delay components for GEO and LEO systems. We model LEO constellations as satellites evenly spaced in circular orbits around the earth. A simple routing algorithm for LEO systems calculates locally optimal paths for the end-to-end connection. This is used to calculate the end-to-end propagation delays for LEO networks. We present a simulation model to calculate the buffering delay for TCP/IP traffic over ATM ABR and UBR service categories. We apply this model to calculate total end-to-end delays for TCP/IP over satellite-ATM networks.

Submitted 23 September, 1998; originally announced September 1998.

Comments: Submitted to IEEE Journal of Selected Areas in Communications, March 1998

ACM Class: C.2.1
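
The delay decomposition listed in the abstract above (transmission, uplink and downlink propagation, inter-satellite links, on-board and buffering delays) can be written as a simple sum. The sketch below is only an illustration under stated assumptions: the function names and the example link lengths are hypothetical, the GEO case assumes a ground terminal directly below the satellite, and the paper's actual model is not reproduced here.

```python
# Illustrative sum of delay components; names and example values are assumptions.
SPEED_OF_LIGHT_KM_S = 299_792.458
GEO_ALTITUDE_KM = 35_786.0  # geostationary altitude above the equator


def propagation_delay_ms(distance_km):
    """One-way free-space propagation delay over distance_km, in milliseconds."""
    return distance_km / SPEED_OF_LIGHT_KM_S * 1000.0


def end_to_end_delay_ms(transmission_ms, uplink_km, downlink_km,
                        inter_satellite_km=(), onboard_ms=0.0, buffering_ms=0.0):
    """Add the components named in the abstract for one connection."""
    path_km = uplink_km + downlink_km + sum(inter_satellite_km)
    return (transmission_ms + propagation_delay_ms(path_km)
            + onboard_ms + buffering_ms)


if __name__ == "__main__":
    # Single GEO hop with both terminals assumed to sit at the sub-satellite point.
    geo = end_to_end_delay_ms(0.0, GEO_ALTITUDE_KM, GEO_ALTITUDE_KM)
    print(f"GEO propagation only: {geo:.1f} ms")
```

For a LEO path the same function can be called with shorter up and down links and a tuple of inter-satellite link lengths, which is where a routing algorithm's choice of path would enter.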

arXiv:cs/9809047 [pdf]

Modeling Traffic Management in ATM Networks with OPNET

Authors: Rohit Goyal, Raj Jain, Sonia Fahmy, Shobana Narayanaswamy

Abstract: Asynchronous transfer mode (ATM) is the new generation of computer and communication networks that are being deployed throughout the telecommunication industry as well as in campus backbones. ATM technology distinguishes itself from the previous networking protocols in that it has the latest traffic management technology and thus allows guaranteeing delay, throughput, and other performance measures. This, in turn, allows users to integrate voice, video, and data on the same network. Available bit rate (ABR) service in ATM has been designed to fairly distribute all unused capacity to data traffic and is specified in the ATM Forum's Traffic Management (TM4.0) standard. This paper will describe the OPNET models that have been developed for ATM and ABR design and analysis.

Submitted 22 September, 1998; originally announced September 1998.

Comments: Proc. of OPNETWORK'98, Washington, DC, May 1998

ACM Class: C.2.1

arXiv:cs/9809046 [pdf]

doi 10.1117/12.325859

Fairness for ABR multipoint-to-point connections

Authors: Sonia Fahmy, Raj Jain, Rohit Goyal, Bobby Vandalore

Abstract: In multipoint-to-point connections, the traffic at the root (destination) is the combination of all traffic originating at the leaves. A crucial concern in the case of multiple senders is how to define fairness within a multicast group and among groups and point-to-point connections. Fairness definition can be complicated since the multipoint connection can have the same identifier (VPI/VCI) on each link, and senders might not be distinguishable in this case. Many rate allocation algorithms implicitly assume that there is only one sender in each VC, which does not hold for multipoint-to-point cases. We give various possibilities for defining fairness for multipoint connections, and show the tradeoffs involved. In addition, we show that ATM bandwidth allocation algorithms need to be adapted to give fair allocations for multipoint-to-point connections.

Submitted 22 September, 1998; originally announced September 1998.

Comments: Proceedings of SPIE 98, November 1998

ACM Class: C.2.1

arXiv:cs/9809045 [pdf]

Performance of TCP over ABR with Long-Range Dependent VBR Background Traffic over Terrestrial and Satellite ATM networks

Authors: Shivkumar Kalyanaraman, Bobby Vandalore, Raj Jain, Rohit Goyal, Sonia Fahmy, Seong-Cheol Kim, Sastri Kota

Abstract: Compressed video is well known to be self-similar in nature. We model VBR carrying Long-Range Dependent (LRD), multiplexed MPEG-2 video sources. The actual traffic for the model is generated using a fast Fourier transform to generate the fractional Gaussian noise (FGN) sequence. Our model of compressed video sources bears similarity to an MPEG-2 Transport Stream carrying video, i.e., it is long-range dependent and generates traffic in a piecewise-CBR fashion. We study the effect of such VBR traffic on ABR carrying TCP traffic. The effect of such VBR traffic is that the ABR capacity is highly variant. We find that a switch algorithm like ERICA+ can tolerate this variance in ABR capacity while maintaining high throughput and low delay. We present simulation results for terrestrial and satellite configurations.

Submitted 22 September, 1998; originally announced September 1998.

Comments: Proceedings of LCN '98

ACM Class: C.2.1

arXiv:cs/9809043 [pdf]

Worst Case Buffer Requirements For Tcp Over ABR

Authors: Bobby Vandalore, Shivkumar Kalyanaraman, Raj Jain, Rohit Goyal, Sonia Fahmy

Abstract: ATM (asynchronous transfer mode) is the technology chosen for the Broadband Integrated Services Digital Network (B-ISDN). The ATM ABR (available bit rate) service can be used to transport "best-effort" traffic. In this paper, we extend our earlier work on the buffer requirements problem for TCP over ABR. Here, a worst case scenario is generated such that TCP sources send a burst of data at the time when the sources have large congestion windows and the ACRs (allowed cell rates) for ABR are high. We find that ABR using the ERICA+ switch algorithm can control the maximum queue lengths (hence the buffer requirements) even for the worst case. We present analytical arguments for the expected queue length and simulation results for different numbers of sources and parameter values.

Submitted 22 September, 1998; originally announced September 1998.

Comments: SICON'98, June 98

ACM Class: C.2.1

arXiv:cs/9809042 [pdf]

A Definition of General Weighted Fairness and its Support in Explicit Rate Switch Algorithms

Authors: Bobby Vandalore, Sonia Fahmy, Raj Jain, Rohit Goyal, Mukul Goyal

Abstract: In this paper we give a general definition of weighted fairness and show how this can achieve various fairness definitions, such as those mentioned in the ATM Forum TM 4.0 Specifications. We discuss how a pricing policy can be mapped to general weighted (GW) fairness. The GW fairness can be achieved by calculating the $ExcessFairshare$ (weighted fairshare of the leftover bandwidth) for each VC. We show how a switch algorithm can be modified to support the GW fairness by using the $ExcessFairshare$. We use ERICA+ as an example switch algorithm and show how it can be modified to achieve the general fairness. Simulation results are presented to demonstrate that the modified switch algorithm achieves GW fairness. An analytical proof for convergence of the modified ERICA+ algorithm is given in the appendix.

Submitted 22 September, 1998; originally announced September 1998.

Comments: Proceedings of ICNP'98, October 1998

ACM Class: C.2.1

arXiv:cs/9809041 [other]

Design and Analysis of Queue Control Functions for Explicit Rate Switch Schemes

Authors: Bobby Vandalore, Raj Jain, Rohit Goyal, Sonia Fahmy

Abstract: The main goals of a switch scheme are high utilization, low queuing delay and fairness. To achieve high utilization the switch scheme can maintain non-zero (small) queues in steady state which can be used if the sources do not have data to send. Queue length (delay) can be controlled if part of the link capacity is used for draining queues in the event of queue build up. In most schemes a simple threshold function is used for queue control. Better control of the queue and hence delay can be achieved by using sophisticated queue control functions. It is very important to design and analyze such queue control functions. We study step, linear, hyperbolic and inverse hyperbolic queue control functions. Analytical explanation and simulation results consistent with analysis are presented. From the study, we conclude that inverse hyperbolic is the best control function and to reduce complexity the linear control function can be used since it performs satisfactorily in most cases.

Submitted 22 September, 1998; originally announced September 1998.

Comments: Proceedings of IC3N'98, October 1998

ACM Class: C.2.1

arXiv:cs/9809040 [other]

Overload Based Explicit Rate Switch Schemes with MCR Guarantees

Authors: Bobby Vandalore, Sonia Fahmy, Raj Jain, Rohit Goyal, Mukul Goyal

Abstract: An explicit rate switch scheme monitors the load at each link and gives feedback to the sources. We define the overload factor as the ratio of the input rate to the available capacity. In this paper, we present four overload based ABR switch schemes which provide MCR guarantees. The switch schemes proposed use the overload factor and other quantities to calculate feedback rates. A dynamic queue control mechanism is used to achieve efficient usage of the link, control queues, and achieve constant queuing delay at steady state. The proposed algorithms are studied and compared using several configurations. The configurations were chosen to test the performance of the algorithms in the presence of link bottlenecks, source bottlenecks and transient sources. A comparison of the proposed algorithms based on the simulation results is presented.

Submitted 22 September, 1998; originally announced September 1998.

Comments: Submitted to INFOCOM '99

ACM Class: C.2.1

arXiv:cs/9809039 [pdf]

doi 10.1109/MNET.1998.730745

ABR Flow Control for Multipoint Connections

Authors: Sonia Fahmy, Raj Jain

Abstract: Multipoint capabilities are essential for ATM networks to efficiently support many applications, including IP multicasting and overlay applications. The current signaling and routing specifications for ATM define point-to-multipoint capabilities. Multipoint-to-point connection support is also being discussed by the signaling and PNNI groups, and will be defined in the near future for the unspecified bit rate (UBR) service. We examine point-to-multipoint and multipoint-to-point flow control for the available bit rate (ABR) service, as discussed in the traffic management working group.

Submitted 22 September, 1998; originally announced September 1998.

Comments: 5 pages, 2 figures, submitted to IEEE Network Magazine, ATM Forum Perspectives column

ACM Class: C.2.1

Showing 1–44 of 44 results for author: Fahmy, S