
Advancing Arabic Speech Recognition Through Large-Scale Weakly Supervised Learning

Authors: Mahmoud Salhab, Marwan Elghitany, Shameed Sait, Syed Sibghat Ullah, Mohammad Abusheikh, Hasan Abusheikh

Abstract: Automatic speech recognition (ASR) is crucial for human-machine interaction in diverse applications like conversational agents, industrial robotics, call center automation, and automated subtitling. However, developing high-performance ASR models remains challenging, particularly for low-resource languages like Arabic, due to the scarcity of large, labeled speech datasets, which are costly and labor-intensive to produce. In this work, we employ weakly supervised learning to train an Arabic ASR model using the Conformer architecture. Our model is trained from scratch on 15,000 hours of weakly annotated speech data covering both Modern Standard Arabic (MSA) and Dialectal Arabic (DA), eliminating the need for costly manual transcriptions. Despite the absence of human-verified labels, our approach achieves state-of-the-art (SOTA) results in Arabic ASR, surpassing both open and closed-source models on standard benchmarks. By demonstrating the effectiveness of weak supervision as a scalable, cost-efficient alternative to traditional supervised approaches, this work paves the way for improved ASR systems in low-resource settings.

Submitted 19 April, 2025; v1 submitted 16 April, 2025; originally announced April 2025.

arXiv:2304.04094 [pdf, other]

doi 10.1109/TCOMM.2023.3265123

Energy-Efficient Optimization of Multi-User NOMA-Assisted Cooperative THz-SIMO MEC Systems

Authors: Omar Maraqa, Saad Al-Ahmadi, Aditya Rajasekaran, Hamza Sokun, Halim Yanikomeroglu, Sadiq M. Sait

Abstract: The various requirements in terms of data rates and latency in beyond 5G and 6G networks have motivated the integration of a variety of communications schemes and technologies to meet these requirements in such networks. Among these schemes are Terahertz (THz) communications, cooperative non-orthogonal multiple-access (NOMA)-enabled schemes, and mobile edge computing (MEC). THz communications offer abundant bandwidth for high-data-rate short-distance applications and NOMA-enabled schemes are promising schemes to realize the target spectral efficiencies and low latency requirements in future networks, while MEC would allow distributed processing and data offloading for the emerging applications in these networks. In this paper, an energy-efficient scheme of multi-user NOMA-assisted cooperative THz single-input multiple-output (SIMO) MEC systems is proposed to allow the uplink transmission of offloaded data from the far cell-edge users to the greater computing resources at the base station (BS) through the cell-center users. To reinforce the performance of the proposed scheme, two optimization problems are formulated and solved: the first problem minimizes the total users' energy consumption, while the second problem maximizes the total users' computation energy efficiency (CEE) for the proposed scheme. In both problems, the NOMA user pairing, the BS receive beamforming, the transmission time allocation, and the NOMA transmission power allocation coefficients are optimized, while taking into account the full-offloading requirements of each user as well as the predefined latency constraint of the system. The obtained results reveal new insights into the performance and design of multi-user NOMA-assisted cooperative THz-SIMO MEC systems.

Submitted 8 April, 2023; originally announced April 2023.

Comments: Accepted for publication in IEEE Transactions on Communications

arXiv:2209.11272 [pdf, other]

doi 10.1007/s11227-022-04787-8

Optimization of FPGA-based CNN Accelerators Using Metaheuristics

Authors: Sadiq M. Sait, Aiman El-Maleh, Mohammad Altakrouri, Ahmad Shawahna

Abstract: In recent years, convolutional neural networks (CNNs) have demonstrated their ability to solve problems in many fields and with accuracy that was not possible before. However, this comes with extensive computational requirements, which made general CPUs unable to deliver the desired real-time performance. At the same time, FPGAs have seen a surge in interest for accelerating CNN inference. This is due to their ability to create custom designs with different levels of parallelism. Furthermore, FPGAs provide better performance per watt compared to GPUs. The current trend in FPGA-based CNN accelerators is to implement multiple convolutional layer processors (CLPs), each of which is tailored for a subset of layers. However, the growing complexity of CNN architectures makes optimizing the resources available on the target FPGA device to deliver optimal performance more challenging. In this paper, we present a CNN accelerator and an accompanying automated design methodology that employs metaheuristics for partitioning available FPGA resources to design a Multi-CLP accelerator. Specifically, the proposed design tool adopts simulated annealing (SA) and tabu search (TS) algorithms to find the number of CLPs required and their respective configurations to achieve optimal performance on a given target FPGA device. Here, the focus is on the key specifications and hardware resources, including digital signal processors, block RAMs, and off-chip memory bandwidth. Experimental results and comparisons using four well-known benchmark CNNs are presented demonstrating that the proposed acceleration framework is both encouraging and promising. The SA-/TS-based Multi-CLP achieves 1.31x - 2.37x higher throughput than the state-of-the-art Single-/Multi-CLP approaches in accelerating AlexNet, SqueezeNet 1.1, VGGNet, and GoogLeNet architectures on the Xilinx VC707 and VC709 FPGA boards.

Submitted 22 September, 2022; originally announced September 2022.

Comments: 23 pages, 7 figures, 9 tables. In The Journal of Supercomputing, 2022

arXiv:2203.12091 [pdf, other]

doi 10.1109/ACCESS.2022.3157893

FxP-QNet: A Post-Training Quantizer for the Design of Mixed Low-Precision DNNs with Dynamic Fixed-Point Representation

Authors: Ahmad Shawahna, Sadiq M. Sait, Aiman El-Maleh, Irfan Ahmad

Abstract: Deep neural networks (DNNs) have demonstrated their effectiveness in a wide range of computer vision tasks, with the state-of-the-art results obtained through complex and deep structures that require intensive computation and memory. Nowadays, efficient model inference is crucial for consumer applications on resource-constrained platforms. As a result, there is much interest in the research and development of dedicated deep learning (DL) hardware to improve the throughput and energy efficiency of DNNs. Low-precision representation of DNN data-structures through quantization would bring great benefits to specialized DL hardware. However, rigorous quantization leads to a severe accuracy drop. As such, quantization opens a large hyper-parameter space at bit-precision levels, the exploration of which is a major challenge. In this paper, we propose a novel framework referred to as the Fixed-Point Quantizer of deep neural Networks (FxP-QNet) that flexibly designs a mixed low-precision DNN for integer-arithmetic-only deployment. Specifically, the FxP-QNet gradually adapts the quantization level for each data-structure of each layer based on the trade-off between the network accuracy and the low-precision requirements. Additionally, it employs post-training self-distillation and network prediction error statistics to optimize the quantization of floating-point values into fixed-point numbers. Examining FxP-QNet on state-of-the-art architectures and the benchmark ImageNet dataset, we empirically demonstrate the effectiveness of FxP-QNet in achieving the accuracy-compression trade-off without the need for training. The results show that FxP-QNet-quantized AlexNet, VGG-16, and ResNet-18 reduce the overall memory requirements of their full-precision counterparts by 7.16x, 10.36x, and 6.44x with less than 0.95%, 0.95%, and 1.99% accuracy drop, respectively.

Submitted 22 March, 2022; originally announced March 2022.

Comments: 30 pages, 12 figures, 5 tables. In IEEE Access, 2022

Report number: Electronic ISSN: 2169-3536

MSC Class: 68T01

ACM Class: I.2.0; I.4.0; I.5.0; C.1.0

arXiv:2203.05195 [pdf, other]

A Review of Open Source Software Tools for Time Series Analysis

Authors: Yunus Parvej Faniband, Iskandar Ishak, Sadiq M. Sait

Abstract: Time series data is used in a wide range of real-world applications. In a variety of domains, detailed analysis of time series data (via Forecasting and Anomaly Detection) leads to a better understanding of how events associated with a specific time instance behave. Time Series Analysis (TSA) is commonly performed with plots and traditional models. Machine Learning (ML) approaches, on the other hand, have seen an increase in the state of the art for Forecasting and Anomaly Detection because they provide comparable results when time and data constraints are met. A number of time series toolboxes are available that offer rich interfaces to specific model classes (ARIMA/filters, neural networks) or framework interfaces to isolated time series modelling tasks (forecasting, feature extraction, annotation, classification). Nonetheless, open source machine learning capabilities for time series remain limited, and existing libraries are frequently incompatible with one another. The goal of this paper is to provide a concise and user-friendly overview of the most important open source tools for time series analysis. This article examines two related toolboxes: (1) forecasting and (2) anomaly detection. This paper describes a typical Time Series Analysis (TSA) framework with an architecture and lists the main features of the TSA framework. The tools are categorized based on the criteria of analysis tasks completed, data preparation methods employed, and evaluation methods for results generated. This paper presents quantitative analysis and discusses the current state of actively developed open source Time Series Analysis frameworks. Overall, this article considered 60 time series analysis tools, of which 32 provided forecasting modules and 21 included anomaly detection.

Submitted 10 March, 2022; originally announced March 2022.

Comments: 21 Pages, 2 Figures

ACM Class: I.2; I.2.5

arXiv:2105.12821 [pdf, other]

doi 10.3390/s21113705

On the Achievable Max-Min User Rates in Multi-Carrier Centralized NOMA-VLC Networks

Authors: Omar Maraqa, Umair F. Siddiqi, Saad Al-Ahmadi, Sadiq M. Sait

Abstract: Visible light communications (VLC) is gaining interest as one of the enablers of short-distance, high-data-rate applications, in future beyond 5G networks. Moreover, non-orthogonal multiple-access (NOMA)-enabled schemes have recently emerged as a promising multiple-access scheme for these networks that would allow realization of the target spectral efficiency and user fairness requirements. The integration of NOMA in the widely adopted orthogonal frequency-division multiplexing (OFDM)-based VLC networks would require an optimal resource allocation for the pair or the cluster of users sharing the same subcarrier(s). In this paper, the max-min rate of a multi-cell indoor centralized VLC network is maximized through optimizing user pairing, subcarrier allocation, and power allocation. The joint complex optimization problem is tackled using a low-complexity solution. At first, the user pairing is assumed to follow the divide-and-next-largest-difference user-pairing algorithm (D-NLUPA) that can ensure fairness among the different clusters. Then, subcarrier allocation and power allocation are solved iteratively through both the Simulated Annealing (SA) meta-heuristic algorithm and the bisection method. The obtained results quantify the achievable max-min user rates for the different relevant variants of NOMA-enabled schemes and shed new light on both the performance and design of multi-user multi-carrier NOMA-enabled centralized VLC networks.

Submitted 26 May, 2021; originally announced May 2021.

Comments: May, 2021: Accepted for publication in Sensors Journal, MDPI Publisher

arXiv:2104.05391 [pdf, other]

Energy-Efficient Coverage Enhancement of Indoor THz-MISO Systems: An FD-NOMA Approach

Authors: Omar Maraqa, Aditya S. Rajasekaran, Hamza U. Sokun, Saad Al-Ahmadi, Halim Yanikomeroglu, Sadiq M. Sait

Abstract: Terahertz (THz) communication is gaining more interest as one of the envisioned enablers of high-data-rate short-distance indoor applications in beyond 5G networks. Moreover, non-orthogonal multiple-access (NOMA)-enabled schemes are promising schemes to realize the target spectral efficiency, low latency, and user fairness requirements in future networks. In this paper, an energy-efficient cooperative NOMA (CNOMA) scheme that guarantees the minimum required rate for the cell-edge users in an indoor THz-MISO communications network is proposed. The proposed cooperative scheme consists of three stages: (i) beamforming stage that allocates base-station (BS) beams to THz cooperating cell-center users using analog beamforming with the aid of the cosine similarity metric, (ii) user pairing stage that is tackled using the Hungarian algorithm, and (iii) power allocation stage for both the BS THz-NOMA transmit power and the cooperation power of the cooperating cell-center users, which are optimized sequentially. The obtained results quantify the energy efficiency (EE) of the proposed scheme and shed new light on the performance of multi-user THz-NOMA-enabled networks.

Submitted 19 June, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

Comments: Accepted for publication in proceedings of 2021 IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC 2021)

arXiv:1909.08011 [pdf, other]

doi 10.1109/COMST.2020.3013514

A Survey of Rate-optimal Power Domain NOMA with Enabling Technologies of Future Wireless Networks

Authors: Omar Maraqa, Aditya S. Rajasekaran, Saad Al-Ahmadi, Halim Yanikomeroglu, Sadiq M. Sait

Abstract: The ambitious high data-rate applications in the envisioned future B5G networks require new solutions, including the advent of more advanced architectures than the ones already used in 5G networks, and the coalition of different communications schemes and technologies to enable these applications' requirements. Among the candidate schemes for future wireless networks are NOMA schemes that allow serving more than one user in the same resource block by multiplexing users in other domains than frequency or time. In this way, NOMA schemes tend to offer several advantages over OMA schemes such as improved user fairness and spectral efficiency, higher cell-edge throughput, massive connectivity support, and low transmission latency. With these merits, NOMA-enabled transmission schemes are being increasingly looked at as promising multiple access schemes for future wireless networks. When the power domain is used to multiplex the users, it is referred to as PD-NOMA. In this paper, we survey the integration of PD-NOMA with the enabling communications schemes and technologies that are expected to meet the various requirements of B5G networks. In particular, this paper surveys the different rate optimization scenarios studied in the literature when PD-NOMA is combined with one or more of the candidate schemes and technologies for B5G networks including MISO, MIMO, mMIMO, advanced antenna architectures, mmWave and THz, CoMP, cooperative communications, cognitive radio, VLC, UAV and others. The considered system models, the optimization methods utilized to maximize the achievable rates, and the main lessons learnt on the optimization and the performance of these NOMA-enabled schemes and technologies are discussed in detail along with the future research directions for these combined schemes. Moreover, the role of machine learning in optimizing these NOMA-enabled technologies is addressed.

Submitted 30 July, 2020; v1 submitted 17 September, 2019; originally announced September 2019.

Comments: Accepted for publication in IEEE Surveys and Tutorials, July 2020

arXiv:1901.00121 [pdf, other]

doi 10.1109/ACCESS.2018.2890150

FPGA-based Accelerators of Deep Learning Networks for Learning and Classification: A Review

Authors: Ahmad Shawahna, Sadiq M. Sait, Aiman El-Maleh

Abstract: Due to recent advances in digital technologies, and availability of credible data, an area of artificial intelligence, deep learning, has emerged, and has demonstrated its ability and effectiveness in solving complex learning problems not possible before. In particular, convolutional neural networks (CNNs) have demonstrated their effectiveness in image detection and recognition applications. However, they require intensive CPU operations and memory bandwidth that make general CPUs fail to achieve desired performance levels. Consequently, hardware accelerators that use application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), and graphic processing units (GPUs) have been employed to improve the throughput of CNNs. More precisely, FPGAs have been recently adopted for accelerating the implementation of deep learning networks due to their ability to maximize parallelism as well as due to their energy efficiency. In this paper, we review recent existing techniques for accelerating deep learning networks on FPGAs. We highlight the key features employed by the various techniques for improving the acceleration performance. In addition, we provide recommendations for enhancing the utilization of FPGAs for CNN acceleration. The techniques investigated in this paper represent the recent trends in FPGA-based accelerators of deep learning networks. Thus, this review is expected to direct the future advances on efficient hardware accelerators and to be useful for deep learning researchers.

Submitted 1 January, 2019; originally announced January 2019.

Comments: This article has been accepted for publication in IEEE Access (December, 2018)

arXiv:1809.01930 [pdf, other]

doi 10.1109/ACCESS.2018.2879885

Multi-User Visible Light Communications: State-of-the-Art and Future Directions

Authors: Saad Al-Ahmadi, Omar Maraqa, Murat Uysal, Sadiq M. Sait

Abstract: Visible light communication (VLC) builds upon the dual use of existing lighting infrastructure for wireless data transmission. VLC has recently gained interest as a cost-effective, secure, and energy-efficient wireless access technology, particularly for indoor user-dense environments. While initial studies in this area are mainly limited to single-user point-to-point links, more recent efforts have focused on multi-user VLC systems in an effort to transform VLC into a scalable and fully networked wireless technology. In this paper, we provide a comprehensive overview of multi-user VLC systems discussing the recent advances on multi-user precoding, multiple access, resource allocation, and mobility management. We further provide possible directions of future research in this emerging topic.

Submitted 14 November, 2018; v1 submitted 6 September, 2018; originally announced September 2018.

Comments: Version 3: Accepted for publication in IEEE Access, Nov. 2018

arXiv:1101.3859 [pdf]

doi 10.5121/ijcnc.2011.3111

OSPF Weight Setting Optimization for Single Link Failures

Authors: Mohammed H. Sqalli, Sadiq M. Sait, Syed Asadullah

Abstract: In operational networks, nodes are connected via multiple links for load sharing and redundancy. This is done to make sure that a failure of a link does not disconnect or isolate some parts of the network. However, link failures have an effect on routing, as the routers find alternate paths for the traffic originally flowing through the link which has failed. This effect is severe in case of failure of a critical link in the network, such as backbone links or the links carrying higher traffic loads. When routing is done using the Open Shortest Path First (OSPF) routing protocol, the original weight selection for the normal state topology may not be as efficient for the failure state. In this paper, we investigate the single link failure issue with an objective to find a weight setting which results in efficient routing in normal and failure states. We engineer a Tabu Search iterative heuristic using two different implementation strategies to solve the OSPF weight setting problem for link failure scenarios. We evaluate these heuristics and show through experimental results that both heuristics efficiently handle weight setting for the failure state. A comparison of both strategies is also presented.

Submitted 20 January, 2011; originally announced January 2011.

Journal ref: International Journal of Computer Networks & Communications (IJCNC), pp:168-183, Vol. 3, No. 1, January 2011

Showing 1–11 of 11 results for author: Sait, S