-
ClearLines - Camera Calibration from Straight Lines
Authors:
Gregory Schroeder,
Mohamed Sabry,
Cristina Olaverri-Monreal
Abstract:
The problem of calibration from straight lines is fundamental in geometric computer vision, with well-established theoretical foundations. However, its practical applicability remains limited, particularly in real-world outdoor scenarios. These environments pose significant challenges due to diverse and cluttered scenes, interrupted reprojections of straight 3D lines, and varying lighting conditions, making the task notoriously difficult. Furthermore, the field lacks a dedicated dataset encouraging the development of respective detection algorithms. In this study, we present a small dataset named "ClearLines", and by detailing its creation process, provide practical insights that can serve as a guide for developing and refining straight 3D line detection algorithms.
Submitted 1 May, 2025;
originally announced May 2025.
-
Shadow Erosion and Nighttime Adaptability for Camera-Based Automated Driving Applications
Authors:
Mohamed Sabry,
Gregory Schroeder,
Joshua Varughese,
Cristina Olaverri-Monreal
Abstract:
Enhancement of images from RGB cameras is of particular interest due to its wide range of ever-increasing applications such as medical imaging, satellite imaging, automated driving, etc. In autonomous driving, various techniques are used to enhance image quality under challenging lighting conditions. These include artificial augmentation to improve visibility in poor nighttime conditions, illumination-invariant imaging to reduce the impact of lighting variations, and shadow mitigation to ensure consistent image clarity in bright daylight. This paper proposes a pipeline for Shadow Erosion and Nighttime Adaptability in images for automated driving applications while preserving color and texture details. The Shadow Erosion and Nighttime Adaptability pipeline is compared to the widely used CLAHE technique and evaluated based on illumination uniformity and visual perception quality metrics. The results also demonstrate a significant improvement over CLAHE, enhancing a YOLO-based drivable area segmentation algorithm.
Submitted 11 April, 2025;
originally announced April 2025.
-
From Idea to Implementation: Evaluating the Influence of Large Language Models in Software Development -- An Opinion Paper
Authors:
Sargam Yadav,
Asifa Mehmood Qureshi,
Abhishek Kaushik,
Shubham Sharma,
Roisin Loughran,
Subramaniam Kazhuparambil,
Andrew Shaw,
Mohammed Sabry,
Niamh St John Lynch,
Nikhil Singh,
Padraic O'Hara,
Pranay Jaiswal,
Roshan Chandru,
David Lillis
Abstract:
The introduction of transformer architecture was a turning point in Natural Language Processing (NLP). Models based on the transformer architecture such as Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-Trained Transformer (GPT) have gained widespread popularity in various applications such as software development and education. The availability of Large Language Models (LLMs) such as ChatGPT and Bard to the general public has showcased the tremendous potential of these models and encouraged their integration into various domains such as software development for tasks such as code generation, debugging, and documentation generation. In this study, opinions from 11 experts regarding their experience with LLMs for software development have been gathered and analysed to draw insights that can guide successful and responsible integration. The overall opinion of the experts is positive, with the experts identifying advantages such as increase in productivity and reduced coding time. Potential concerns and challenges such as risk of over-dependence and ethical considerations have also been highlighted.
Submitted 13 June, 2025; v1 submitted 10 March, 2025;
originally announced March 2025.
-
LUT-DLA: Lookup Table as Efficient Extreme Low-Bit Deep Learning Accelerator
Authors:
Guoyu Li,
Shengyu Ye,
Chunyun Chen,
Yang Wang,
Fan Yang,
Ting Cao,
Cheng Liu,
Mohamed M. Sabry,
Mao Yang
Abstract:
The emergence of neural network capabilities invariably leads to a significant surge in computational demands due to expanding model sizes and increased computational complexity. To reduce model size and lower inference costs, recent research has focused on simplifying models and designing hardware accelerators using low-bit quantization. However, due to numerical representation limits, scalar quantization cannot reduce bit width lower than 1-bit, diminishing its benefits. To break through these limitations, we introduce LUT-DLA, a Look-Up Table (LUT) Deep Learning Accelerator Framework that utilizes vector quantization to convert neural network models into LUTs, achieving extreme low-bit quantization. The LUT-DLA framework facilitates efficient and cost-effective hardware accelerator designs and supports the LUTBoost algorithm, which helps to transform various DNN models into LUT-based models via multistage training, drastically cutting both computational and hardware overhead. Additionally, through co-design space exploration, LUT-DLA assesses the impact of various model and hardware parameters to fine-tune hardware configurations for different application scenarios, optimizing performance and efficiency. Our comprehensive experiments show that LUT-DLA achieves improvements in power efficiency and area efficiency with gains of $1.4$~$7.0\times$ and $1.5$~$146.1\times$, respectively, while maintaining only a modest accuracy drop. For CNNs, accuracy decreases by $0.1\%$~$3.1\%$ using the $L_2$ distance similarity, $0.1\%$~$3.4\%$ with the $L_1$ distance similarity, and $0.1\%$~$3.8\%$ when employing the Chebyshev distance similarity. For transformer-based models, the accuracy drop ranges from $1.4\%$ to $3.0\%$.
Submitted 18 January, 2025;
originally announced January 2025.
-
Automated Vehicle Driver Monitoring Dataset from Real-World Scenarios
Authors:
Mohamed Sabry,
Walter Morales-Alvarez,
Cristina Olaverri-Monreal
Abstract:
From SAE Level 3 of automation onwards, drivers are allowed to engage in activities that are not directly related to driving during their travel. However, in level 3, a misunderstanding of the capabilities of the system might lead drivers to engage in secondary tasks, which could impair their ability to react to challenging traffic situations.
Anticipating driver activity allows for early detection of risky behaviors, to prevent accidents. To be able to predict the driver activity, a Deep Learning network needs to be trained on a dataset. However, the use of datasets based on simulation for training and the migration to real-world data for prediction has proven to be suboptimal. Hence, this paper presents a real-world driver activity dataset, openly accessible on IEEE Dataport, which encompasses various activities that occur in autonomous driving scenarios under various illumination and weather conditions. Results from the training process showed that the dataset provides an excellent benchmark for implementing models for driver activity recognition.
Submitted 26 March, 2025; v1 submitted 19 August, 2024;
originally announced August 2024.
-
ATHAR: A High-Quality and Diverse Dataset for Classical Arabic to English Translation
Authors:
Mohammed Khalil,
Mohammed Sabry
Abstract:
Classical Arabic represents a significant era, encompassing the golden age of Arab culture, philosophy, and scientific literature. With a broad consensus on the importance of translating these literatures to enrich knowledge dissemination across communities, the advent of large language models (LLMs) and translation systems offers promising tools to facilitate this goal. However, we have identified a scarcity of translation datasets in Classical Arabic, which are often limited in scope and topics, hindering the development of high-quality translation systems. In response, we present the ATHAR dataset, comprising 66,000 high-quality Classical Arabic to English translation samples that cover a wide array of subjects including science, culture, and philosophy. Furthermore, we assess the performance of current state-of-the-art LLMs under various settings, concluding that there is a need for such datasets in current systems. Our findings highlight how models can benefit from fine-tuning or incorporating this dataset into their pretraining pipelines. The dataset is publicly available on the HuggingFace Data Hub at https://huggingface.co/datasets/mohamed-khalil/ATHAR
Submitted 29 July, 2024;
originally announced July 2024.
-
Assessing the Portability of Parameter Matrices Trained by Parameter-Efficient Finetuning Methods
Authors:
Mohammed Sabry,
Anya Belz
Abstract:
As the cost of training ever larger language models has grown, so has the interest in reusing previously learnt knowledge. Transfer learning methods have shown how reusing non-task-specific knowledge can help in subsequent task-specific learning. In this paper, we investigate the inverse: porting whole functional modules that encode task-specific knowledge from one model to another. We designed a study comprising 1,440 training/testing runs to test the portability of modules trained by parameter-efficient finetuning (PEFT) techniques, using sentiment analysis as an example task. We test portability in a wide range of scenarios, involving different PEFT techniques and different pretrained host models, among other dimensions. We compare the performance of ported modules with that of equivalent modules trained (i) from scratch, and (ii) from parameters sampled from the same distribution as the ported module. We find that the ported modules far outperform the two alternatives tested, but that there are interesting performance differences between the four PEFT techniques. We conclude that task-specific knowledge in the form of structurally modular sets of parameters as produced by PEFT techniques is highly portable, but that degree of success depends on type of PEFT and on differences between originating and receiving pretrained models.
Submitted 25 January, 2024;
originally announced January 2024.
-
PEFT-Ref: A Modular Reference Architecture and Typology for Parameter-Efficient Finetuning Techniques
Authors:
Mohammed Sabry,
Anya Belz
Abstract:
Recent parameter-efficient finetuning (PEFT) techniques aim to improve over the considerable cost of fully finetuning large pretrained language models (PLM). As different PEFT techniques proliferate, it is becoming difficult to compare them, in particular in terms of (i) the structure and functionality they add to the PLM, (ii) the different types and degrees of efficiency improvements achieved, (iii) performance at different downstream tasks, and (iv) how differences in structure and functionality relate to efficiency and task performance. To facilitate such comparisons, this paper presents a reference architecture which standardises aspects shared by different PEFT techniques, while isolating differences to specific locations and interactions with the standard components. Through this process of standardising and isolating differences, a modular view of PEFT techniques emerges, supporting not only direct comparison of different techniques and their efficiency and task performance, but also systematic exploration of reusability and composability of the different types of finetuned modules. We demonstrate how the reference architecture can be applied to understand properties and relative advantages of PEFT techniques, hence to inform selection of techniques for specific tasks, and design choices for new PEFT techniques.
Submitted 19 October, 2023; v1 submitted 24 April, 2023;
originally announced April 2023.
-
AfriVEC: Word Embedding Models for African Languages. Case Study of Fon and Nobiin
Authors:
Bonaventure F. P. Dossou,
Mohammed Sabry
Abstract:
From Word2Vec to GloVe, word embedding models have played key roles in the current state-of-the-art results achieved in Natural Language Processing. Designed to give significant and unique vectorized representations of words and entities, those models have proven to efficiently extract similarities and establish relationships reflecting semantic and contextual meaning among words and entities. African Languages, representing more than 31% of the worldwide spoken languages, have recently been the subject of much research. However, to the best of our knowledge, there are currently few to no word embedding models for the words and entities of those languages, and none for the languages under study in this paper. After describing the functionality of GloVe, Word2Vec, and Poincaré embeddings, we build Word2Vec and Poincaré word embedding models for Fon and Nobiin, which show promising results. We test the applicability of transfer learning between these models as a landmark for African Languages to jointly help mitigate the scarcity of their resources, and attempt to provide linguistic and social interpretations of our results. Our main contribution is to arouse more interest in creating ready-to-use word embedding models specific to African Languages that can significantly improve the performance of Natural Language Processing downstream tasks on them. The official repository and implementation are at https://github.com/bonaventuredossou/afrivec
Submitted 18 March, 2021; v1 submitted 8 March, 2021;
originally announced March 2021.
-
ArchiveSafe: Mass-Leakage-Resistant Storage from Proof-of-Work
Authors:
Moe Sabry,
Reza Samavi,
Douglas Stebila
Abstract:
Data breaches (mass leakage of stored information) are a major security concern. Encryption can provide confidentiality, but encryption depends on a key which, if compromised, allows the attacker to decrypt everything, effectively instantly. Security of encrypted data thus becomes a question of protecting the encryption keys. In this paper, we propose using keyless encryption to construct a mass-leakage-resistant archiving system, where decryption of a file is only possible after the requester, whether an authorized user or an adversary, completes a proof of work in the form of solving a cryptographic puzzle. This proposal is geared towards protection of infrequently accessed archival data: although any one file may not require too much work to decrypt, decryption of a large number of files (mass leakage) becomes increasingly expensive for an attacker. We present a prototype implementation realized as a user-space file system driver for Linux. We report experimental results of system behaviour under different file sizes and puzzle difficulty levels. Our keyless encryption technique can be added as a layer on top of traditional encryption: together they provide strong security against adversaries without the key and resistance against mass decryption by an attacker.
Submitted 14 October, 2020; v1 submitted 31 August, 2020;
originally announced September 2020.
-
On the Reduction of Variance and Overestimation of Deep Q-Learning
Authors:
Mohammed Sabry,
Amr M. A. Khalifa
Abstract:
The breakthrough of deep Q-Learning on different types of environments revolutionized the algorithmic design of Reinforcement Learning to introduce more stable and robust algorithms. To that end, many extensions to the deep Q-Learning algorithm have been proposed to reduce the variance of the target values and the overestimation phenomenon. In this paper, we examine a new methodology to solve these issues: we propose using Dropout techniques in the deep Q-Learning algorithm as a way to reduce variance and overestimation. We also present experiments conducted on benchmark environments, demonstrating the effectiveness of our methodology in enhancing stability and reducing both variance and overestimation in model performance.
Submitted 14 April, 2024; v1 submitted 14 October, 2019;
originally announced October 2019.
-
TEA-DNN: the Quest for Time-Energy-Accuracy Co-optimized Deep Neural Networks
Authors:
Lile Cai,
Anne-Maelle Barneche,
Arthur Herbout,
Chuan Sheng Foo,
Jie Lin,
Vijay Ramaseshan Chandrasekhar,
Mohamed M. Sabry
Abstract:
Embedded deep learning platforms have witnessed two simultaneous improvements. First, the accuracy of convolutional neural networks (CNNs) has been significantly improved through the use of automated neural-architecture search (NAS) algorithms to determine CNN structure. Second, there has been increasing interest in developing hardware accelerators for CNNs that provide improved inference performance and energy consumption compared to GPUs. Such embedded deep learning platforms differ in the amount of compute resources and memory-access bandwidth, which would affect performance and energy consumption of CNNs. It is therefore critical to consider the available hardware resources in the network architecture search. To this end, we introduce TEA-DNN, a NAS algorithm targeting multi-objective optimization of execution time, energy consumption, and classification accuracy of CNN workloads on embedded architectures. TEA-DNN leverages energy and execution time measurements on embedded hardware when exploring the Pareto-optimal curves across accuracy, execution time, and energy consumption and does not require additional effort to model the underlying hardware. We apply TEA-DNN for image classification on actual embedded platforms (NVIDIA Jetson TX2 and Intel Movidius Neural Compute Stick). We highlight the Pareto-optimal operating points that emphasize the necessity to explicitly consider hardware characteristics in the search process. To the best of our knowledge, this is the most comprehensive study of Pareto-optimal models across a range of hardware platforms using actual measurements on hardware to obtain objective values.
Submitted 21 October, 2019; v1 submitted 29 November, 2018;
originally announced November 2018.