-
Accelerated Medicines Development using a Digital Formulator and a Self-Driving Tableting DataFactory
Authors:
Faisal Abbas,
Mohammad Salehian,
Peter Hou,
Jonathan Moores,
Jonathan Goldie,
Alexandros Tsioutsios,
Victor Portela,
Quentin Boulay,
Roland Thiolliere,
Ashley Stark,
Jean-Jacques Schwartz,
Jerome Guerin,
Andrew G. P. Maloney,
Alexandru A. Moldovan,
Gavin K. Reynolds,
Jérôme Mantanus,
Catriona Clark,
Paul Chapman,
Alastair Florence,
Daniel Markl
Abstract:
Pharmaceutical tablet formulation and process development, traditionally a complex and multi-dimensional decision-making process, necessitates extensive experimentation and resources, often resulting in suboptimal solutions. This study presents an integrated platform for tablet formulation and manufacturing, built around a Digital Formulator and a Self-Driving Tableting DataFactory. By combining predictive modelling, optimisation algorithms, and automation, this system offers a material-to-product approach to predict and optimise critical quality attributes for different formulations, linking raw material attributes to key blend and tablet properties such as flowability, porosity, and tensile strength. The platform leverages the Digital Formulator, an in-silico optimisation framework that employs a hybrid system of models (melding data-driven and mechanistic models) to identify optimal formulation settings for manufacturability. Optimised formulations then proceed through the self-driving Tableting DataFactory, which includes automated powder dosing, tablet compression and performance testing, followed by iterative refinement of process parameters through Bayesian optimisation methods. This approach shortens the timeline from material characterisation to an in-specification tablet to within 6 hours, using less than 5 grams of API, and enables the manufacture of small batches of up to 1,440 tablets within 24 hours, with real-time quality control enabled by augmented and mixed reality. Validation across multiple APIs and drug loadings underscores the platform's capacity to reliably meet target quality attributes, positioning it as a transformative solution for accelerated and resource-efficient pharmaceutical development.
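The abstract does not say which surrogate model or acquisition function drives the Bayesian optimisation step. A minimal sketch, assuming a Gaussian-process surrogate with a fixed RBF kernel and an expected-improvement acquisition over a one-dimensional grid, could tune a single normalised compaction pressure towards a target tensile strength. The compaction curve `toy_tensile_strength`, the 2.0 MPa target, the grid and all kernel settings are hypothetical placeholders, not the DataFactory's actual models or parameters.

```python
import math

import numpy as np


def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel with unit signal variance."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP."""
    k = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_s = rbf_kernel(x_train, x_query)
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_s.T @ alpha
    v = np.linalg.solve(chol, k_s)
    var = np.maximum(1.0 - np.sum(v ** 2, axis=0), 1e-12)
    return mean, np.sqrt(var)


_erf = np.vectorize(math.erf)


def expected_improvement(mean, std, best, xi=0.0):
    """EI for minimisation: expected amount by which a point beats `best`."""
    improvement = best - mean - xi
    z = improvement / std
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return improvement * cdf + std * pdf


def toy_tensile_strength(pressure):
    """HYPOTHETICAL compaction curve: tensile strength (MPa) vs normalised pressure."""
    return 3.0 * (1.0 - np.exp(-2.5 * pressure))


def objective(pressure, target=2.0):
    """Squared deviation from a HYPOTHETICAL target tensile strength (MPa)."""
    return (toy_tensile_strength(pressure) - target) ** 2


def bayesian_optimise(n_iter=15):
    grid = np.linspace(0.0, 1.0, 201)
    sampled = [0, 100, 200]  # grid indices of the initial experiments
    for _ in range(n_iter):
        x = grid[sampled]
        y = objective(x)
        # Standardise observations so the unit-variance kernel is a sensible prior.
        y_norm = (y - y.mean()) / y.std()
        mean, std = gp_posterior(x, y_norm, grid)
        ei = expected_improvement(mean, std, y_norm.min())
        ei[sampled] = -np.inf  # never repeat an experiment already performed
        sampled.append(int(np.argmax(ei)))
    x = grid[sampled]
    losses = objective(x)
    best = int(np.argmin(losses))
    return x[best], losses[best]


best_pressure, best_loss = bayesian_optimise()
print(f"best normalised pressure = {best_pressure:.3f}, loss = {best_loss:.4f}")
```

Each loop iteration stands in for one simulated compression experiment. In a real setting, `objective` would be replaced by a measured tablet property, and several process parameters would typically be optimised jointly.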
Submitted 20 March, 2025;
originally announced March 2025.
-
Data Augmentation to Improve Large Language Models in Food Hazard and Product Detection
Authors:
Areeg Fahad Rasheed,
M. Zarkoosh,
Shimam Amer Chasib,
Safa F. Abbas
Abstract:
The primary objective of this study is to demonstrate the impact of data augmentation using ChatGPT-4o-mini on food hazard and product analysis. The augmented data is generated using ChatGPT-4o-mini and subsequently used to train two large language models: RoBERTa-base and Flan-T5-base. The models are evaluated on test sets. The results indicate that using augmented data helped improve model performance across key metrics, including recall, F1 score, precision, and accuracy, compared to using only the provided dataset. The full code, including model training and the augmented dataset, can be found in this repository: https://github.com/AREEG94FAHAD/food-hazard-prdouct-cls
Submitted 12 February, 2025;
originally announced February 2025.
-
Performance Evaluation of Deep Learning Models for Water Quality Index Prediction: A Comparative Study of LSTM, TCN, ANN, and MLP
Authors:
Muhammad Ismail,
Farkhanda Abbas,
Shahid Munir Shah,
Mahmoud Aljawarneh,
Lachhman Das Dhomeja,
Fazila Abbas,
Muhammad Shoaib,
Abdulwahed Fahad Alrefaei,
Mohammed Fahad Albeshr
Abstract:
This study addresses environmental monitoring and predictive modeling of the Water Quality Index (WQI) through the assessment of water quality.
Submitted 3 November, 2024;
originally announced November 2024.
-
TaskComplexity: A Dataset for Task Complexity Classification with In-Context Learning, FLAN-T5 and GPT-4o Benchmarks
Authors:
Areeg Fahad Rasheed,
M. Zarkoosh,
Safa F. Abbas,
Sana Sabah Al-Azzawi
Abstract:
This paper addresses the challenge of classifying and assigning programming tasks to experts, a process that typically requires significant effort, time, and cost. To tackle this issue, a novel dataset containing a total of 4,112 programming tasks was created by extracting tasks from various websites. Web scraping techniques were employed to collect this dataset of programming problems systematically. Specific HTML tags were tracked to extract the key elements of each task, including the title, problem description, input-output, examples, problem class, and complexity score. Examples from the dataset are provided in the appendix to illustrate the variety and complexity of the tasks included. The dataset has been evaluated and benchmarked using two approaches: the first fine-tuned the FLAN-T5 small model on the dataset, while the second used in-context learning (ICL) with GPT-4o-mini. Performance was assessed using standard metrics: accuracy, recall, precision, and F1-score. The results indicated that in-context learning with GPT-4o-mini outperformed the FLAN-T5 model.
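The abstract does not state how per-class scores are averaged for this multi-class task. As a sketch, assuming macro-averaging (every class weighted equally), the four reported metrics can be computed from scratch as follows; the three complexity labels and the six predictions are hypothetical examples, not the dataset's actual classes or results.

```python
from collections import Counter


def classification_report(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 for multi-class labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    pred_count = Counter(y_pred)  # how often each label was predicted
    true_count = Counter(y_true)  # how often each label actually occurs

    precisions, recalls, f1s = [], [], []
    for label in labels:
        precision = tp[label] / pred_count[label] if pred_count[label] else 0.0
        recall = tp[label] / true_count[label] if true_count[label] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical complexity labels and model predictions.
y_true = ["easy", "medium", "hard", "medium", "easy", "hard"]
y_pred = ["easy", "medium", "medium", "medium", "hard", "hard"]
report = classification_report(y_true, y_pred)
print(report)
```

On these six toy predictions the accuracy is 4/6 (about 0.667) and the macro F1 is about 0.656. Micro- or support-weighted averaging would give different numbers, so the averaging scheme matters when comparing models.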
Submitted 30 September, 2024;
originally announced September 2024.
-
CV-Attention UNet: Attention-based UNet for 3D Cerebrovascular Segmentation of Enhanced TOF-MRA Images
Authors:
Syed Farhan Abbas,
Nguyen Thanh Duc,
Yoonguu Song,
Kyungwon Kim,
Ekta Srivastava,
Boreom Lee
Abstract:
Because automated methods are lacking, time-of-flight magnetic resonance angiography (TOF-MRA) images are assessed visually to diagnose cerebrovascular disease, which is time-consuming. The commonly used encoder-decoder architectures for cerebrovascular segmentation use redundant features, which eventually leads to low-level features being extracted multiple times. Additionally, convolutional neural networks (CNNs) suffer from performance degradation when the batch size is small, and deeper networks experience the vanishing gradient problem. Methods: In this paper, we attempt to address these limitations and propose a 3D cerebrovascular attention UNet, named CV-AttentionUNet, for precise extraction of brain vessels. We propose a sequence of preprocessing techniques followed by a deeply supervised UNet to improve the segmentation accuracy of brain vessels relevant to stroke. To combine low- and high-level semantic features, we apply an attention mechanism, which focuses on relevant associations and neglects irrelevant anatomical information. Furthermore, deep supervision incorporates features from different levels, which proves beneficial for network convergence. Results: We demonstrate the efficiency of the proposed method by cross-validating on an unlabeled dataset, which we subsequently labeled. We believe the novelty of this algorithm lies in its ability to perform well on both labeled and unlabeled data with image-processing-based enhancement. The results indicate that our method outperformed existing state-of-the-art methods on the TubeTK dataset. Conclusion: The proposed method can help achieve accurate segmentation of cerebrovascular structures relevant to stroke.
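The abstract does not detail the attention mechanism. One common choice for UNet skip connections is the additive attention gate of Attention U-Net (Oktay et al., 2018). The NumPy sketch below assumes that design, treats the 1x1x1 convolutions as matrix products over flattened voxels, and assumes the gating signal has already been resampled to the skip connection's resolution. All weights and shapes are random placeholders, not trained CV-AttentionUNet parameters.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def additive_attention_gate(skip, gating, w_x, w_g, psi, bias=0.0):
    """Additive attention gate in the style of Attention U-Net (an assumed design).

    skip:   (n_voxels, c_x) encoder features carried by the skip connection
    gating: (n_voxels, c_g) coarser decoder features, assumed already resampled
            to the same voxel grid as `skip`
    w_x:    (c_x, c_int), w_g: (c_g, c_int), psi: (c_int,) learned weights
    Returns the gated skip features and the per-voxel attention coefficients.
    """
    intermediate = np.maximum(skip @ w_x + gating @ w_g + bias, 0.0)  # ReLU
    alpha = sigmoid(intermediate @ psi)  # one coefficient in (0, 1) per voxel
    return skip * alpha[:, None], alpha


# Random placeholder features and weights (hypothetical sizes).
rng = np.random.default_rng(0)
n_voxels, c_x, c_g, c_int = 8, 4, 6, 3
skip = rng.normal(size=(n_voxels, c_x))
gating = rng.normal(size=(n_voxels, c_g))
gated, alpha = additive_attention_gate(
    skip,
    gating,
    w_x=rng.normal(size=(c_x, c_int)),
    w_g=rng.normal(size=(c_g, c_int)),
    psi=rng.normal(size=c_int),
)
print(gated.shape, alpha.round(3))
```

Voxels whose coefficient is close to 0 are suppressed before the skip features are concatenated with the decoder features, which is how such a gate can down-weight irrelevant anatomy.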
Submitted 19 June, 2024; v1 submitted 16 November, 2023;
originally announced November 2023.
-
Evaluation of biometric user authentication using an ensemble classifier with face and voice recognition
Authors:
Firas Abbaas,
Gursel Serpen
Abstract:
This paper presents a biometric user authentication system based on an ensemble design that employs face and voice recognition classifiers. The design approach entails developing and evaluating individual classifiers for face and voice recognition and subsequently integrating the two within an ensemble framework. Performance was evaluated on three benchmark datasets: NIST Feret face, Yale Extended face, and ELSDSR voice. On these datasets, the bimodal ensemble offers significant improvements, achieving accuracy, precision, true negative rate, and true positive rate at or above 99%, with false positive and false negative rates below 1%.
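The abstract does not specify how the face and voice classifiers are combined. A minimal sketch, assuming weighted-sum fusion of normalised match scores and a fixed acceptance threshold, shows how the reported true/false positive and negative rates follow from accept/reject decisions. All scores, the equal weighting and the 0.5 threshold are hypothetical.

```python
def fuse_scores(face_scores, voice_scores, face_weight=0.5):
    """Weighted-sum score fusion; scores are assumed normalised to [0, 1]."""
    return [face_weight * f + (1.0 - face_weight) * v
            for f, v in zip(face_scores, voice_scores)]


def authentication_rates(scores, is_genuine, threshold=0.5):
    """Accept a claim when its score reaches the threshold; tally the outcomes."""
    tp = fp = tn = fn = 0
    for score, genuine in zip(scores, is_genuine):
        accepted = score >= threshold
        if genuine and accepted:
            tp += 1  # genuine user accepted
        elif genuine:
            fn += 1  # genuine user rejected
        elif accepted:
            fp += 1  # impostor accepted
        else:
            tn += 1  # impostor rejected
    return {
        "TPR": tp / (tp + fn),
        "TNR": tn / (tn + fp),
        "FPR": fp / (fp + tn),
        "FNR": fn / (fn + tp),
    }


# Hypothetical match scores for three genuine claims and three impostor claims.
face = [0.9, 0.4, 0.8, 0.3, 0.6, 0.2]
voice = [0.7, 0.8, 0.3, 0.1, 0.2, 0.45]
genuine = [True, True, True, False, False, False]
face_only = authentication_rates(face, genuine)
fused = authentication_rates(fuse_scores(face, voice), genuine)
print(face_only)
print(fused)
```

In this toy example, the face scores alone reject one genuine user and accept one impostor (TPR = 2/3, FPR = 1/3). The fused scores separate all six claims correctly, which illustrates why a bimodal ensemble can reduce both error rates.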
Submitted 31 May, 2020;
originally announced June 2020.
-
One Size Does Not Fit All: Modeling Users' Personal Curiosity in Recommender Systems
Authors:
Fakhri Abbas,
Xi Niu
Abstract:
Today's recommender systems are criticized for recommending items that are too obvious to arouse users' interest. For this reason, the recommender systems research community has advocated "beyond accuracy" evaluation metrics such as novelty, diversity, coverage, and serendipity, in the hope of promoting information discovery and sustaining users' interest over a long period of time. While these metrics bring in new perspectives, most of them do not consider differences between individual users: an open-minded user may favor highly novel or diversified recommendations, whereas a conservative user's appetite for novelty or diversity may be much smaller. In this paper, we developed a model that approximates an individual's curiosity distribution over different levels of stimuli, guided by the well-known Wundt curve in psychology. We measured an item's surprise level to assess its stimulation level and whether it falls within the range of the user's appetite for stimulus. We then proposed a recommendation framework that considers both user preference and appetite for stimulus, targeting the level at which curiosity is maximally aroused. Our framework differs from a typical recommender system in that it leverages human curiosity to promote intrinsic interest in the system. A series of evaluation experiments shows that our framework ranks higher those items that have not only high ratings but also a high response likelihood. The recommendation list generated by our algorithm has a higher potential to inspire user curiosity than those of traditional approaches. The personalization factor for assessing the stimulus (surprise) strength further helps the recommender achieve smaller (better) inter-user similarity.
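The abstract does not give the functional form of the curiosity model. A common computational form of the Wundt curve writes hedonic value as a reward sigmoid minus a stronger, later-onset punishment sigmoid, which yields an inverted U over stimulus intensity. The sketch below assumes that form; the midpoints, slope and amplitudes are hypothetical stand-ins for a conservative and an open-minded user, not parameters fitted in the paper.

```python
import math


def sigmoid(x, midpoint, slope):
    """Logistic function centred at `midpoint`."""
    return 1.0 / (1.0 + math.exp(-slope * (x - midpoint)))


def wundt_curve(stimulus, reward_mid=0.3, punish_mid=0.7, slope=10.0,
                reward_max=1.0, punish_max=1.2):
    """Hedonic value of a stimulus: reward sigmoid minus punishment sigmoid.

    Because punish_max > reward_max and punish_mid > reward_mid, the value
    rises, peaks at a moderate stimulus, then falls below zero (inverted U).
    """
    reward = reward_max * sigmoid(stimulus, reward_mid, slope)
    punishment = punish_max * sigmoid(stimulus, punish_mid, slope)
    return reward - punishment


def preferred_stimulus(**user_params):
    """Surprise level (on a 0..1 grid) at which curiosity is maximally aroused."""
    levels = [i / 100 for i in range(101)]
    return max(levels, key=lambda s: wundt_curve(s, **user_params))


# Hypothetical users: the conservative one is rewarded and overwhelmed earlier.
conservative = {"reward_mid": 0.2, "punish_mid": 0.5}
open_minded = {"reward_mid": 0.4, "punish_mid": 0.9}
print(preferred_stimulus(**conservative), preferred_stimulus(**open_minded))
```

An item whose surprise level lies near a user's preferred stimulus would then be favoured. How the paper measures surprise and fits a curve per user is not described in the abstract.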
Submitted 15 February, 2020; v1 submitted 28 June, 2019;
originally announced July 2019.
-
Deep Ptych: Subsampled Fourier Ptychography using Generative Priors
Authors:
Fahad Shamshad,
Farwa Abbas,
Ali Ahmed
Abstract:
This paper proposes a novel framework that uses generative models to regularize the highly ill-posed and non-linear Fourier ptychography problem. We demonstrate experimentally that our proposed algorithm, Deep Ptych, outperforms existing Fourier ptychography techniques in terms of reconstruction quality and robustness against noise, while using far fewer samples. We further modify the proposed approach to allow the generative model to explore solutions outside its range, leading to improved performance.
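The usual way a generative prior enters an inverse problem of this kind can be sketched under stated assumptions. The low-resolution intensity images $y_\ell$ are modelled by applying a pupil function $P_\ell$ to shifted regions of the spectrum $\mathcal{F}x$. The unknown image is constrained to $x = G(z)$ for a pretrained generator $G$, and the latent code $z$ is fitted by minimising an amplitude loss. The third line shows one hypothetical way to let the solution leave the generator's range: an extra term $\nu$ with an $\ell_1$ penalty weighted by $\lambda$. The exact loss, subsampling pattern and range relaxation used by Deep Ptych are not given in the abstract and may differ.

```latex
\begin{aligned}
y_\ell &= \bigl|\mathcal{F}^{-1}\bigl(P_\ell \odot \mathcal{F}x\bigr)\bigr|^2,
  \qquad \ell = 1,\dots,L,\\
\hat{z} &= \arg\min_{z} \sum_{\ell=1}^{L}
  \Bigl\| \sqrt{y_\ell} - \bigl|\mathcal{F}^{-1}\bigl(P_\ell \odot \mathcal{F}G(z)\bigr)\bigr| \Bigr\|_2^2,
  \qquad \hat{x} = G(\hat{z}),\\
(\hat{z},\hat{\nu}) &= \arg\min_{z,\nu} \sum_{\ell=1}^{L}
  \Bigl\| \sqrt{y_\ell} - \bigl|\mathcal{F}^{-1}\bigl(P_\ell \odot \mathcal{F}(G(z)+\nu)\bigr)\bigr| \Bigr\|_2^2
  + \lambda \|\nu\|_1,
  \qquad \hat{x} = G(\hat{z}) + \hat{\nu}.
\end{aligned}
```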
Submitted 22 December, 2018;
originally announced December 2018.