-
Detecting Distributed Denial of Service Attacks Using Logistic Regression and SVM Methods
Authors:
Mohammad Arafat Ullah,
Arthy Anjum,
Rashedul Amin Tuhin,
Shamim Akhter
Abstract:
A distributed denial-of-service (DDoS) attack is an attempt to generate enormous traffic within a network by overwhelming a targeted server or its neighboring infrastructure with a ceaseless flood of service requests from multiple remotely controlled, malware-infected computers or network-connected devices. Thus, recognizing how DDoS attacks function and differentiating them from normal traffic are primary network security concerns, particularly for online businesses. In modern networks, most DDoS attacks occur in the network and application layers, including HTTP flood, UDP flood, SIDDOS, SMURF, SNMP flood, IP NULL, etc. The goal of this paper is to detect DDoS attacks among all service requests and classify them according to DDoS classes. To this end, a standard dataset is collected from the internet, containing several network-related attributes and their corresponding DDoS attack class names. Two machine learning approaches, SVM and Logistic Regression, are applied to the dataset to detect and classify DDoS attacks, and a comparative study of the two is carried out in terms of accuracy, precision, and recall. Logistic Regression and SVM both achieve 98.65% classification accuracy, which is the highest accuracy achieved among previous experiments with the same dataset.
Submitted 21 November, 2024;
originally announced November 2024.
-
Physics Encoded Blocks in Residual Neural Network Architectures for Digital Twin Models
Authors:
Muhammad Saad Zia,
Ashiq Anjum,
Lu Liu,
Anthony Conway,
Anasol Pena Rios
Abstract:
Physics Informed Machine Learning has emerged as a popular approach for modeling and simulation in digital twins, enabling the generation of accurate models of processes and behaviors in real-world systems. However, existing methods either rely on simple loss regularizations that offer limited physics integration or employ highly specialized architectures that are difficult to generalize across diverse physical systems. This paper presents a generic approach based on a novel physics-encoded residual neural network (PERNN) architecture that seamlessly combines data-driven and physics-based analytical models to overcome these limitations. Our method integrates differentiable physics blocks, which implement mathematical operators from physics-based models, with feed-forward learning blocks, while intermediate residual blocks ensure stable gradient flow during training. Consequently, the model naturally adheres to the underlying physical principles even when prior physics knowledge is incomplete, thereby improving generalizability with low data requirements and reduced model complexity. We investigate our approach in two application domains. The first is a steering model for autonomous vehicles in a simulation environment, and the second is a digital twin for climate modeling using an ordinary differential equation (ODE)-based model of Net Ecosystem Exchange (NEE) to enable gap-filling in flux tower data. In both cases, our method outperforms conventional neural network approaches as well as state-of-the-art Physics Informed Machine Learning methods.
Submitted 7 July, 2025; v1 submitted 18 November, 2024;
originally announced November 2024.
-
Synaptic Modulation using Interspike Intervals Increases Energy Efficiency of Spiking Neural Networks
Authors:
Dylan Adams,
Magda Zajaczkowska,
Ashiq Anjum,
Andrea Soltoggio,
Shirin Dora
Abstract:
Despite basic differences between Spiking Neural Networks (SNNs) and Artificial Neural Networks (ANNs), most research on SNNs involves adapting ANN-based methods for SNNs. Pruning (dropping connections) and quantization (reducing precision) are often used to improve the energy efficiency of SNNs. These methods are very effective for ANNs, whose energy needs are determined by signals transmitted on synapses. However, the event-driven paradigm in SNNs implies that energy is consumed by spikes. In this paper, we propose a new synapse model whose weights are modulated by Interspike Intervals (ISIs), i.e., the time difference between two spikes. SNNs composed of this synapse model, termed ISI Modulated SNNs (IMSNNs), can use gradient descent to estimate how the ISI of a neuron changes after updating its synaptic parameters. A higher ISI implies fewer spikes and vice versa. The learning algorithm for IMSNNs exploits this information to selectively propagate gradients such that learning is achieved by increasing the ISIs, resulting in a network that generates fewer spikes. The performance of IMSNNs with dense and convolutional layers has been evaluated in terms of classification accuracy and the number of spikes using the MNIST and FashionMNIST datasets. The performance comparison with conventional SNNs shows that IMSNNs exhibit up to a 90% reduction in the number of spikes while maintaining similar classification accuracy.
Submitted 6 August, 2024;
originally announced August 2024.
-
Tensor Train Low-rank Approximation (TT-LoRA): Democratizing AI with Accelerated LLMs
Authors:
Afia Anjum,
Maksim E. Eren,
Ismael Boureima,
Boian Alexandrov,
Manish Bhattarai
Abstract:
In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of natural language processing (NLP) tasks, such as question-answering, sentiment analysis, text summarization, and machine translation. However, the ever-growing complexity of LLMs demands immense computational resources, hindering the broader research and application of these models. To address this, various parameter-efficient fine-tuning strategies, such as Low-Rank Approximation (LoRA) and Adapters, have been developed. Despite their potential, these methods often face limitations in compressibility. Specifically, LoRA struggles to scale effectively with the increasing number of trainable parameters in modern large-scale LLMs. Additionally, Low-Rank Economic Tensor-Train Adaptation (LoRETTA), which utilizes tensor train decomposition, has not yet achieved the level of compression necessary for fine-tuning very large-scale models with limited resources. This paper introduces Tensor Train Low-Rank Approximation (TT-LoRA), a novel parameter-efficient fine-tuning (PEFT) approach that extends LoRETTA with optimized tensor train (TT) decomposition integration. By eliminating Adapters and traditional LoRA-based structures, TT-LoRA achieves greater model compression without compromising downstream task performance, along with reduced inference latency and computational overhead. We conduct an exhaustive parameter search to establish benchmarks that highlight the trade-off between model compression and performance. Our results demonstrate significant compression of LLMs while maintaining comparable performance to larger models, facilitating their deployment on resource-constrained platforms.
Submitted 2 August, 2024;
originally announced August 2024.
-
The Ink Splotch Effect: A Case Study on ChatGPT as a Co-Creative Game Designer
Authors:
Asad Anjum,
Yuting Li,
Noelle Law,
M Charity,
Julian Togelius
Abstract:
This paper studies how large language models (LLMs) can act as effective, high-level creative collaborators and "muses" for game design. We model the design of this study after the exercises artists use by looking at amorphous ink splotches for creative inspiration. Our goal is to determine whether AI assistance can improve, hinder, or provide an alternative quality to games when compared to the creative intents implemented by human designers. The capabilities of LLMs as game designers are stress-tested by placing them at the forefront of the decision-making process. Three prototype games are designed across 3 different genres: (1) a minimalist base game, (2) a game with features and game feel elements added by a human game designer, and (3) a game with features and feel elements directly implemented from prompted outputs of the LLM, ChatGPT. A user study was conducted in which participants were asked to blindly evaluate the quality of these games and their preference among them. We discuss both the development process of communicating creative intent to an AI chatbot and the synthesized open feedback of the participants. We use this data to determine both the benefits and shortcomings of AI in a more design-centric role.
Submitted 4 March, 2024;
originally announced March 2024.
-
BLP-2023 Task 2: Sentiment Analysis
Authors:
Md. Arid Hasan,
Firoj Alam,
Anika Anjum,
Shudipta Das,
Afiyat Anjum
Abstract:
We present an overview of the BLP Sentiment Shared Task, organized as part of the inaugural BLP 2023 workshop, co-located with EMNLP 2023. The task is defined as the detection of sentiment in a given piece of social media text. This task attracted interest from 71 participants, among whom 29 and 30 teams submitted systems during the development and evaluation phases, respectively. In total, participants submitted 597 runs. However, a total of 15 teams submitted system description papers. The range of approaches in the submitted systems spans from classical machine learning models and fine-tuned pre-trained models to Large Language Models (LLMs) in zero- and few-shot settings. In this paper, we provide a detailed account of the task setup, including dataset development and evaluation setup. Additionally, we provide a brief overview of the systems submitted by the participants. All datasets and evaluation scripts from the shared task have been made publicly available for the research community, to foster further research in this domain.
Submitted 21 February, 2024; v1 submitted 24 October, 2023;
originally announced October 2023.
-
Zero- and Few-Shot Prompting with LLMs: A Comparative Study with Fine-tuned Models for Bangla Sentiment Analysis
Authors:
Md. Arid Hasan,
Shudipta Das,
Afiyat Anjum,
Firoj Alam,
Anika Anjum,
Avijit Sarker,
Sheak Rashed Haider Noori
Abstract:
The rapid expansion of the digital world has propelled sentiment analysis into a critical tool across diverse sectors such as marketing, politics, customer service, and healthcare. While there have been significant advancements in sentiment analysis for widely spoken languages, low-resource languages, such as Bangla, remain largely under-researched due to resource constraints. Furthermore, the recent unprecedented performance of Large Language Models (LLMs) in various applications highlights the need to evaluate them in the context of low-resource languages. In this study, we present a sizeable manually annotated dataset encompassing 33,606 Bangla news tweets and Facebook comments. We also investigate zero- and few-shot in-context learning with several language models, including Flan-T5, GPT-4, and Bloomz, offering a comparative analysis against fine-tuned models. Our findings suggest that monolingual transformer-based models consistently outperform other models, even in zero and few-shot scenarios. To foster continued exploration, we intend to make this dataset and our research tools publicly available to the broader research community.
Submitted 4 April, 2024; v1 submitted 21 August, 2023;
originally announced August 2023.
-
The least-used key selection method for information retrieval in large-scale Cloud-based service repositories
Authors:
Jiayan Gu,
Ashiq Anjum,
Yan Wu,
Lu Liu,
John Panneerselvam,
Yao Lu,
Bo Yuan
Abstract:
As the number of devices connected to the Internet of Things (IoT) increases significantly, it leads to an exponential growth in the number of services that need to be processed and stored in large-scale Cloud-based service repositories. An efficient service indexing model is critical for service retrieval and management of large-scale Cloud-based service repositories. The multilevel index model has been the state-of-the-art service indexing model in recent years for improving service discovery and combination. This paper aims to optimize the model to account for the impact of the unequal appearing probability of service retrieval request parameters and service input parameters on service retrieval and service addition operations. The least-used key selection method is proposed to narrow the search scope of service retrieval and reduce its time. The experimental results show that the proposed least-used key selection method improves the service retrieval efficiency significantly compared with the designated key selection method in the case of the unequal appearing probability of parameters in service retrieval requests under three indexing models.
Submitted 16 August, 2022;
originally announced August 2022.
-
Non-Fungible Tokens in Business and Management -- A Review
Authors:
Najam A. Anjum,
Mubashir Husain Rehmani
Abstract:
Non-Fungible Tokens (NFTs) are a new development in blockchain technology. News around NFTs is surrounded by skepticism because unrealistically high prices are being paid online for these NFTs, which are in the form of apparently simple digital art and photographs. It is not clear if this is a trend, a hype, a bubble, or a legitimate novel way of holding and trading value. A literature review of peer-reviewed scholarly studies, performed in the context of business and management, is presented here. Moreover, we discuss open issues and challenges, and present future research directions. Analysis of these studies reveals that schools of thought are divided on the validity of this form of digital token. On one hand, there is a lot of criticism, but on the other hand, we can find novel business models and applications of NFTs, especially the feature of smart contracts. It can, therefore, be concluded that NFTs, even if not in their current form, are here to stay and may promise new ways of protecting digital assets in an immutable and easily traceable form.
Submitted 9 August, 2022;
originally announced August 2022.
-
Digital Twinning Remote Laboratories for Online Practical Learning
Authors:
Claire Palmer,
Ben Roullier,
Muhammad Aamir,
Frank McQuade,
Leonardo Stella,
Ashiq Anjum
Abstract:
The COVID-19 pandemic has demonstrated a need for remote learning and virtual learning applications such as virtual reality (VR) and tablet-based solutions. Creating complex learning scenarios by developers is highly time-consuming and can take over a year. It is also costly to employ teams of system analysts, developers and 3D artists. There is a requirement to provide a simple method to enable lecturers to create their own content for their laboratory tutorials. Research has been undertaken into developing generic models to enable the semi-automatic creation of virtual learning tools for subjects that require practical interactions with the lab resources. In addition to the system for creating digital twins, a case study describing the creation of a virtual learning application for an electrical laboratory tutorial is presented.
Submitted 21 July, 2022; v1 submitted 1 December, 2021;
originally announced December 2021.
-
Automated News Summarization Using Transformers
Authors:
Anushka Gupta,
Diksha Chugh,
Anjum,
Rahul Katarya
Abstract:
The amount of text data available online is increasing at a very fast pace hence text summarization has become essential. Most of the modern recommender and text classification systems require going through a huge amount of data. Manually generating precise and fluent summaries of lengthy articles is a very tiresome and time-consuming task. Hence generating automated summaries for the data and using it to train machine learning models will make these models space and time-efficient. Extractive summarization and abstractive summarization are two separate methods of generating summaries. The extractive technique identifies the relevant sentences from the original document and extracts only those from the text. Whereas in abstractive summarization techniques, the summary is generated after interpreting the original text, hence making it more complicated. In this paper, we will be presenting a comprehensive comparison of a few transformer architecture based pre-trained models for text summarization. For analysis and comparison, we have used the BBC news dataset that contains text data that can be used for summarization and human generated summaries for evaluating and comparing the summaries generated by machine learning models.
Submitted 23 April, 2021;
originally announced August 2021.
-
Comparative Analysis of Machine Learning and Deep Learning Algorithms for Detection of Online Hate Speech
Authors:
Tashvik Dhamija,
Anjum,
Rahul Katarya
Abstract:
In the day and age of social media, users have become prone to online hate speech. Several attempts have been made to classify hate speech using machine learning, but the state-of-the-art models are not robust enough for practical applications. This is attributed to the use of primitive NLP feature engineering techniques. In this paper, we explored various feature engineering techniques ranging from different embeddings to conventional NLP algorithms. We also experimented with combinations of different features. From our experimentation, we found that RoBERTa (robustly optimized BERT approach) based sentence embeddings classified using decision trees give the best result, an F1 score of 0.9998. We conclude that BERT-based embeddings give the most useful features for this problem and have the capacity to be made into a practical robust model.
Submitted 23 April, 2021;
originally announced August 2021.
-
Analysis of Online Toxicity Detection Using Machine Learning Approaches
Authors:
Anjum,
Rahul Katarya
Abstract:
Social media and the internet have become an integral part of how people spread and consume information. Over time, social media has evolved dramatically, and almost half of the population is using social media to express their views and opinions. Online hate speech is one of the drawbacks of social media nowadays, which needs to be controlled. In this paper, we examine how hate speech originated, what its consequences are, and the trends in machine-learning algorithms for addressing the online hate speech problem. This study contributes by providing a systematic approach to help researchers identify new research directions and by elucidating the shortcomings of existing studies and models, as well as providing future directions to advance the field.
Submitted 23 April, 2021;
originally announced August 2021.
-
Analysing Cyberbullying using Natural Language Processing by Understanding Jargon in Social Media
Authors:
Bhumika Bhatia,
Anuj Verma,
Anjum,
Rahul Katarya
Abstract:
Cyberbullying is extremely prevalent today. Online hate comments, toxicity, and cyberbullying amongst children and other vulnerable groups are only growing with online classes and increased access to social platforms, especially post COVID-19. It is paramount to ensure minors' safety across social platforms so that any violence or hate crime is automatically detected and strict action is taken against it. In our work, we explore binary classification by using a combination of datasets from various social media platforms that cover a wide range of cyberbullying such as sexism, racism, abusive language, and hate speech. We experiment with multiple models such as Bi-LSTM, GloVe, and state-of-the-art models like BERT, and apply a unique preprocessing technique by introducing a slang-abusive corpus, achieving a higher precision in comparison to models without slang preprocessing.
Submitted 23 April, 2021;
originally announced July 2021.
-
Cloud based Scalable Object Recognition from Video Streams using Orientation Fusion and Convolutional Neural Networks
Authors:
Muhammad Usman Yaseen,
Ashiq Anjum,
Giancarlo Fortino,
Antonio Liotta,
Amir Hussain
Abstract:
Object recognition from live video streams comes with numerous challenges such as the variation in illumination conditions and poses. Convolutional neural networks (CNNs) have been widely used to perform intelligent visual object recognition. Yet, CNNs still suffer from severe accuracy degradation, particularly on illumination-variant datasets. To address this problem, we propose a new CNN method based on orientation fusion for visual object recognition. The proposed cloud-based video analytics system pioneers the use of bi-dimensional empirical mode decomposition to split a video frame into intrinsic mode functions (IMFs). We further propose applying the Riesz transform to these IMFs to produce monogenic object components, which are in turn used for the training of CNNs. Past works have demonstrated how the object orientation component may be used to pursue accuracy levels as high as 93%. Herein we demonstrate how a feature-fusion strategy of the orientation components leads to further improving visual recognition accuracy to 97%. We also assess the scalability of our method, looking at both the number and the size of the video streams under scrutiny. We carry out extensive experimentation on the publicly available Yale dataset, as well as a self-generated video dataset, finding significant improvements (both in accuracy and scale) in comparison to AlexNet, LeNet and SE-ResNeXt, which are the three most commonly used deep learning models for visual object recognition and classification.
Submitted 19 June, 2021;
originally announced June 2021.
-
Virtual Reality based Digital Twin System for remote laboratories and online practical learning
Authors:
Claire Palmer,
Ben Roullier,
Muhammad Aamir,
Leonardo Stella,
Uchenna Diala,
Ashiq Anjum,
Frank Mcquade,
Keith Cox,
Alex Calvert
Abstract:
There is a need for remote learning and virtual learning applications such as virtual reality (VR) and tablet-based solutions which the current pandemic has demonstrated. Creating complex learning scenarios by developers is highly time-consuming and can take over a year. There is a need to provide a simple method to enable lecturers to create their own content for their laboratory tutorials. Research is currently being undertaken into developing generic models to enable the semi-automatic creation of a virtual learning application. A case study describing the creation of a virtual learning application for an electrical laboratory tutorial is presented.
Submitted 17 June, 2021;
originally announced June 2021.
-
Optimization of Service Addition in Multilevel Index Model for Edge Computing
Authors:
Jiayan Gu,
Yan Wu,
Ashiq Anjum,
John Panneerselvam,
Yao Lu,
Bo Yuan
Abstract:
With the development of Edge Computing and Artificial Intelligence (AI) technologies, edge devices are generating data at unprecedented volume. Edge Intelligence (EI) has led to the emergence of edge devices in various application domains. EI can provide efficient services to delay-sensitive applications, where the edge devices are deployed as edge nodes to host the majority of execution, which can effectively manage services and improve service discovery efficiency. The multilevel index model is a well-known model used for indexing services; such a model is being introduced and optimized in edge environments to enable efficient service discovery whilst managing large volumes of data. However, effectively updating the multilevel index model by adding new services in a timely and precise manner in dynamic Edge Computing environments is still a challenge. Addressing this issue, this paper proposes a designated key selection method to improve the efficiency of adding services in multilevel index models. Our experimental results show that in the partial index and the full index of the multilevel index model, our method reduces the service addition time by around 84% and 76%, respectively, when compared with the original key selection method and by around 78% and 66%, respectively, when compared with the random selection method. Our proposed method significantly improves the service addition efficiency in the multilevel index model, when compared with existing state-of-the-art key selection methods, without compromising the service retrieval stability to any notable level.
Submitted 19 June, 2021; v1 submitted 8 June, 2021;
originally announced June 2021.
-
MobChain: Three-Way Collusion Resistance in Witness-Oriented Location Proof Systems Using Distributed Consensus
Authors:
Faheem Zafar,
Abid Khan,
Saif Ur Rehman Malik,
Adeel Anjum,
Mansoor Ahmed
Abstract:
Smart devices have accentuated the importance of geolocation information. Geolocation identification using smart devices has paved the path for incentive-based location-based services (LBS). A location proof is a digital certificate of the geographical location of a user, which can be used to access various LBS. However, a user's full control over a device allows tampering with the location proof. Initially, to resist false proofs, two-party trusted centralized location proof systems (LPS) were introduced to help users generate secure location proofs mutually. However, two-party protocols suffered from collusion attacks by the participants of the protocol. Consequently, many witness-oriented LPS have emerged to mitigate collusion attacks in two-party protocols. However, witness-oriented LPS introduced the possibility of three-way collusion attacks (involving the user, the location authority, and the witness). Three-way collusion attacks are inevitable in all existing witness-oriented schemes. To address the inability of existing schemes to resist three-way collusion, in this paper we introduce a decentralized consensus protocol called MobChain, where the selection of a witness and location authority is achieved through a distributed consensus of nodes in an underlying P2P network of a private blockchain. The persistent provenance data over the blockchain provides strong security guarantees; as a result, forging and manipulation become impractical. MobChain provides a secure location provenance architecture, relying on decentralized decision making for the selection of participants of the protocol to resist the three-way collusion problem. Our prototype implementation and comparison with state-of-the-art solutions show that MobChain is computationally efficient and highly available while improving the security of LPS.
Submitted 17 November, 2020;
originally announced November 2020.
-
Multiclass Disease Predictions Based on Integrated Clinical and Genomics Datasets
Authors:
Moeez M. Subhani,
Ashiq Anjum
Abstract:
Clinical predictions using clinical data by computational methods are common in bioinformatics. However, clinical predictions that also use information from genomics datasets are not frequently observed in research. Precision medicine research requires information from all available datasets to provide intelligent clinical solutions. In this paper, we have attempted to create a prediction model which uses information from both clinical and genomics datasets. We have demonstrated multiclass disease predictions based on combined clinical and genomics datasets using machine learning methods. We have created an integrated dataset, using a clinical (ClinVar) and a genomics (gene expression) dataset, and used it to train an instance-based learner to predict clinical diseases. We have used an innovative but simple way for multiclass classification, where the number of output classes is as high as 75. We have used Principal Component Analysis for feature selection. The classifier predicted diseases with 73% accuracy on the integrated dataset. The results were consistent and competent when compared with other classification models. The results show that genomics information can be reliably included in datasets for clinical predictions and it can prove to be valuable in clinical diagnostics and precision medicine.
Submitted 14 June, 2020;
originally announced June 2020.
-
Dimensionality Reduction for Sentiment Classification: Evolving for the Most Prominent and Separable Features
Authors:
Aftab Anjum,
Mazharul Islam,
Lin Wang
Abstract:
In sentiment classification, the enormous amount of textual data, its immense dimensionality, and inherent noise make it extremely difficult for machine learning classifiers to extract high-level and complex abstractions. In order to make the data less sparse and more statistically significant, dimensionality reduction techniques are needed. But in existing dimensionality reduction techniques, the number of components needs to be set manually, which results in the loss of the most prominent features, thus reducing the performance of the classifiers. Our prior work, i.e., Term Presence Count (TPC) and Term Presence Ratio (TPR), has proven to be effective as it rejects the less separable features. However, the most prominent and separable features might still get removed from the initial feature set despite having higher distributions among positive and negative tagged documents. To overcome this problem, we have proposed a new framework that consists of two dimensionality reduction techniques, i.e., Sentiment Term Presence Count (SentiTPC) and Sentiment Term Presence Ratio (SentiTPR). These techniques reject features by considering the term presence difference for SentiTPC and the ratio of the distribution distinction for SentiTPR. Additionally, these methods also analyze the total distribution information. Extensive experimental results show that the proposed framework reduces the feature dimension by a large scale, and thus significantly improves the classification performance.
Submitted 1 June, 2020;
originally announced June 2020.
-
A Novel Neural Network-Based Symbolic Regression Method: Neuro-Encoded Expression Programming
Authors:
Aftab Anjum,
Fengyang Sun,
Lin Wang,
Jeff Orchard
Abstract:
Neuro-encoded expression programming (NEEP), which aims to offer a novel continuous representation of combinatorial encoding for genetic programming methods, is proposed in this paper. Genetic programming with linear representation uses nature-inspired operators (e.g., crossover, mutation) to tune expressions and finally search out the best explicit function to simulate data. The encoding mechanism is essential for genetic programming to find a desirable solution efficiently. However, linear representation methods manipulate the expression tree in a discrete solution space, where a small change of the input can cause a large change of the output. The unsmooth landscapes destroy the local information and make searching difficult. Neuro-encoded expression programming constructs the gene string with a recurrent neural network (RNN), and the weights of the network are optimized by powerful continuous evolutionary algorithms. The neural network mappings smooth the sharp fitness landscape and provide rich neighborhood information to find the best expression. The experiments indicate that the novel approach improves training efficiency and reduces test errors on several well-known symbolic regression problems.
Submitted 9 April, 2021; v1 submitted 6 April, 2019;
originally announced April 2019.
-
Towards In-Transit Analytics for Industry 4.0
Authors:
Richard Hill,
James Devitt,
Ashiq Anjum,
Muhammad Ali
Abstract:
Industry 4.0, or Digital Manufacturing, is a vision of inter-connected services to facilitate innovation in the manufacturing sector. A fundamental requirement of innovation is the ability to visualise manufacturing data in order to discover new insight for increased competitive advantage. This article describes the enabling technologies that facilitate In-Transit Analytics, which is a necessary precursor for Industrial Internet of Things (IIoT) visualisation.
Submitted 20 September, 2017;
originally announced October 2017.
-
One-Minute Derivation of The Conjugate Gradient Algorithm
Authors:
Muhammad Ali Raza Anjum
Abstract:
One of the great triumphs in the history of numerical methods was the discovery of the Conjugate Gradient (CG) algorithm. It could solve a symmetric positive-definite system of linear equations of dimension N in exactly N steps. As many practical problems at that time belonged to this category, the CG algorithm rapidly became popular. It remains popular even today due to its immense computational power. But despite its amazing computational ability, the mathematics of this algorithm is not easy to learn. Lengthy derivations, redundant notations, and over-emphasis on formal presentation make it very difficult for a beginner to master this algorithm. This paper aims to serve as a starting point for such readers. It provides a curt, easy-to-follow but minimalist derivation of the algorithm by keeping only the sufficient steps, maintaining a uniform notation, and focusing entirely on ease for the reader.
Submitted 30 August, 2016;
originally announced August 2016.
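As an illustration of the algorithm this abstract refers to, the textbook Conjugate Gradient iteration can be sketched in a few lines of Python. This is a generic sketch, not the derivation or notation from the paper; the function name, the list-of-lists matrix representation, and the tolerance are assumptions made for the example. The "exactly N steps" property holds in exact arithmetic; in floating point the result is only accurate up to rounding.

```python
def conjugate_gradient(A, b, tol=1e-12):
    """Solve A x = b for a symmetric positive-definite matrix A.

    A is a list of rows, b a list of floats (illustrative representation).
    Runs at most N = len(b) iterations, the textbook bound for CG.
    """
    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    def matvec(M, v):
        return [dot(row, v) for row in M]

    n = len(b)
    x = [0.0] * n          # initial guess x0 = 0
    r = list(b)            # residual r0 = b - A x0 = b
    p = list(r)            # first search direction
    rs_old = dot(r, r)

    for _ in range(n):
        if rs_old ** 0.5 < tol:
            break          # residual already negligible
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)                      # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                           # keeps directions A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(conjugate_gradient(A, b))
```

For the 2x2 system in the usage example, the exact solution is x = (1/11, 7/11), reached after two iterations up to rounding error.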
-
A New Approach to Linear Estimation Problem in Multi-user Massive MIMO Systems
Authors:
Muhammad Ali Raza Anjum
Abstract:
A novel approach for solving the linear estimation problem in multi-user massive MIMO systems is proposed. In this approach, the difficulty of matrix inversion is attributed to the incomplete definition of the dot product. The general definition of the dot product implies that the columns of the channel matrix are always orthogonal whereas, in practice, they may not be. If the latter information can be incorporated into the dot product, then the unknowns can be directly computed from projections without inverting the channel matrix. By doing so, the proposed method is able to achieve an exact solution with a 25% reduction in computational complexity as compared to the QR method. The proposed method is stable, offers the extra flexibility of computing any single unknown, and can be implemented in just twelve lines of code.
Submitted 28 April, 2015;
originally announced April 2015.
-
Research Traceability using Provenance Services for Biomedical Analysis
Authors:
Ashiq Anjum,
Peter Bloodsworth,
Andrew Branson,
Irfan Habib,
Richard McClatchey,
Tony Solomonides,
the neuGRID Consortium
Abstract:
We outline the approach being developed in the neuGRID project to use provenance management techniques for the purposes of capturing and preserving the provenance data that emerges in the specification and execution of workflows in biomedical analyses. In the neuGRID project a provenance service has been designed and implemented that is intended to capture, store, retrieve and reconstruct the workflow information needed to facilitate users in conducting user analyses. We describe the architecture of the neuGRID provenance service and discuss how the CRISTAL system from CERN is being adapted to address the requirements of the project and then consider how a generalised approach for provenance management could emerge for more generic application to the (Health)Grid community.
Submitted 2 March, 2014;
originally announced March 2014.
-
Providing Traceability for Neuroimaging Analyses
Authors:
R. McClatchey,
A. Branson,
A. Anjum,
P. Bloodsworth,
I. Habib,
K. Munir,
J. Shamdasani,
K. Soomro,
the neuGRID Consortium
Abstract:
With the increasingly digital nature of biomedical data and as the complexity of analyses in medical research increases, the need for accurate information capture, traceability and accessibility has become crucial to medical researchers in the pursuance of their research goals. Grid- or Cloud-based technologies, often based on so-called Service Oriented Architectures (SOA), are increasingly being seen as viable solutions for managing distributed data and algorithms in the bio-medical domain. For neuroscientific analyses, especially those centred on complex image analysis, traceability of processes and datasets is essential but up to now this has not been captured in a manner that facilitates collaborative study. Over the past decade, we have been working with mammographers, paediatricians and neuroscientists in three generations of projects to provide the data management and provenance services now required for 21st century medical research. This paper outlines the findings of a requirements study and a resulting system architecture for the production of services to support neuroscientific studies of biomarkers for Alzheimer's disease. The paper proposes a software infrastructure and services that provide the foundation for such support. It introduces the use of the CRISTAL software to provide provenance management as one of a number of services delivered on a SOA, deployed to manage neuroimaging projects that have been studying biomarkers for Alzheimer's disease.
Submitted 24 February, 2014;
originally announced February 2014.
-
Context-Aware Service Utilisation in the Clouds and Energy Conservation
Authors:
Saad Liaquat Kiani,
Ashiq Anjum,
Nick Antonopoulos,
Michael Knappmeyer,
Nigel Baker,
Richard McClatchey
Abstract:
Ubiquitous computing environments are characterised by smart, interconnected artefacts embedded in our physical world that are projected to provide useful services to human inhabitants unobtrusively. Mobile devices are becoming the primary tools of human interaction with these embedded artefacts and utilisation of services available in smart computing environments such as clouds. Advancements in capabilities of mobile devices allow a number of user and environment related context consumers to be hosted on these devices. Without a coordinating component, these context consumers and providers are a potential burden on device resources; specifically the effect of uncoordinated computation and communication with cloud-enabled services can negatively impact the battery life. Therefore energy conservation is a major concern in realising the collaboration and utilisation of mobile device based context-aware applications and cloud based services. This paper presents the concept of a context-brokering component to aid in coordination and communication of context information between mobile devices and services deployed in a cloud infrastructure. A prototype context broker is experimentally analysed for effects on energy conservation when accessing and coordinating with cloud services on a smart device, with results signifying reduction in energy consumption.
Submitted 24 February, 2012;
originally announced February 2012.
-
Research Traceability using Provenance Services for Biomedical Analysis
Authors:
Ashiq Anjum,
Peter Bloodsworth,
Andrew Branson,
Irfan Habib,
Richard McClatchey,
Tony Solomonides,
the neuGRID Consortium
Abstract:
We outline the approach being developed in the neuGRID project to use provenance management techniques for the purposes of capturing and preserving the provenance data that emerges in the specification and execution of workflows in biomedical analyses. In the neuGRID project a provenance service has been designed and implemented that is intended to capture, store, retrieve and reconstruct the workflow information needed to facilitate users in conducting user analyses. We describe the architecture of the neuGRID provenance service and discuss how the CRISTAL system from CERN is being adapted to address the requirements of the project and then consider how a generalised approach for provenance management could emerge for more generic application to the (Health)Grid community.
Submitted 24 February, 2012;
originally announced February 2012.
-
Reusable Services from the neuGRID Project for Grid-Based Health Applications
Authors:
Ashiq Anjum,
Peter Bloodsworth,
Irfan Habib,
Tom Lansdale,
Richard McClatchey,
Yasir Mehmood,
the neuGRID Consortium
Abstract:
By abstracting Grid middleware specific considerations from clinical research applications, re-usable services should be developed that will provide generic functionality aimed specifically at medical applications. In the scope of the neuGRID project, generic services are being designed and developed which will be applied to satisfy the requirements of neuroscientists. These services will bring together sources of data and computing elements into a single view as far as applications are concerned, making it possible to cope with centralised, distributed or hybrid data and provide native support for common medical file formats. Services will include querying, provenance, portal, anonymization and pipeline services together with a 'glueing' service for connection to Grid services. Thus lower-level services will hide the peculiarities of any specific Grid technology from upper layers, provide application independence and will enable the selection of 'fit-for-purpose' infrastructures. This paper outlines the design strategy being followed in neuGRID using the glueing and pipeline services as examples.
Submitted 24 February, 2012;
originally announced February 2012.
-
A Fault Tolerant, Dynamic and Low Latency BDII Architecture for Grids
Authors:
Asif Osman,
Ashiq Anjum,
Naheed Batool,
Richard McClatchey
Abstract:
The current BDII model relies on information gathering from agents that run on each core node of a Grid. This information is then published into a Grid-wide information resource known as Top BDII. The Top-level BDIIs are typically updated in cycles of a few minutes each. A new BDII architecture is proposed and described in this paper based on the hypothesis that only a few attribute values change in each BDII information cycle and consequently it may not be necessary to update each parameter in a cycle. It has been demonstrated that significant performance gains can be achieved by exchanging only the information about records that changed during a cycle. Our investigations have led us to implement a low latency and fault tolerant BDII system that involves only minimal data transfer and facilitates secure transactions in a Grid environment.
Submitted 24 February, 2012;
originally announced February 2012.
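The idea in the BDII abstract above, exchanging only the records that changed during a cycle, can be illustrated with a small Python sketch. It is not the paper's implementation: the snapshot representation (a dictionary from record key to attribute dictionary), the function names, and the sample record keys are all hypothetical and chosen only to show the delta principle.

```python
_MISSING = object()  # sentinel: distinguishes "absent" from a stored None


def changed_records(previous, current):
    """Compare two information-cycle snapshots and return only the delta.

    Both arguments map a record key to its attribute dictionary
    (a hypothetical representation, not the real BDII schema).
    Returns (updated, removed): records that are new or modified,
    and keys that disappeared since the previous cycle.
    """
    updated = {
        key: attrs
        for key, attrs in current.items()
        if previous.get(key, _MISSING) != attrs
    }
    removed = [key for key in previous if key not in current]
    return updated, removed


def apply_delta(snapshot, updated, removed):
    """Rebuild the current snapshot on the receiving side from a delta."""
    merged = dict(snapshot)
    merged.update(updated)
    for key in removed:
        merged.pop(key, None)
    return merged


if __name__ == "__main__":
    cycle_1 = {
        "ce01": {"free_slots": 10},
        "ce02": {"free_slots": 4},
        "se01": {"free_gb": 500},
    }
    cycle_2 = {
        "ce01": {"free_slots": 7},   # changed
        "se01": {"free_gb": 500},    # unchanged, so not transmitted
        "ce03": {"free_slots": 12},  # new record
    }
    updated, removed = changed_records(cycle_1, cycle_2)
    print(updated, removed)
    print(apply_delta(cycle_1, updated, removed) == cycle_2)
```

In this sketch the unchanged record is never transmitted, which is the source of the reduced data transfer; a real system would also have to handle lost deltas, for example by falling back to a full snapshot.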
-
An Architecture for Integrated Intelligence in Urban Management using Cloud Computing
Authors:
Zaheer Khan,
David Ludlow,
Richard McClatchey,
Ashiq Anjum
Abstract:
With the emergence of new methodologies and technologies it has now become possible to manage large amounts of environmental sensing data and apply new integrated computing models to acquire information intelligence. This paper advocates the application of cloud capacity to support the information, communication and decision making needs of a wide variety of stakeholders in the complex business of the management of urban and regional development. The complexity lies in the interactions and impacts embodied in the concept of the urban-ecosystem at various governance levels. This highlights the need for more effective integrated environmental management systems. This paper offers a user-orientated approach based on requirements for an effective management of the urban-ecosystem and the potential contributions that can be supported by the cloud computing community. Furthermore, the commonality of the influence of the drivers of change at the urban level offers the opportunity for the cloud computing community to develop generic solutions that can serve the needs of hundreds of cities from Europe and indeed globally.
Submitted 24 February, 2012;
originally announced February 2012.
-
CMS Workflow Execution using Intelligent Job Scheduling and Data Access Strategies
Authors:
Khawar Hasham,
Antonio Delgado Peris,
Ashiq Anjum,
Dave Evans,
Dirk Hufnagel,
Eduardo Huedo,
José M. Hernández,
Richard McClatchey,
Stephen Gowdy,
Simon Metson
Abstract:
Complex scientific workflows can process large amounts of data using thousands of tasks. The turnaround times of these workflows are often affected by various latencies such as the resource discovery, scheduling and data access latencies for the individual workflow processes or actors. Minimizing these latencies will improve the overall execution time of a workflow and thus lead to a more efficient and robust processing environment. In this paper, we propose a pilot job based infrastructure that has intelligent data reuse and job execution strategies to minimize the scheduling, queuing, execution and data access latencies. The results have shown that significant improvements in the overall turnaround time of a workflow can be achieved with this approach. The proposed approach has been evaluated, first using the CMS Tier0 data processing workflow, and then simulating the workflows to evaluate its effectiveness in a controlled environment.
Submitted 24 February, 2012;
originally announced February 2012.
-
Secure Iris Authentication Using Visual Cryptography
Authors:
P. S. Revenkar,
Anisa Anjum,
W. Z. Gandhare
Abstract:
Biometrics deals with automated methods of identifying a person or verifying the identity of a person based on physiological or behavioral characteristics. Visual cryptography is a secret sharing scheme where a secret image is encrypted into shares which independently disclose no information about the original secret image. As biometric templates are stored in a centralized database, security threats mean that a template may be modified by an attacker. If a biometric template is altered, the authorized user will not be allowed to access the resource. To deal with this issue, visual cryptography schemes can be applied to secure the iris template. Visual cryptography provides a strong means of meeting such security needs as well as an extra layer of authentication.
Submitted 10 April, 2010;
originally announced April 2010.
-
Scheduling in Data Intensive and Network Aware (DIANA) Grid Environments
Authors:
Richard McClatchey,
Ashiq Anjum,
Heinz Stockinger,
Arshad Ali,
Ian Willers,
Michael Thomas
Abstract:
In Grids scheduling decisions are often made on the basis of jobs being either data or computation intensive: in data intensive situations jobs may be pushed to the data and in computation intensive situations data may be pulled to the jobs. This kind of scheduling, in which there is no consideration of network characteristics, can lead to performance degradation in a Grid environment and may result in large processing queues and job execution delays due to site overloads. In this paper we describe a Data Intensive and Network Aware (DIANA) meta-scheduling approach, which takes into account data, processing power and network characteristics when making scheduling decisions across multiple sites. Through a practical implementation on a Grid testbed, we demonstrate that queue and execution times of data-intensive jobs can be significantly improved when we introduce our proposed DIANA scheduler. The basic scheduling decisions are dictated by a weighting factor for each potential target location which is a calculated function of network characteristics, processing cycles and data location and size. The job scheduler provides a global ranking of the computing resources and then selects an optimal one on the basis of this overall access and execution cost. The DIANA approach considers the Grid as a combination of active network elements and takes network characteristics as a first class criterion in the scheduling decision matrix along with computation and data. The scheduler can then make informed decisions by taking into account the changing state of the network, locality and size of the data and the pool of available processing cycles.
Submitted 5 July, 2007;
originally announced July 2007.
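The DIANA abstract above describes ranking candidate sites by a cost that combines network characteristics, processing power, and data location and size. The abstract does not give the actual cost function, so the Python sketch below uses a deliberately simple stand-in: estimated transfer time plus estimated queue-and-compute time, each with a weight. The `Site` fields, the formula, and the weights `w_net` and `w_cpu` are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """A candidate execution site (hypothetical fields for illustration)."""
    name: str
    bandwidth_mbps: float  # link speed from the data source to this site
    queued_jobs: int       # jobs already waiting at the site
    cpu_rate: float        # work units processed per second
    has_data: bool         # True if the input data is already local


def site_cost(site, data_mb, work_units, w_net=1.0, w_cpu=1.0):
    """Illustrative cost: weighted transfer time plus weighted compute time."""
    # No transfer is needed when the data is already at the site.
    transfer_s = 0.0 if site.has_data else data_mb * 8.0 / site.bandwidth_mbps
    # Crude queue model: the job waits behind queued jobs of similar size.
    compute_s = work_units * (site.queued_jobs + 1) / site.cpu_rate
    return w_net * transfer_s + w_cpu * compute_s


def rank_sites(sites, data_mb, work_units):
    """Return sites ordered from lowest to highest estimated cost."""
    return sorted(sites, key=lambda s: site_cost(s, data_mb, work_units))


if __name__ == "__main__":
    sites = [
        Site("data-local", bandwidth_mbps=100.0, queued_jobs=5, cpu_rate=10.0, has_data=True),
        Site("idle-remote", bandwidth_mbps=100.0, queued_jobs=0, cpu_rate=10.0, has_data=False),
    ]
    # Large input: moving 1000 MB outweighs the queue at the data-local site.
    print([s.name for s in rank_sites(sites, data_mb=1000.0, work_units=100.0)])
    # Small input: the idle remote site wins despite the transfer.
    print([s.name for s in rank_sites(sites, data_mb=100.0, work_units=100.0)])
```

The two calls show why a scheduler that looks only at data location or only at compute load can choose poorly: the preferred site flips once the transfer cost is taken into account.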
-
The Requirements for Ontologies in Medical Data Integration: A Case Study
Authors:
Ashiq Anjum,
Peter Bloodsworth,
Andrew Branson,
Tamas Hauer,
Richard McClatchey,
Kamran Munir,
Dmitry Rogulin,
Jetendr Shamdasani
Abstract:
Evidence-based medicine is critically dependent on three sources of information: a medical knowledge base, the patient's medical record and knowledge of available resources, including, where appropriate, clinical protocols. Patient data is often scattered in a variety of databases and may, in a distributed model, be held across several disparate repositories. Consequently, addressing the needs of an evidence-based medicine community presents issues of biomedical data integration, clinical interpretation and knowledge management. This paper outlines how the Health-e-Child project has approached the challenge of requirements specification for (bio-) medical data integration, from the level of cellular data, through disease to that of patient and population. The approach is illuminated through the requirements elicitation and analysis of Juvenile Idiopathic Arthritis (JIA), one of three diseases being studied in the EC-funded Health-e-Child project.
Submitted 5 July, 2007;
originally announced July 2007.
-
PhantomOS: A Next Generation Grid Operating System
Authors:
Irfan Habib,
Kamran Soomro,
Ashiq Anjum,
Richard McClatchey,
Arshad Ali,
Peter Bloodsworth
Abstract:
Grid Computing has made substantial advances in the past decade; these are primarily due to the adoption of standardized Grid middleware. However, Grid computing has not yet become pervasive because of some barriers that we believe have been caused by the adoption of middleware-centric approaches. These barriers include: scant support for major types of applications such as interactive applications; lack of flexible, autonomic and scalable Grid architectures; lack of plug-and-play Grid computing and, most importantly, no straightforward way to set up and administer Grids. PhantomOS is a project which aims to address many of these barriers. Its goal is the creation of a user-friendly pervasive Grid computing platform that facilitates the rapid deployment and easy maintenance of Grids whilst providing support for major types of applications on Grids of almost any topology. In this paper we present the detailed system architecture and an overview of its implementation.
Submitted 5 July, 2007;
originally announced July 2007.
-
DIANA Scheduling Hierarchies for Optimizing Bulk Job Scheduling
Authors:
A. Anjum,
R. McClatchey,
H. Stockinger,
A. Ali,
I. Willers,
M. Thomas,
M. Sagheer,
K. Hasham,
O. Alvi
Abstract:
The use of meta-schedulers for resource management in large-scale distributed systems often leads to a hierarchy of schedulers. In this paper, we discuss why existing meta-scheduling hierarchies are sometimes not sufficient for Grid systems due to their inability to re-organise jobs already scheduled locally. Such a job re-organisation is required to adapt to evolving loads which are common in heavily used Grid infrastructures. We propose a peer-to-peer scheduling model and evaluate it using case studies and mathematical modelling. We detail the DIANA (Data Intensive and Network Aware) scheduling algorithm and its queue management system for coping with the load distribution and for supporting bulk job scheduling. We demonstrate that such a system is beneficial for dynamic, distributed and self-organizing resource management and can assist in optimizing load or job distribution in complex Grid infrastructures.
Submitted 5 July, 2007;
originally announced July 2007.
-
Mobile Computing in Physics Analysis - An Indicator for eScience
Authors:
A. Ali,
A. Anjum,
T. Azim,
J. Bunn,
A. Ikram,
R. McClatchey,
H. Newman,
C. Steenberg,
M. Thomas,
I. Willers
Abstract:
This paper presents the design and implementation of a Grid-enabled physics analysis environment for handheld and other resource-limited computing devices as one example of the use of mobile devices in eScience. Handheld devices offer great potential because they provide ubiquitous access to data and round-the-clock connectivity over wireless links. Our solution aims to provide users of handheld devices the capability to launch heavy computational tasks on computational and data Grids, monitor the job status during execution, and retrieve results after job completion. Users carry their jobs on their handheld devices in the form of executables (and associated libraries). Users can transparently view the status of their jobs and get back their outputs without having to know where they are being executed. In this way, our system is able to act as a high-throughput computing environment where devices ranging from powerful desktop machines to small handhelds can employ the power of the Grid. The results shown in this paper are readily applicable to the wider eScience community.
Submitted 5 July, 2007;
originally announced July 2007.
-
A Multi Interface Grid Discovery System
Authors:
A. Ali,
A. Anjum,
J. Bunn,
F. Khan,
R. McClatchey,
H. Newman,
C. Steenberg,
M. Thomas,
Ian Willers
Abstract:
Discovery Systems (DS) can be considered as entry points for global loosely coupled distributed systems. An efficient Discovery System in essence increases the performance, reliability and decision making capability of distributed systems. With the rapid increase in scale of distributed applications, existing solutions for discovery systems are fast becoming either obsolete or incapable of handling such complexity. They are particularly ineffective when handling service lifetimes and providing up-to-date information, poor at enabling dynamic service access and they can also impose unwanted restrictions on interfaces to widely available information repositories. In this paper we present the essential design characteristics, an implementation and a performance analysis for a discovery system capable of overcoming these deficiencies in large, globally distributed environments.
Submitted 5 July, 2007;
originally announced July 2007.
-
Bulk Scheduling with the DIANA Scheduler
Authors:
Ashiq Anjum,
Richard McClatchey,
Arshad Ali,
Ian Willers
Abstract:
Results from the research and development of a Data Intensive and Network Aware (DIANA) scheduling engine, to be used primarily for data intensive sciences such as physics analysis, are described. In Grid analyses, tasks can involve thousands of computing, data handling, and network resources. The central problem in the scheduling of these resources is the coordinated management of computation and data at multiple locations and not just data replication or movement. However, this can prove to be a rather costly operation and efficient scheduling can be a challenge if compute and data resources are mapped without considering network costs. We have implemented an adaptive algorithm within the so-called DIANA Scheduler which takes into account data location and size, network performance and computation capability in order to enable efficient global scheduling. DIANA is a performance-aware and economy-guided Meta Scheduler. It iteratively allocates each job to the site that is most likely to produce the best performance as well as optimizing the global queue for any remaining jobs. Therefore it is equally suitable whether a single job is being submitted or bulk scheduling is being performed. Results indicate that considerable performance improvements can be gained by adopting the DIANA scheduling approach.
Submitted 8 August, 2006;
originally announced August 2006.
-
From Grid Middleware to a Grid Operating System
Authors:
Arshad Ali,
Richard McClatchey,
Ashiq Anjum,
Irfan Habib,
Kamran Soomro,
Mohammed Asif,
Ali Adil,
Athar Mohsin
Abstract:
Grid computing has made substantial advances during the last decade. Grid middleware such as Globus has contributed greatly in making this possible. There are, however, significant barriers to the adoption of Grid computing in other fields, most notably day-to-day user computing environments. We will demonstrate in this paper that this is primarily due to the limitations of the existing Grid middleware which does not take into account the needs of everyday scientific and business users. In this paper we will formally advocate a Grid Operating System and propose an architecture to migrate Grid computing into a Grid operating system which we believe would help remove most of the technical barriers to the adoption of Grid computing and make it relevant to the day-to-day user. We believe this proposed transition to a Grid operating system will drive more pervasive Grid computing research and application development and deployment in future.
Submitted 8 August, 2006;
originally announced August 2006.
-
Bulk Scheduling with DIANA Scheduler
Authors:
Ashiq Anjum,
Richard McClatchey,
Arshad Ali,
Ian Willers
Abstract:
Results from and progress on the development of a Data Intensive and Network Aware (DIANA) Scheduling engine, primarily for data intensive sciences such as physics analysis, are described. Scientific analysis tasks can involve thousands of computing, data handling, and network resources, and the size of the input and output files and the amount of overall storage space allocated to a user can have significant bearing on the scheduling of data intensive applications. If the input or output files must be retrieved from a remote location, then the time required to transfer the files must also be taken into consideration when scheduling compute resources for the given application. The central problem in this study is the coordinated management of computation and data at multiple locations and not simply data movement. However, this can be a very costly operation and efficient scheduling can be a challenge if compute and data resources are mapped without considering network cost. We have implemented an adaptive algorithm within the DIANA Scheduler which takes into account data location and size, network performance and computation capability to make efficient global scheduling decisions. DIANA is a performance-aware as well as an economy-guided Meta Scheduler. It iteratively allocates each job to the site that is likely to produce the best performance as well as optimizing the global queue for any remaining pending jobs. Therefore it is equally suitable whether a single job is being submitted or bulk scheduling is being performed. Results suggest that considerable performance improvements are to be gained by adopting the DIANA scheduling approach.
Submitted 7 February, 2006;
originally announced February 2006.
-
JClarens: A Java Framework for Developing and Deploying Web Services for Grid Computing
Authors:
Michael Thomas,
Conrad Steenberg,
Frank van Lingen,
Harvey Newman,
Julian Bunn,
Arshad Ali,
Richard McClatchey,
Ashiq Anjum,
Tahir Azim,
Waqas ur Rehman,
Faisal Khan,
Jang Uk In
Abstract:
High Energy Physics (HEP) and other scientific communities have adopted Service Oriented Architectures (SOA) as part of a larger Grid computing effort. This effort involves the integration of many legacy applications and programming libraries into a SOA framework. The Grid Analysis Environment (GAE) is such a service oriented architecture based on the Clarens Grid Services Framework and is being developed as part of the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) at the European Laboratory for Particle Physics (CERN). Clarens provides a set of authorization, access control, and discovery services, as well as XMLRPC and SOAP access to all deployed services. Two implementations of the Clarens Web Services Framework (Python and Java) offer integration possibilities for a wide range of programming languages. This paper describes the Java implementation of the Clarens Web Services Framework, called JClarens, and several web services of interest to the scientific and Grid community that have been deployed using JClarens.
Submitted 11 April, 2005;
originally announced April 2005.
-
Heterogeneous Relational Databases for a Grid-enabled Analysis Environment
Authors:
Arshad Ali,
Ashiq Anjum,
Tahir Azim,
Julian Bunn,
Saima Iqbal,
Richard McClatchey,
Harvey Newman,
S. Yousaf Shah,
Tony Solomonides,
Conrad Steenberg,
Michael Thomas,
Frank van Lingen,
Ian Willers
Abstract:
Grid based systems require a database access mechanism that can provide seamless homogeneous access to the requested data through a virtual data access system, i.e. a system which can take care of tracking the data that is stored in geographically distributed heterogeneous databases. This system should provide an integrated view of the data that is stored in the different repositories by using a virtual data access mechanism, i.e. a mechanism which can hide the heterogeneity of the backend databases from the client applications. This paper focuses on accessing data stored in disparate relational databases through a web service interface, and exploits the features of a Data Warehouse and Data Marts. We present a middleware that enables applications to access data stored in geographically distributed relational databases without being aware of their physical locations and underlying schema. A web service interface is provided to enable applications to access this middleware in a language and platform independent way. A prototype implementation was created based on Clarens [4], Unity [7] and POOL [8]. This ability to access the data stored in the distributed relational databases transparently is likely to be a very powerful one for Grid users, especially the scientific community wishing to collate and analyze data distributed over the Grid.
Submitted 10 April, 2005;
originally announced April 2005.
-
Resource Management Services for a Grid Analysis Environment
Authors:
Arshad Ali,
Ashiq Anjum,
Tahir Azim,
Julian Bunn,
Atif Mehmood,
Richard McClatchey,
Harvey Newman,
Waqas ur Rehman,
Conrad Steenberg,
Michael Thomas,
Frank van Lingen,
Ian Willers,
Muhammad Adeel Zafar
Abstract:
Selecting optimal resources for submitting jobs on a computational Grid or accessing data from a data grid is one of the most important tasks of any Grid middleware. Most modern Grid software today satisfies this responsibility and gives a best-effort performance to solve this problem. Almost all decisions regarding scheduling and data access are made by the software automatically, giving users little or no control over the entire process. To solve this problem, a more interactive set of services and middleware is desired that provides users more information about Grid weather, and gives them more control over the decision making process. This paper presents a set of services that have been developed to provide more interactive resource management capabilities within the Grid Analysis Environment (GAE) being developed collaboratively by Caltech, NUST and several other institutes. These include a steering service, a job monitoring service and an estimator service that have been designed and written using a common Grid-enabled Web Services framework named Clarens. The paper also presents a performance analysis of the developed services to show that they have indeed resulted in a more interactive and powerful system for user-centric Grid-enabled physics analysis.
Submitted 10 April, 2005;
originally announced April 2005.
-
A Grid-enabled Interface to Condor for Interactive Analysis on Handheld and Resource-limited Devices
Authors:
Arshad Ali,
Ashiq Anjum,
Tahir Azim,
Julian Bunn,
Ahsan Ikram,
Richard McClatchey,
Harvey Newman,
Conrad Steenberg,
Michael Thomas,
Ian Willers
Abstract:
This paper was withdrawn by the authors.
Submitted 30 September, 2004; v1 submitted 5 July, 2004;
originally announced July 2004.
-
Distributed Analysis and Load Balancing System for Grid Enabled Analysis on Hand-held devices using Multi-Agents Systems
Authors:
Naveed Ahmad,
Arshad Ali,
Ashiq Anjum,
Tahir Azim,
Julian Bunn,
Ali Hassan,
Ahsan Ikram,
Frank van Lingen,
Richard McClatchey,
Harvey Newman,
Conrad Steenberg,
Michael Thomas,
Ian Willers
Abstract:
Handheld devices, while growing rapidly in number, are inherently constrained and lack the capability of executing resource-hungry applications. This paper presents the design and implementation of a distributed analysis and load-balancing system for hand-held devices using a multi-agent system. This system enables low-resource mobile handheld devices to act as potential clients for Grid-enabled applications and analysis environments. We propose a system in which mobile agents will transport, schedule, execute and return results for heavy computational jobs submitted by handheld devices. In this way, our system provides a high-throughput computing environment for hand-held devices.
Submitted 5 July, 2004;
originally announced July 2004.
-
A Taxonomy and Survey of Grid Resource Planning and Reservation Systems for Grid Enabled Analysis Environment
Authors:
Arshad Ali,
Ashiq Anjum,
Atif Mehmood,
Richard McClatchey,
Ian Willers,
Julian Bunn,
Harvey Newman,
Michael Thomas,
Conrad Steenberg
Abstract:
The concept of coupling geographically distributed resources for solving large scale problems is becoming increasingly popular, forming what is commonly called grid computing. Management of resources in the Grid environment becomes complex as the resources are geographically distributed, heterogeneous in nature and owned by different individuals and organizations, each having their own resource management policies and different access and cost models. There have been many projects that have designed and implemented resource management systems with a variety of architectures and services. In this paper we present the general requirements that a Resource Management system should satisfy. A taxonomy is also defined, on the basis of which a survey of resource management systems in different existing Grid projects has been conducted to identify the key areas where these systems lack the desired functionality.
Submitted 14 January, 2018; v1 submitted 5 July, 2004;
originally announced July 2004.