-
FRAME-C: A knowledge-augmented deep learning pipeline for classifying multi-electrode array electrophysiological signals
Authors:
Nisal Ranasinghe,
Dzung Do-Ha,
Simon Maksour,
Tamasha Malepathirana,
Sachith Seneviratne,
Lezanne Ooi,
Saman Halgamuge
Abstract:
Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disorder characterized by motor neuron degeneration, with alterations in neural excitability serving as key indicators. Recent advancements in induced pluripotent stem cell (iPSC) technology have enabled the generation of human iPSC-derived neuronal cultures, which, when combined with multi-electrode array (MEA) electrophysiology, pr…
▽ More
Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disorder characterized by motor neuron degeneration, with alterations in neural excitability serving as key indicators. Recent advancements in induced pluripotent stem cell (iPSC) technology have enabled the generation of human iPSC-derived neuronal cultures, which, when combined with multi-electrode array (MEA) electrophysiology, provide rich spatial and temporal electrophysiological data. Traditionally, MEA data is analyzed using handcrafted features based on potentially imperfect domain knowledge, which while useful may not fully capture all useful characteristics inherent in the data. Machine learning, particularly deep learning, has the potential to automatically learn relevant characteristics from raw data without solely relying on handcrafted feature extraction. However, handcrafted features remain critical for encoding domain knowledge and improving interpretability, especially with limited or noisy data. This study introduces FRAME-C, a knowledge-augmented machine learning pipeline that combines domain knowledge, raw spike waveform data, and deep learning techniques to classify MEA signals and identify ALS-specific phenotypes. FRAME-C leverages deep learning to learn important features from spike waveforms while incorporating handcrafted features such as spike amplitude, inter-spike interval, and spike duration, preserving key spatial and temporal information. We validate FRAME-C on both simulated and real MEA data from human iPSC-derived neuronal cultures, demonstrating superior performance over existing classification methods. FRAME-C shows over 11% improvement on real data and up to 25% on simulated data. We also show FRAME-C can evaluate handcrafted feature importance, providing insights into ALS phenotypes.
△ Less
Submitted 17 May, 2025;
originally announced May 2025.
-
Graph-Eq: Discovering Mathematical Equations using Graph Generative Models
Authors:
Nisal Ranasinghe,
Damith Senanayake,
Saman Halgamuge
Abstract:
The ability to discover meaningful, accurate, and concise mathematical equations that describe datasets is valuable across various domains. Equations offer explicit relationships between variables, enabling deeper insights into underlying data patterns. Most existing equation discovery methods rely on genetic programming, which iteratively searches the equation space but is often slow and prone to…
▽ More
The ability to discover meaningful, accurate, and concise mathematical equations that describe datasets is valuable across various domains. Equations offer explicit relationships between variables, enabling deeper insights into underlying data patterns. Most existing equation discovery methods rely on genetic programming, which iteratively searches the equation space but is often slow and prone to overfitting. By representing equations as directed acyclic graphs, we leverage the use of graph neural networks to learn the underlying semantics of equations, and generate new, previously unseen equations. Although graph generative models have been shown to be successful in discovering new types of graphs in many fields, there application in discovering equations remains largely unexplored. In this work, we propose Graph-EQ, a deep graph generative model designed for efficient equation discovery. Graph-EQ uses a conditional variational autoencoder (CVAE) to learn a rich latent representation of the equation space by training it on a large corpus of equations in an unsupervised manner. Instead of directly searching the equation space, we employ Bayesian optimization to efficiently explore this learned latent space. We show that the encoder-decoder architecture of Graph-Eq is able to accurately reconstruct input equations. Moreover, we show that the learned latent representation can be sampled and decoded into valid equations, including new and previously unseen equations in the training data. Finally, we assess Graph-Eq's ability to discover equations that best fit a dataset by exploring the latent space using Bayesian optimization. Latent space exploration is done on 20 dataset with known ground-truth equations, and Graph-Eq is shown to successfully discover the grountruth equation in the majority of datasets.
△ Less
Submitted 30 March, 2025;
originally announced March 2025.
-
Arch-LLM: Taming LLMs for Neural Architecture Generation via Unsupervised Discrete Representation Learning
Authors:
Deshani Geethika Poddenige,
Sachith Seneviratne,
Damith Senanayake,
Mahesan Niranjan,
PN Suganthan,
Saman Halgamuge
Abstract:
Unsupervised representation learning has been widely explored across various modalities, including neural architectures, where it plays a key role in downstream applications like Neural Architecture Search (NAS). These methods typically learn an unsupervised representation space before generating/ sampling architectures for the downstream search. A common approach involves the use of Variational A…
▽ More
Unsupervised representation learning has been widely explored across various modalities, including neural architectures, where it plays a key role in downstream applications like Neural Architecture Search (NAS). These methods typically learn an unsupervised representation space before generating/ sampling architectures for the downstream search. A common approach involves the use of Variational Autoencoders (VAEs) to map discrete architectures onto a continuous representation space, however, sampling from these spaces often leads to a high percentage of invalid or duplicate neural architectures. This could be due to the unnatural mapping of inherently discrete architectural space onto a continuous space, which emphasizes the need for a robust discrete representation of these architectures. To address this, we introduce a Vector Quantized Variational Autoencoder (VQ-VAE) to learn a discrete latent space more naturally aligned with the discrete neural architectures. In contrast to VAEs, VQ-VAEs (i) map each architecture into a discrete code sequence and (ii) allow the prior to be learned by any generative model rather than assuming a normal distribution. We then represent these architecture latent codes as numerical sequences and train a text-to-text model leveraging a Large Language Model to learn and generate sequences representing architectures. We experiment our method with Inception/ ResNet-like cell-based search spaces, namely NAS-Bench-101 and NAS-Bench-201. Compared to VAE-based methods, our approach improves the generation of valid and unique architectures by over 80% on NASBench-101 and over 8% on NASBench-201. Finally, we demonstrate the applicability of our method in NAS employing a sequence-modeling-based NAS algorithm.
△ Less
Submitted 27 March, 2025;
originally announced March 2025.
-
Unmasking the Unknown: Facial Deepfake Detection in the Open-Set Paradigm
Authors:
Nadarasar Bahavan,
Sanjay Saha,
Ken Chen,
Sachith Seneviratne,
Sanka Rasnayaka,
Saman Halgamuge
Abstract:
Facial forgery methods such as deepfakes can be misused for identity manipulation and spreading misinformation. They have evolved alongside advancements in generative AI, leading to new and more sophisticated forgery techniques that diverge from existing 'known' methods. Conventional deepfake detection methods use the closedset paradigm, thus limiting their applicability to detecting forgeries cre…
▽ More
Facial forgery methods such as deepfakes can be misused for identity manipulation and spreading misinformation. They have evolved alongside advancements in generative AI, leading to new and more sophisticated forgery techniques that diverge from existing 'known' methods. Conventional deepfake detection methods use the closedset paradigm, thus limiting their applicability to detecting forgeries created using methods that are not part of the training dataset. In this paper, we propose a shift from the closed-set paradigm for deepfake detection. In the open-set paradigm, models are designed not only to identify images created by known facial forgery methods but also to identify and flag those produced by previously unknown methods as 'unknown' and not as unforged/real/unmanipulated. In this paper, we propose an open-set deepfake classification algorithm based on supervised contrastive learning. The open-set paradigm used in our model allows it to function as a more robust tool capable of handling emerging and unseen deepfake techniques, enhancing reliability and confidence, and complementing forensic analysis. In open-set paradigm, we identify three groups including the "unknown group that is neither considered known deepfake nor real. We investigate deepfake open-set classification across three scenarios, classifying deepfakes from unknown methods not as real, distinguishing real images from deepfakes, and classifying deepfakes from known methods, using the FaceForensics++ dataset as a benchmark. Our method achieves state of the art results in the first two tasks and competitive results in the third task.
△ Less
Submitted 11 March, 2025;
originally announced March 2025.
-
SphOR: A Representation Learning Perspective on Open-set Recognition for Identifying Unknown Classes in Deep Learning Models
Authors:
Nadarasar Bahavan,
Sachith Seneviratne,
Saman Halgamuge
Abstract:
The widespread use of deep learning classifiers necessitates Open-set recognition (OSR), which enables the identification of input data not only from classes known during training but also from unknown classes that might be present in test data. Many existing OSR methods are computationally expensive due to the reliance on complex generative models or suffer from high training costs. We investigat…
▽ More
The widespread use of deep learning classifiers necessitates Open-set recognition (OSR), which enables the identification of input data not only from classes known during training but also from unknown classes that might be present in test data. Many existing OSR methods are computationally expensive due to the reliance on complex generative models or suffer from high training costs. We investigate OSR from a representation-learning perspective, specifically through spherical embeddings. We introduce SphOR, a computationally efficient representation learning method that models the feature space as a mixture of von Mises-Fisher distributions. This approach enables the use of semantically ambiguous samples during training, to improve the detection of samples from unknown classes. We further explore the relationship between OSR performance and key representation learning properties which influence how well features are structured in high-dimensional space. Extensive experiments on multiple OSR benchmarks demonstrate the effectiveness of our method, producing state-of-the-art results, with improvements up-to 6% that validate its performance. Code at https://github.com/nadarasarbahavan/SpHOR
△ Less
Submitted 19 March, 2025; v1 submitted 11 March, 2025;
originally announced March 2025.
-
TT-MPD: Test Time Model Pruning and Distillation
Authors:
Haihang Wu,
Wei Wang,
Tamasha Malepathirana,
Sachith Seneviratne,
Denny Oetomo,
Saman Halgamuge
Abstract:
Pruning can be an effective method of compressing large pre-trained models for inference speed acceleration. Previous pruning approaches rely on access to the original training dataset for both pruning and subsequent fine-tuning. However, access to the training data can be limited due to concerns such as data privacy and commercial confidentiality. Furthermore, with covariate shift (disparities be…
▽ More
Pruning can be an effective method of compressing large pre-trained models for inference speed acceleration. Previous pruning approaches rely on access to the original training dataset for both pruning and subsequent fine-tuning. However, access to the training data can be limited due to concerns such as data privacy and commercial confidentiality. Furthermore, with covariate shift (disparities between test and training data distributions), pruning and finetuning with training datasets can hinder the generalization of the pruned model to test data. To address these issues, pruning and finetuning the model with test time samples becomes essential. However, test-time model pruning and fine-tuning incur additional computation costs and slow down the model's prediction speed, thus posing efficiency issues. Existing pruning methods are not efficient enough for test time model pruning setting, since finetuning the pruned model is needed to evaluate the importance of removable components. To address this, we propose two variables to approximate the fine-tuned accuracy. We then introduce an efficient pruning method that considers the approximated finetuned accuracy and potential inference latency saving. To enhance fine-tuning efficiency, we propose an efficient knowledge distillation method that only needs to generate pseudo labels for a small set of finetuning samples one time, thereby reducing the expensive pseudo-label generation cost. Experimental results demonstrate that our method achieves a comparable or superior tradeoff between test accuracy and inference latency, with a 32% relative reduction in pruning and finetuning time compared to the best existing method.
△ Less
Submitted 9 December, 2024;
originally announced December 2024.
-
Rethinking Time Series Forecasting with LLMs via Nearest Neighbor Contrastive Learning
Authors:
Jayanie Bogahawatte,
Sachith Seneviratne,
Maneesha Perera,
Saman Halgamuge
Abstract:
Adapting Large Language Models (LLMs) that are extensively trained on abundant text data, and customizing the input prompt to enable time series forecasting has received considerable attention. While recent work has shown great potential for adapting the learned prior of LLMs, the formulation of the prompt to finetune LLMs remains challenging as prompt should be aligned with time series data. Addi…
▽ More
Adapting Large Language Models (LLMs) that are extensively trained on abundant text data, and customizing the input prompt to enable time series forecasting has received considerable attention. While recent work has shown great potential for adapting the learned prior of LLMs, the formulation of the prompt to finetune LLMs remains challenging as prompt should be aligned with time series data. Additionally, current approaches do not effectively leverage word token embeddings which embody the rich representation space learned by LLMs. This emphasizes the need for a robust approach to formulate the prompt which utilizes the word token embeddings while effectively representing the characteristics of the time series. To address these challenges, we propose NNCL-TLLM: Nearest Neighbor Contrastive Learning for Time series forecasting via LLMs. First, we generate time series compatible text prototypes such that each text prototype represents both word token embeddings in its neighborhood and time series characteristics via end-to-end finetuning. Next, we draw inspiration from Nearest Neighbor Contrastive Learning to formulate the prompt while obtaining the top-$k$ nearest neighbor time series compatible text prototypes. We then fine-tune the layer normalization and positional embeddings of the LLM, keeping the other layers intact, reducing the trainable parameters and decreasing the computational cost. Our comprehensive experiments demonstrate that NNCL-TLLM outperforms in few-shot forecasting while achieving competitive or superior performance over the state-of-the-art methods in long-term and short-term forecasting tasks.
△ Less
Submitted 6 December, 2024;
originally announced December 2024.
-
Distributed solar generation forecasting using attention-based deep neural networks for cloud movement prediction
Authors:
Maneesha Perera,
Julian De Hoog,
Kasun Bandara,
Saman Halgamuge
Abstract:
Accurate forecasts of distributed solar generation are necessary to reduce negative impacts resulting from the increased uptake of distributed solar photovoltaic (PV) systems. However, the high variability of solar generation over short time intervals (seconds to minutes) caused by cloud movement makes this forecasting task difficult. To address this, using cloud images, which capture the second-t…
▽ More
Accurate forecasts of distributed solar generation are necessary to reduce negative impacts resulting from the increased uptake of distributed solar photovoltaic (PV) systems. However, the high variability of solar generation over short time intervals (seconds to minutes) caused by cloud movement makes this forecasting task difficult. To address this, using cloud images, which capture the second-to-second changes in cloud cover affecting solar generation, has shown promise. Recently, deep neural networks with "attention" that focus on important regions of an image have been applied with success in many computer vision applications. However, their use for forecasting cloud movement has not yet been extensively explored. In this work, we propose an attention-based convolutional long short-term memory network to forecast cloud movement and apply an existing self-attention-based method previously proposed for video prediction to forecast cloud movement. We investigate and discuss the impact of cloud forecasts from attention-based methods towards forecasting distributed solar generation, compared to cloud forecasts from non-attention-based methods. We further provide insights into the different solar forecast performances that can be achieved for high and low altitude clouds. We find that for clouds at high altitudes, the cloud predictions obtained using attention-based methods result in solar forecast skill score improvements of 5.86% or more compared to non-attention-based methods.
△ Less
Submitted 16 November, 2024;
originally announced November 2024.
-
GINN-KAN: Interpretability pipelining with applications in Physics Informed Neural Networks
Authors:
Nisal Ranasinghe,
Yu Xia,
Sachith Seneviratne,
Saman Halgamuge
Abstract:
Neural networks are powerful function approximators, yet their ``black-box" nature often renders them opaque and difficult to interpret. While many post-hoc explanation methods exist, they typically fail to capture the underlying reasoning processes of the networks. A truly interpretable neural network would be trained similarly to conventional models using techniques such as backpropagation, but…
▽ More
Neural networks are powerful function approximators, yet their ``black-box" nature often renders them opaque and difficult to interpret. While many post-hoc explanation methods exist, they typically fail to capture the underlying reasoning processes of the networks. A truly interpretable neural network would be trained similarly to conventional models using techniques such as backpropagation, but additionally provide insights into the learned input-output relationships. In this work, we introduce the concept of interpretability pipelineing, to incorporate multiple interpretability techniques to outperform each individual technique. To this end, we first evaluate several architectures that promise such interpretability, with a particular focus on two recent models selected for their potential to incorporate interpretability into standard neural network architectures while still leveraging backpropagation: the Growing Interpretable Neural Network (GINN) and Kolmogorov Arnold Networks (KAN). We analyze the limitations and strengths of each and introduce a novel interpretable neural network GINN-KAN that synthesizes the advantages of both models. When tested on the Feynman symbolic regression benchmark datasets, GINN-KAN outperforms both GINN and KAN. To highlight the capabilities and the generalizability of this approach, we position GINN-KAN as an alternative to conventional black-box networks in Physics-Informed Neural Networks (PINNs). We expect this to have far-reaching implications in the application of deep learning pipelines in the natural sciences. Our experiments with this interpretable PINN on 15 different partial differential equations demonstrate that GINN-KAN augmented PINNs outperform PINNs with black-box networks in solving differential equations and surpass the capabilities of both GINN and KAN.
△ Less
Submitted 28 August, 2024; v1 submitted 27 August, 2024;
originally announced August 2024.
-
Reinforcement Learning based Workflow Scheduling in Cloud and Edge Computing Environments: A Taxonomy, Review and Future Directions
Authors:
Amanda Jayanetti,
Saman Halgamuge,
Rajkumar Buyya
Abstract:
Deep Reinforcement Learning (DRL) techniques have been successfully applied for solving complex decision-making and control tasks in multiple fields including robotics, autonomous driving, healthcare and natural language processing. The ability of DRL agents to learn from experience and utilize real-time data for making decisions makes it an ideal candidate for dealing with the complexities associ…
▽ More
Deep Reinforcement Learning (DRL) techniques have been successfully applied for solving complex decision-making and control tasks in multiple fields including robotics, autonomous driving, healthcare and natural language processing. The ability of DRL agents to learn from experience and utilize real-time data for making decisions makes it an ideal candidate for dealing with the complexities associated with the problem of workflow scheduling in highly dynamic cloud and edge computing environments. Despite the benefits of DRL, there are multiple challenges associated with the application of DRL techniques including multi-objectivity, curse of dimensionality, partial observability and multi-agent coordination. In this paper, we comprehensively analyze the challenges and opportunities associated with the design and implementation of DRL oriented solutions for workflow scheduling in cloud and edge computing environments. Based on the identified characteristics, we propose a taxonomy of workflow scheduling with DRL. We map reviewed works with respect to the taxonomy to identify their strengths and weaknesses. Based on taxonomy driven analysis, we propose novel future research directions for the field.
△ Less
Submitted 5 August, 2024;
originally announced August 2024.
-
A Deep Reinforcement Learning Approach for Cost Optimized Workflow Scheduling in Cloud Computing Environments
Authors:
Amanda Jayanetti,
Saman Halgamuge,
Rajkumar Buyya
Abstract:
Cost optimization is a common goal of workflow schedulers operating in cloud computing environments. The use of spot instances is a potential means of achieving this goal, as they are offered by cloud providers at discounted prices compared to their on-demand counterparts in exchange for reduced reliability. This is due to the fact that spot instances are subjected to interruptions when spare comp…
▽ More
Cost optimization is a common goal of workflow schedulers operating in cloud computing environments. The use of spot instances is a potential means of achieving this goal, as they are offered by cloud providers at discounted prices compared to their on-demand counterparts in exchange for reduced reliability. This is due to the fact that spot instances are subjected to interruptions when spare computing capacity used for provisioning them is needed back owing to demand variations. Also, the prices of spot instances are not fixed as pricing is dependent on long term supply and demand. The possibility of interruptions and pricing variations associated with spot instances adds a layer of uncertainty to the general problem of workflow scheduling across cloud computing environments. These challenges need to be efficiently addressed for enjoying the cost savings achievable with the use of spot instances without compromising the underlying business requirements. To this end, in this paper we use Deep Reinforcement Learning for developing an autonomous agent capable of scheduling workflows in a cost efficient manner by using an intelligent mix of spot and on-demand instances. The proposed solution is implemented in the open source container native Argo workflow engine that is widely used for executing industrial workflows. The results of the experiments demonstrate that the proposed scheduling method is capable of outperforming the current benchmarks.
△ Less
Submitted 5 August, 2024;
originally announced August 2024.
-
AniFaceDiff: Animating Stylized Avatars via Parametric Conditioned Diffusion Models
Authors:
Ken Chen,
Sachith Seneviratne,
Wei Wang,
Dongting Hu,
Sanjay Saha,
Md. Tarek Hasan,
Sanka Rasnayaka,
Tamasha Malepathirana,
Mingming Gong,
Saman Halgamuge
Abstract:
Animating stylized avatars with dynamic poses and expressions has attracted increasing attention for its broad range of applications. Previous research has made significant progress by training controllable generative models to synthesize animations based on reference characteristics, pose, and expression conditions. However, the mechanisms used in these methods to control pose and expression ofte…
▽ More
Animating stylized avatars with dynamic poses and expressions has attracted increasing attention for its broad range of applications. Previous research has made significant progress by training controllable generative models to synthesize animations based on reference characteristics, pose, and expression conditions. However, the mechanisms used in these methods to control pose and expression often inadvertently introduce unintended features from the target motion, while also causing a loss of expression-related details, particularly when applied to stylized animation. This paper proposes a new method based on Stable Diffusion, called AniFaceDiff, incorporating a new conditioning module for animating stylized avatars. First, we propose a refined spatial conditioning approach by Facial Alignment to prevent the inclusion of identity characteristics from the target motion. Then, we introduce an Expression Adapter that incorporates additional cross-attention layers to address the potential loss of expression-related information. Our approach effectively preserves pose and expression from the target video while maintaining input image consistency. Extensive experiments demonstrate that our method achieves state-of-the-art results, showcasing superior image quality, preservation of reference features, and expression accuracy, particularly for out-of-domain animation across diverse styles, highlighting its versatility and strong generalization capabilities. This work aims to enhance the quality of virtual stylized animation for positive applications. To promote responsible use in virtual environments, we contribute to the advancement of detection for generative content by evaluating state-of-the-art detectors, highlighting potential areas for improvement, and suggesting solutions.
△ Less
Submitted 2 December, 2024; v1 submitted 19 June, 2024;
originally announced June 2024.
-
Discriminative Sample-Guided and Parameter-Efficient Feature Space Adaptation for Cross-Domain Few-Shot Learning
Authors:
Rashindrie Perera,
Saman Halgamuge
Abstract:
In this paper, we look at cross-domain few-shot classification which presents the challenging task of learning new classes in previously unseen domains with few labelled examples. Existing methods, though somewhat effective, encounter several limitations, which we alleviate through two significant improvements. First, we introduce a lightweight parameter-efficient adaptation strategy to address ov…
▽ More
In this paper, we look at cross-domain few-shot classification which presents the challenging task of learning new classes in previously unseen domains with few labelled examples. Existing methods, though somewhat effective, encounter several limitations, which we alleviate through two significant improvements. First, we introduce a lightweight parameter-efficient adaptation strategy to address overfitting associated with fine-tuning a large number of parameters on small datasets. This strategy employs a linear transformation of pre-trained features, significantly reducing the trainable parameter count. Second, we replace the traditional nearest centroid classifier with a discriminative sample-aware loss function, enhancing the model's sensitivity to the inter- and intra-class variances within the training set for improved clustering in feature space. Empirical evaluations on the Meta-Dataset benchmark showcase that our approach not only improves accuracy up to 7.7\% and 5.3\% on previously seen and unseen datasets, respectively, but also achieves the above performance while being at least $\sim3\times$ more parameter-efficient than existing methods, establishing a new state-of-the-art in cross-domain few-shot learning. Our code is available at https://github.com/rashindrie/DIPA.
△ Less
Submitted 3 April, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
-
Day-ahead regional solar power forecasting with hierarchical temporal convolutional neural networks using historical power generation and weather data
Authors:
Maneesha Perera,
Julian De Hoog,
Kasun Bandara,
Damith Senanayake,
Saman Halgamuge
Abstract:
Regional solar power forecasting, which involves predicting the total power generation from all rooftop photovoltaic systems in a region holds significant importance for various stakeholders in the energy sector. However, the vast amount of solar power generation and weather time series from geographically dispersed locations that need to be considered in the forecasting process makes accurate reg…
▽ More
Regional solar power forecasting, which involves predicting the total power generation from all rooftop photovoltaic systems in a region holds significant importance for various stakeholders in the energy sector. However, the vast amount of solar power generation and weather time series from geographically dispersed locations that need to be considered in the forecasting process makes accurate regional forecasting challenging. Therefore, previous work has limited the focus to either forecasting a single time series (i.e., aggregated time series) which is the addition of all solar generation time series in a region, disregarding the location-specific weather effects or forecasting solar generation time series of each PV site (i.e., individual time series) independently using location-specific weather data, resulting in a large number of forecasting models. In this work, we propose two deep-learning-based regional forecasting methods that can effectively leverage both types of time series (aggregated and individual) with weather data in a region. We propose two hierarchical temporal convolutional neural network architectures (HTCNN) and two strategies to adapt HTCNNs for regional solar power forecasting. At first, we explore generating a regional forecast using a single HTCNN. Next, we divide the region into multiple sub-regions based on weather information and train separate HTCNNs for each sub-region; the forecasts of each sub-region are then added to generate a regional forecast. The proposed work is evaluated using a large dataset collected over a year from 101 locations across Western Australia to provide a day ahead forecast. We compare our approaches with well-known alternative methods and show that the sub-region HTCNN requires fewer individual networks and achieves a forecast skill score of 40.2% reducing a statistically significant error by 6.5% compared to the best counterpart.
△ Less
Submitted 3 March, 2024;
originally announced March 2024.
-
When To Grow? A Fitting Risk-Aware Policy for Layer Growing in Deep Neural Networks
Authors:
Haihang Wu,
Wei Wang,
Tamasha Malepathirana,
Damith Senanayake,
Denny Oetomo,
Saman Halgamuge
Abstract:
Neural growth is the process of growing a small neural network to a large network and has been utilized to accelerate the training of deep neural networks. One crucial aspect of neural growth is determining the optimal growth timing. However, few studies investigate this systematically. Our study reveals that neural growth inherently exhibits a regularization effect, whose intensity is influenced…
▽ More
Neural growth is the process of growing a small neural network to a large network and has been utilized to accelerate the training of deep neural networks. One crucial aspect of neural growth is determining the optimal growth timing. However, few studies investigate this systematically. Our study reveals that neural growth inherently exhibits a regularization effect, whose intensity is influenced by the chosen policy for growth timing. While this regularization effect may mitigate the overfitting risk of the model, it may lead to a notable accuracy drop when the model underfits. Yet, current approaches have not addressed this issue due to their lack of consideration of the regularization effect from neural growth. Motivated by these findings, we propose an under/over fitting risk-aware growth timing policy, which automatically adjusts the growth timing informed by the level of potential under/overfitting risks to address both risks. Comprehensive experiments conducted using CIFAR-10/100 and ImageNet datasets show that the proposed policy achieves accuracy improvements of up to 1.3% in models prone to underfitting while achieving similar accuracies in models suffering from overfitting compared to the existing methods.
△ Less
Submitted 5 January, 2024;
originally announced January 2024.
-
GINN-LP: A Growing Interpretable Neural Network for Discovering Multivariate Laurent Polynomial Equations
Authors:
Nisal Ranasinghe,
Damith Senanayake,
Sachith Seneviratne,
Malin Premaratne,
Saman Halgamuge
Abstract:
Traditional machine learning is generally treated as a black-box optimization problem and does not typically produce interpretable functions that connect inputs and outputs. However, the ability to discover such interpretable functions is desirable. In this work, we propose GINN-LP, an interpretable neural network to discover the form and coefficients of the underlying equation of a dataset, when…
▽ More
Traditional machine learning is generally treated as a black-box optimization problem and does not typically produce interpretable functions that connect inputs and outputs. However, the ability to discover such interpretable functions is desirable. In this work, we propose GINN-LP, an interpretable neural network to discover the form and coefficients of the underlying equation of a dataset, when the equation is assumed to take the form of a multivariate Laurent Polynomial. This is facilitated by a new type of interpretable neural network block, named the "power-term approximator block", consisting of logarithmic and exponential activation functions. GINN-LP is end-to-end differentiable, making it possible to use backpropagation for training. We propose a neural network growth strategy that will enable finding the suitable number of terms in the Laurent polynomial that represents the data, along with sparsity regularization to promote the discovery of concise equations. To the best of our knowledge, this is the first model that can discover arbitrary multivariate Laurent polynomial terms without any prior information on the order. Our approach is first evaluated on a subset of data used in SRBench, a benchmark for symbolic regression. We first show that GINN-LP outperforms the state-of-the-art symbolic regression methods on datasets generated using 48 real-world equations in the form of multivariate Laurent polynomials. Next, we propose an ensemble method that combines our method with a high-performing symbolic regression method, enabling us to discover non-Laurent polynomial equations. We achieve state-of-the-art results in equation discovery, showing an absolute improvement of 7.1% over the best contender, by applying this ensemble method to 113 datasets within SRBench with known ground-truth equations.
△ Less
Submitted 14 February, 2024; v1 submitted 17 December, 2023;
originally announced December 2023.
-
NAPA-VQ: Neighborhood Aware Prototype Augmentation with Vector Quantization for Continual Learning
Authors:
Tamasha Malepathirana,
Damith Senanayake,
Saman Halgamuge
Abstract:
Catastrophic forgetting; the loss of old knowledge upon acquiring new knowledge, is a pitfall faced by deep neural networks in real-world applications. Many prevailing solutions to this problem rely on storing exemplars (previously encountered data), which may not be feasible in applications with memory limitations or privacy constraints. Therefore, the recent focus has been on Non-Exemplar based…
▽ More
Catastrophic forgetting; the loss of old knowledge upon acquiring new knowledge, is a pitfall faced by deep neural networks in real-world applications. Many prevailing solutions to this problem rely on storing exemplars (previously encountered data), which may not be feasible in applications with memory limitations or privacy constraints. Therefore, the recent focus has been on Non-Exemplar based Class Incremental Learning (NECIL) where a model incrementally learns about new classes without using any past exemplars. However, due to the lack of old data, NECIL methods struggle to discriminate between old and new classes causing their feature representations to overlap. We propose NAPA-VQ: Neighborhood Aware Prototype Augmentation with Vector Quantization, a framework that reduces this class overlap in NECIL. We draw inspiration from Neural Gas to learn the topological relationships in the feature space, identifying the neighboring classes that are most likely to get confused with each other. This neighborhood information is utilized to enforce strong separation between the neighboring classes as well as to generate old class representative prototypes that can better aid in obtaining a discriminative decision boundary between old and new classes. Our comprehensive experiments on CIFAR-100, TinyImageNet, and ImageNet-Subset demonstrate that NAPA-VQ outperforms the State-of-the-art NECIL methods by an average improvement of 5%, 2%, and 4% in accuracy and 10%, 3%, and 9% in forgetting respectively. Our code can be found in https://github.com/TamashaM/NAPA-VQ.git.
△ Less
Submitted 18 August, 2023;
originally announced August 2023.
-
Undercover Deepfakes: Detecting Fake Segments in Videos
Authors:
Sanjay Saha,
Rashindrie Perera,
Sachith Seneviratne,
Tamasha Malepathirana,
Sanka Rasnayaka,
Deshani Geethika,
Terence Sim,
Saman Halgamuge
Abstract:
The recent renaissance in generative models, driven primarily by the advent of diffusion models and iterative improvement in GAN methods, has enabled many creative applications. However, each advancement is also accompanied by a rise in the potential for misuse. In the arena of the deepfake generation, this is a key societal issue. In particular, the ability to modify segments of videos using such…
▽ More
The recent renaissance in generative models, driven primarily by the advent of diffusion models and iterative improvement in GAN methods, has enabled many creative applications. However, each advancement is also accompanied by a rise in the potential for misuse. In the arena of the deepfake generation, this is a key societal issue. In particular, the ability to modify segments of videos using such generative techniques creates a new paradigm of deepfakes which are mostly real videos altered slightly to distort the truth. This paradigm has been under-explored by the current deepfake detection methods in the academic literature. In this paper, we present a deepfake detection method that can address this issue by performing deepfake prediction at the frame and video levels. To facilitate testing our method, we prepared a new benchmark dataset where videos have both real and fake frame sequences with very subtle transitions. We provide a benchmark on the proposed dataset with our detection method which utilizes the Vision Transformer based on Scaling and Shifting to learn spatial features, and a Timeseries Transformer to learn temporal features of the videos to help facilitate the interpretation of possible deepfakes. Extensive experiments on a variety of deepfake generation methods show excellent results by the proposed method on temporal segmentation and classical video-level predictions as well. In particular, the paradigm we address will form a powerful tool for the moderation of deepfakes, where human oversight can be better targeted to the parts of videos suspected of being deepfakes. All experiments can be reproduced at: github.com/rgb91/temporal-deepfake-segmentation.
△ Less
Submitted 24 August, 2023; v1 submitted 11 May, 2023;
originally announced May 2023.
-
Interpretability and accessibility of machine learning in selected food processing, agriculture and health applications
Authors:
N. Ranasinghe,
A. Ramanan,
S. Fernando,
P. N. Hameed,
D. Herath,
T. Malepathirana,
P. Suganthan,
M. Niranjan,
S. Halgamuge
Abstract:
Artificial Intelligence (AI) and its data-centric branch of machine learning (ML) have greatly evolved over the last few decades. However, as AI is used increasingly in real world use cases, the importance of the interpretability of and accessibility to AI systems have become major research areas. The lack of interpretability of ML based systems is a major hindrance to widespread adoption of these…
▽ More
Artificial Intelligence (AI) and its data-centric branch of machine learning (ML) have greatly evolved over the last few decades. However, as AI is used increasingly in real world use cases, the importance of the interpretability of and accessibility to AI systems have become major research areas. The lack of interpretability of ML based systems is a major hindrance to widespread adoption of these powerful algorithms. This is due to many reasons including ethical and regulatory concerns, which have resulted in poorer adoption of ML in some areas. The recent past has seen a surge in research on interpretable ML. Generally, designing a ML system requires good domain understanding combined with expert knowledge. New techniques are emerging to improve ML accessibility through automated model design. This paper provides a review of the work done to improve interpretability and accessibility of machine learning in the context of global problems while also being relevant to developing countries. We review work under multiple levels of interpretability including scientific and mathematical interpretation, statistical interpretation and partial semantic interpretation. This review includes applications in three areas, namely food processing, agriculture and health.
△ Less
Submitted 29 November, 2022;
originally announced November 2022.
-
Multi-Resolution, Multi-Horizon Distributed Solar PV Power Forecasting with Forecast Combinations
Authors:
Maneesha Perera,
Julian De Hoog,
Kasun Bandara,
Saman Halgamuge
Abstract:
Distributed, small-scale solar photovoltaic (PV) systems are being installed at a rapidly increasing rate. This can cause major impacts on distribution networks and energy markets. As a result, there is a significant need for improved forecasting of the power generation of these systems at different time resolutions and horizons. However, the performance of forecasting models depends on the resolu…
▽ More
Distributed, small-scale solar photovoltaic (PV) systems are being installed at a rapidly increasing rate. This can cause major impacts on distribution networks and energy markets. As a result, there is a significant need for improved forecasting of the power generation of these systems at different time resolutions and horizons. However, the performance of forecasting models depends on the resolution and horizon. Forecast combinations (ensembles), that combine the forecasts of multiple models into a single forecast may be robust in such cases. Therefore, in this paper, we provide comparisons and insights into the performance of five state-of-the-art forecast models and existing forecast combinations at multiple resolutions and horizons. We propose a forecast combination approach based on particle swarm optimization (PSO) that will enable a forecaster to produce accurate forecasts for the task at hand by weighting the forecasts produced by individual models. Furthermore, we compare the performance of the proposed combination approach with existing forecast combination approaches. A comprehensive evaluation is conducted using a real-world residential PV power data set measured at 25 houses located in three locations in the United States. The results across four different resolutions and four different horizons show that the PSO-based forecast combination approach outperforms the use of any individual forecast model and other forecast combination counterparts, with an average Mean Absolute Scaled Error reduction by 3.81% compared to the best performing individual model. Our approach enables a solar forecaster to produce accurate forecasts for their application regardless of the forecast resolution or horizon.
△ Less
Submitted 21 June, 2022;
originally announced June 2022.
-
An Incentive-compatible Energy Trading Framework for Neighborhood Area Networks with Shared Energy Storage
Authors:
Chathurika P. Mediwaththe,
Marnie Shaw,
Saman Halgamuge,
David B. Smith,
Paul Scott
Abstract:
Here, a novel energy trading system is proposed for demand-side management of a neighborhood area network (NAN) consisting of a shared energy storage (SES) provider, users with non-dispatchable energy generation, and an electricity retailer. In a leader-follower Stackelberg game, the SES provider first maximizes their revenue by setting a price signal and trading energy with the grid. Then, by fol…
▽ More
Here, a novel energy trading system is proposed for demand-side management of a neighborhood area network (NAN) consisting of a shared energy storage (SES) provider, users with non-dispatchable energy generation, and an electricity retailer. In a leader-follower Stackelberg game, the SES provider first maximizes their revenue by setting a price signal and trading energy with the grid. Then, by following the SES provider's actions, the retailer minimizes social cost for the users, i.e., the sum of the total users' cost when they interact with the SES and the total cost for supplying grid energy to the users. A pricing strategy, which incorporates mechanism design, is proposed to make the system incentive-compatible by rewarding users who disclose true energy usage information. A unique Stackelberg equilibrium is achieved where the SES provider's revenue is maximized and the user-level social cost is minimized, which also rewards the retailer. A case study with realistic energy demand and generation data demonstrates 28\%~-~45\% peak demand reduction of the NAN, depending on the number of participating users, compared to a system without SES. Simulation results confirm that the retailer can also benefit financially, in addition to the SES provider and the users.
△ Less
Submitted 21 August, 2020;
originally announced August 2020.
-
Self Organizing Nebulous Growths for Robust and Incremental Data Visualization
Authors:
Damith Senanayake,
Wei Wang,
Shalin H. Naik,
Saman Halgamuge
Abstract:
Non-parametric dimensionality reduction techniques, such as t-SNE and UMAP, are proficient in providing visualizations for datasets of fixed sizes. However, they cannot incrementally map and insert new data points into an already provided data visualization. We present Self-Organizing Nebulous Growths (SONG), a parametric nonlinear dimensionality reduction technique that supports incremental data…
▽ More
Non-parametric dimensionality reduction techniques, such as t-SNE and UMAP, are proficient in providing visualizations for datasets of fixed sizes. However, they cannot incrementally map and insert new data points into an already provided data visualization. We present Self-Organizing Nebulous Growths (SONG), a parametric nonlinear dimensionality reduction technique that supports incremental data visualization, i.e., incremental addition of new data while preserving the structure of the existing visualization. In addition, SONG is capable of handling new data increments, no matter whether they are similar or heterogeneous to the already observed data distribution. We test SONG on a variety of real and simulated datasets. The results show that SONG is superior to Parametric t-SNE, t-SNE and UMAP in incremental data visualization. Specifically, for heterogeneous increments, SONG improves over Parametric t-SNE by 14.98 % on the Fashion MNIST dataset and 49.73% on the MNIST dataset regarding the cluster quality measured by the Adjusted Mutual Information scores. On similar or homogeneous increments, the improvements are 8.36% and 42.26% respectively. Furthermore, even when the above datasets are presented all at once, SONG performs better or comparable to UMAP, and superior to t-SNE. We also demonstrate that the algorithmic foundations of SONG render it more tolerant to noise compared to UMAP and t-SNE, thus providing greater utility for data with high variance, high mixing of clusters, or noise.
△ Less
Submitted 1 October, 2020; v1 submitted 9 December, 2019;
originally announced December 2019.
-
The Fast Heuristic Algorithms and Post-Processing Techniques to Design Large and Low-Cost Communication Networks
Authors:
Yahui Sun,
Marcus Brazil,
Doreen Thomas,
Saman Halgamuge
Abstract:
It is challenging to design large and low-cost communication networks. In this paper, we formulate this challenge as the prize-collecting Steiner Tree Problem (PCSTP). The objective is to minimize the costs of transmission routes and the disconnected monetary or informational profits. Initially, we note that the PCSTP is MAX SNP-hard. Then, we propose some post-processing techniques to improve sub…
▽ More
It is challenging to design large and low-cost communication networks. In this paper, we formulate this challenge as the prize-collecting Steiner Tree Problem (PCSTP). The objective is to minimize the costs of transmission routes and the disconnected monetary or informational profits. Initially, we note that the PCSTP is MAX SNP-hard. Then, we propose some post-processing techniques to improve suboptimal solutions to PCSTP. Based on these techniques, we propose two fast heuristic algorithms: the first one is a quasilinear time heuristic algorithm that is faster and consumes less memory than other algorithms; and the second one is an improvement of a stateof-the-art polynomial time heuristic algorithm that can find high-quality solutions at a speed that is only inferior to the first one. We demonstrate the competitiveness of our heuristic algorithms by comparing them with the state-of-the-art ones on the largest existing benchmark instances (169 800 vertices and 338 551 edges). Moreover, we generate new instances that are even larger (1 000 000 vertices and 10 000 000 edges) to further demonstrate their advantages in large networks. The state-ofthe-art algorithms are too slow to find high-quality solutions for instances of this size, whereas our new heuristic algorithms can do this in around 6 to 45s on a personal computer. Ultimately, we apply our post-processing techniques to update the bestknown solution for a notoriously difficult benchmark instance to show that they can improve near-optimal solutions to PCSTP. In conclusion, we demonstrate the usefulness of our heuristic algorithms and post-processing techniques for designing large and low-cost communication networks.
△ Less
Submitted 8 January, 2019;
originally announced February 2019.
-
Improving MMD-GAN Training with Repulsive Loss Function
Authors:
Wei Wang,
Yuan Sun,
Saman Halgamuge
Abstract:
Generative adversarial nets (GANs) are widely used to learn the data sampling process and their performance may heavily depend on the loss functions, given a limited computational budget. This study revisits MMD-GAN that uses the maximum mean discrepancy (MMD) as the loss function for GAN and makes two contributions. First, we argue that the existing MMD loss function may discourage the learning o…
▽ More
Generative adversarial nets (GANs) are widely used to learn the data sampling process and their performance may heavily depend on the loss functions, given a limited computational budget. This study revisits MMD-GAN that uses the maximum mean discrepancy (MMD) as the loss function for GAN and makes two contributions. First, we argue that the existing MMD loss function may discourage the learning of fine details in data as it attempts to contract the discriminator outputs of real data. To address this issue, we propose a repulsive loss function to actively learn the difference among the real data by simply rearranging the terms in MMD. Second, inspired by the hinge loss, we propose a bounded Gaussian kernel to stabilize the training of MMD-GAN with the repulsive loss function. The proposed methods are applied to the unsupervised image generation tasks on CIFAR-10, STL-10, CelebA, and LSUN bedroom datasets. Results show that the repulsive loss function significantly improves over the MMD loss at no additional computational cost and outperforms other representative loss functions. The proposed methods achieve an FID score of 16.21 on the CIFAR-10 dataset using a single DCGAN network and spectral normalization.
△ Less
Submitted 8 February, 2019; v1 submitted 24 December, 2018;
originally announced December 2018.