-
CompeteSMoE -- Statistically Guaranteed Mixture of Experts Training via Competition
Authors:
Nam V. Nguyen,
Huy Nguyen,
Quang Pham,
Van Nguyen,
Savitha Ramasamy,
Nhat Ho
Abstract:
Sparse mixture of experts (SMoE) offers an appealing solution to scale up model complexity beyond the means of increasing the network's depth or width. However, we argue that effective SMoE training remains challenging because of a suboptimal routing process in which the experts that perform the computation do not directly contribute to routing. In this work, we propose competition, a novel mechanism that routes tokens to the experts with the highest neural response. Theoretically, we show that the competition mechanism enjoys better sample efficiency than traditional softmax routing. Furthermore, we develop CompeteSMoE, a simple yet effective algorithm for training large language models that deploys a router to learn the competition policy, thus achieving strong performance at a low training overhead. Our extensive empirical evaluations on both visual instruction tuning and language pre-training tasks demonstrate the efficacy, robustness, and scalability of CompeteSMoE compared to state-of-the-art SMoE strategies. We have made the implementation available at: https://github.com/Fsoft-AIC/CompeteSMoE. This work is an improved version of the previous study at arXiv:2402.02526.
Submitted 19 May, 2025;
originally announced May 2025.
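The competition mechanism described above can be illustrated with a minimal sketch, assuming each expert is a toy linear map and that an expert's "neural response" is the L2 norm of its output; both choices are assumptions for illustration, not the paper's definitions, and `make_linear_expert` and `competition_route` are hypothetical names. Every expert is evaluated on every token here, which is the cost that, per the abstract, CompeteSMoE's learned router is meant to keep low.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def make_linear_expert(weight: Sequence[Sequence[float]]) -> Expert:
    """Hypothetical toy expert: a bias-free linear map y = W x."""
    def expert(x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) for row in weight]
    return expert


def competition_route(x: Vector, experts: List[Expert], k: int = 1) -> Tuple[List[int], List[float], Vector]:
    """Send x to the k experts with the largest response and mix their outputs.

    Assumption: an expert's response is the L2 norm of its output.
    """
    # Competition requires running every expert on the input first.
    outputs = [expert(x) for expert in experts]
    responses = [math.sqrt(sum(v * v for v in out)) for out in outputs]
    # Winners: indices of the k largest responses (stable sort keeps the lower index on ties).
    winners = sorted(range(len(experts)), key=lambda i: -responses[i])[:k]
    # Softmax over the winners' responses gives the mixing weights.
    top = max(responses[i] for i in winners)
    exps = [math.exp(responses[i] - top) for i in winners]
    total = sum(exps)
    gates = [e / total for e in exps]
    dim = len(outputs[0])
    y = [sum(g * outputs[i][d] for g, i in zip(gates, winners)) for d in range(dim)]
    return winners, gates, y


# Usage: three toy experts whose responses on [3, 4] are 5, 10 and 0.
experts = [
    make_linear_expert([[1.0, 0.0], [0.0, 1.0]]),
    make_linear_expert([[2.0, 0.0], [0.0, 2.0]]),
    make_linear_expert([[0.0, 0.0], [0.0, 0.0]]),
]
winners, gates, y = competition_route([3.0, 4.0], experts, k=1)
print(winners, gates, y)  # [1] [1.0] [6.0, 8.0]
```

Using the same response values both to pick winners and to weight them keeps the sketch short; a real SMoE layer would batch tokens and use tensors rather than Python lists.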
-
How to Coordinate UAVs and UGVs for Efficient Mission Planning? Optimizing Energy-Constrained Cooperative Routing with a DRL Framework
Authors:
Md Safwan Mondal,
Subramanian Ramasamy,
Luca Russo,
James D. Humann,
James M. Dotterweich,
Pranav Bhounsule
Abstract:
Efficient mission planning for cooperative systems involving Unmanned Aerial Vehicles (UAVs) and Unmanned Ground Vehicles (UGVs) requires addressing energy constraints, scalability, and coordination challenges between agents. UAVs excel at rapidly covering large areas but are constrained by limited battery life, while UGVs, with their extended operational range and capability to serve as mobile recharging stations, are hindered by slower speeds. This heterogeneity makes coordination between UAVs and UGVs critical for achieving optimal mission outcomes. In this work, we propose a scalable deep reinforcement learning (DRL) framework to address the energy-constrained cooperative routing problem for multi-agent UAV-UGV teams, aiming to visit a set of task points in minimal time, with UAVs relying on UGVs for recharging during the mission. The framework incorporates sortie-wise agent switching to efficiently manage multiple agents by allocating task points and coordinating actions. Using an encoder-decoder transformer architecture, it optimizes routes and recharging rendezvous for the UAV-UGV team in the task scenario. Extensive computational experiments demonstrate the framework's superior performance over heuristic methods and a DRL baseline, delivering significant improvements in solution quality and runtime efficiency across diverse scenarios. Generalization studies validate its robustness, while a case study on a dynamic scenario highlights its adaptability to real-time changes. This work advances UAV-UGV cooperative routing by providing a scalable, efficient, and robust solution for multi-agent mission planning.
Submitted 29 April, 2025;
originally announced April 2025.
-
Training Video Foundation Models with NVIDIA NeMo
Authors:
Zeeshan Patel,
Ethan He,
Parth Mannan,
Xiaowei Ren,
Ryan Wolf,
Niket Agarwal,
Jacob Huffman,
Zhuoyao Wang,
Carl Wang,
Jack Chang,
Yan Bai,
Tommy Huang,
Linnan Wang,
Sahil Jain,
Shanmugam Ramasamy,
Joseph Jennings,
Ekaterina Sirazitdinova,
Oleg Sudakov,
Mingyuan Ma,
Bobby Chen,
Forrest Lin,
Hao Wang,
Vasanth Rao Naik Sabavat,
Sriharsha Niverty,
Rong Ou
, et al. (4 additional authors not shown)
Abstract:
Video Foundation Models (VFMs) have recently been used to simulate the real world to train physical AI systems and to develop creative visual experiences. However, there are significant challenges in training large-scale, high-quality VFMs that can generate high-quality videos. We present a scalable, open-source VFM training pipeline with NVIDIA NeMo, providing accelerated video dataset curation, multimodal data loading, and parallelized video diffusion model training and inference. We also provide a comprehensive performance analysis highlighting best practices for efficient VFM training and inference.
Submitted 17 March, 2025;
originally announced March 2025.
-
Sequence Transferability and Task Order Selection in Continual Learning
Authors:
Thinh Nguyen,
Cuong N. Nguyen,
Quang Pham,
Binh T. Nguyen,
Savitha Ramasamy,
Xiaoli Li,
Cuong V. Nguyen
Abstract:
In continual learning, understanding the properties of task sequences and their relationships to model performance is important for developing advanced algorithms with better accuracy. However, efforts in this direction remain underdeveloped despite encouraging progress in methodology development. In this work, we investigate the impact of sequence transferability on continual learning and propose two novel measures that capture the total transferability of a task sequence, in either the forward or the backward direction. Based on the empirical properties of these measures, we then develop a new method for the task order selection problem in continual learning. Our method is shown to offer better performance than the conventional strategy of random task selection.
Submitted 10 February, 2025;
originally announced February 2025.
-
Deep Reinforcement Learning Enabled Persistent Surveillance with Energy-Aware UAV-UGV Systems for Disaster Management Applications
Authors:
Md Safwan Mondal,
Subramanian Ramasamy,
Pranav Bhounsule
Abstract:
Integrating Unmanned Aerial Vehicles (UAVs) with Unmanned Ground Vehicles (UGVs) provides an effective solution for persistent surveillance in disaster management. UAVs excel at covering large areas rapidly, but their range is limited by battery capacity. UGVs, though slower, can carry larger batteries for extended missions. By using UGVs as mobile recharging stations, UAVs can extend mission duration through periodic refueling, leveraging the complementary strengths of both systems. To optimize this energy-aware UAV-UGV cooperative routing problem, we propose a planning framework that determines optimal routes and recharging points between a UAV and a UGV. Our solution employs a deep reinforcement learning (DRL) framework built on an encoder-decoder transformer architecture with multi-head attention mechanisms. This architecture enables the model to sequentially select actions for visiting mission points and coordinating recharging rendezvous between the UAV and UGV. The DRL model is trained to minimize the age periods (the time gap between consecutive visits) of mission points, ensuring effective surveillance. We evaluate the framework across various problem sizes and distributions, comparing its performance against heuristic methods and an existing learning-based model. Results show that our approach consistently outperforms these baselines in both solution quality and runtime. Additionally, we demonstrate the DRL policy's applicability in a real-world disaster scenario as a case study and explore its potential for online mission planning to handle dynamic changes. Adapting the DRL policy for priority-driven surveillance highlights the model's generalizability for real-time disaster response.
Submitted 4 February, 2025;
originally announced February 2025.
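The age period objective in the abstract above (the time gap between consecutive visits to a mission point) can be computed from a visit log as in this sketch. Aggregating by the worst gap across mission points, and treating a point visited fewer than twice as contributing no gap, are assumptions; the abstract does not say how individual age periods are combined into the training objective. The function names and the visit log are hypothetical.

```python
from typing import Dict, List


def age_periods(visit_times: List[float]) -> List[float]:
    """Time gaps between consecutive visits to a single mission point."""
    times = sorted(visit_times)
    return [later - earlier for earlier, later in zip(times, times[1:])]


def worst_age_period(visits: Dict[str, List[float]]) -> float:
    """Largest age period over all mission points.

    Assumption: a point visited fewer than twice contributes no gap (0.0).
    """
    return max((max(age_periods(times), default=0.0) for times in visits.values()), default=0.0)


# Usage: hypothetical visit log (minutes since mission start) for three points.
visits = {"A": [0.0, 30.0, 75.0], "B": [10.0, 50.0], "C": [20.0]}
print(age_periods(visits["A"]))  # [30.0, 45.0]
print(worst_age_period(visits))  # 45.0
```

A mean over all gaps, or a horizon term for the time since the last visit, would be equally plausible aggregations; the worst gap is used here only because it directly penalizes the longest unobserved interval.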
-
PIP: Prototypes-Injected Prompt for Federated Class Incremental Learning
Authors:
Muhammad Anwar Ma'sum,
Mahardhika Pratama,
Savitha Ramasamy,
Lin Liu,
Habibullah Habibullah,
Ryszard Kowalczyk
Abstract:
Federated Class Incremental Learning (FCIL) is a new direction in continual learning (CL) for addressing catastrophic forgetting and non-IID data distribution simultaneously. Existing FCIL methods incur high communication costs and require exemplars from previous classes. We propose a novel rehearsal-free method for FCIL named prototypes-injected prompt (PIP) that involves three main ideas: a) prototype injection on prompt learning, b) prototype augmentation, and c) weighted Gaussian aggregation on the server side. Our experimental results show that the proposed method outperforms current state-of-the-art (SOTA) methods with a significant improvement (up to 33%) on the CIFAR100, MiniImageNet, and TinyImageNet datasets. Our extensive analysis demonstrates the robustness of PIP across different task sizes, and its advantage of requiring fewer participating local clients and fewer global rounds. For further study, the source code of PIP, the baselines, and the experimental logs are shared publicly at https://github.com/anwarmaxsum/PIP.
Submitted 30 July, 2024;
originally announced July 2024.
-
Prompt-Based Spatio-Temporal Graph Transfer Learning
Authors:
Junfeng Hu,
Xu Liu,
Zhencheng Fan,
Yifang Yin,
Shili Xiang,
Savitha Ramasamy,
Roger Zimmermann
Abstract:
Spatio-temporal graph neural networks have proven effective at capturing complex dependencies for urban computing tasks such as forecasting and kriging. Yet their performance is constrained by their reliance on extensive data for training on a specific task, which limits their adaptability to new urban domains with varied task demands. Although transfer learning has been proposed to remedy this problem by leveraging knowledge across domains, cross-task generalization remains under-explored in spatio-temporal graph transfer learning due to the lack of a unified framework. To bridge the gap, we propose Spatio-Temporal Graph Prompting (STGP), a prompt-based framework capable of adapting to diverse tasks in a data-scarce domain. Specifically, we first unify different tasks into a single template and introduce a task-agnostic network architecture that aligns with this template. This approach enables capturing dependencies shared across tasks. Furthermore, we employ learnable prompts to achieve domain and task transfer in a two-stage prompting pipeline, enabling the prompts to effectively capture domain knowledge and task-specific properties. Our extensive experiments demonstrate that STGP outperforms state-of-the-art baselines in three tasks (forecasting, kriging, and extrapolation), achieving an improvement of up to 10.7%.
Submitted 7 November, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
-
Continual Learning for Robust Gate Detection under Dynamic Lighting in Autonomous Drone Racing
Authors:
Zhongzheng Qiao,
Xuan Huy Pham,
Savitha Ramasamy,
Xudong Jiang,
Erdal Kayacan,
Andriy Sarabakha
Abstract:
In autonomous and mobile robotics, a principal challenge is resilient real-time environmental perception, particularly in situations characterized by unknown and dynamic elements, as exemplified by autonomous drone racing. This study introduces a perception technique for detecting drone racing gates under illumination variations, which are common during high-speed drone flights. The proposed technique relies upon a lightweight neural network backbone augmented with continual learning capabilities. The approach combines predictions of the gates' positional coordinates, distance, and orientation into a cohesive pose tuple. A comprehensive set of tests underscores the efficacy of this approach in diverse and challenging scenarios, specifically those involving variable lighting conditions. The proposed methodology exhibits notable robustness to illumination variations, thereby substantiating its effectiveness.
Submitted 2 May, 2024;
originally announced May 2024.
-
CompeteSMoE -- Effective Training of Sparse Mixture of Experts via Competition
Authors:
Quang Pham,
Giang Do,
Huy Nguyen,
TrungTin Nguyen,
Chenghao Liu,
Mina Sartipi,
Binh T. Nguyen,
Savitha Ramasamy,
Xiaoli Li,
Steven Hoi,
Nhat Ho
Abstract:
Sparse mixture of experts (SMoE) offers an appealing solution to scale up model complexity beyond the means of increasing the network's depth or width. However, effective training of SMoE has proven to be challenging due to the representation collapse issue, which causes parameter redundancy and limited representation potential. In this work, we propose a competition mechanism to address this fundamental challenge of representation collapse. By routing inputs only to experts with the highest neural response, we show that, under mild assumptions, competition enjoys the same convergence rate as the optimal estimator. We further propose CompeteSMoE, an effective and efficient algorithm to train large language models by deploying a simple router that predicts the competition outcomes. Consequently, CompeteSMoE enjoys strong performance gains from the competition routing policy while having low computational overhead. Our extensive empirical evaluations on two transformer architectures and a wide range of tasks demonstrate the efficacy, robustness, and scalability of CompeteSMoE compared to state-of-the-art SMoE strategies.
Submitted 4 February, 2024;
originally announced February 2024.
-
Dynamic Long-Term Time-Series Forecasting via Meta Transformer Networks
Authors:
Muhammad Anwar Ma'sum,
MD Rasel Sarkar,
Mahardhika Pratama,
Savitha Ramasamy,
Sreenatha Anavatti,
Lin Liu,
Habibullah,
Ryszard Kowalczyk
Abstract:
A reliable long-term time-series forecaster is in high demand in practice but faces many challenges, such as achieving low computational and memory footprints as well as robustness against dynamic learning environments. This paper proposes Meta-Transformer Networks (MANTRA) to deal with dynamic long-term time-series forecasting tasks. MANTRA relies on the concept of fast and slow learners, where a collection of fast learners learns different aspects of the data distributions while adapting quickly to changes, and a slow learner tailors suitable representations to the fast learners. Fast adaptation to dynamic environments is achieved using universal representation transformer layers that produce task-adapted representations with a small number of parameters. Our experiments using four datasets with different prediction lengths demonstrate the advantage of our approach, with at least 3% improvement over the baseline algorithms in both multivariate and univariate settings. The source code of MANTRA is publicly available at https://github.com/anwarmaxsum/MANTRA.
Submitted 25 January, 2024;
originally announced January 2024.
-
HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts
Authors:
Giang Do,
Khiem Le,
Quang Pham,
TrungTin Nguyen,
Thanh-Nam Doan,
Bint T. Nguyen,
Chenghao Liu,
Savitha Ramasamy,
Xiaoli Li,
Steven Hoi
Abstract:
By routing input tokens to only a few split experts, Sparse Mixture-of-Experts has enabled efficient training of large language models. Recent findings suggest that fixing the routers can achieve competitive performance by alleviating the collapsing problem, where all experts eventually learn similar representations. However, this strategy has two key limitations: (i) the policy derived from random routers might be sub-optimal, and (ii) it requires extensive resources during training and evaluation, leading to limited efficiency gains. This work introduces HyperRouter, which dynamically generates the router's parameters through a fixed hypernetwork and trainable embeddings to achieve a balance between training the routers and freezing them to learn an improved routing policy. Extensive experiments across a wide range of tasks demonstrate the superior performance and efficiency gains of HyperRouter compared to existing routing methods. Our implementation is publicly available at https://github.com/giangdip2410/HyperRouter.
Submitted 12 December, 2023;
originally announced December 2023.
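The HyperRouter idea of generating router parameters from a fixed hypernetwork and a trainable embedding can be sketched as follows. The single frozen linear hypernetwork, its Gaussian initialization, the top-k selection, and all function names are illustrative assumptions; the abstract does not describe the hypernetwork's architecture, and the training loop that would update `embedding` is omitted.

```python
import random
from typing import List

Matrix = List[List[float]]


def make_frozen_hypernetwork(emb_dim: int, num_experts: int, d_model: int, seed: int = 0) -> Matrix:
    """Random linear hypernetwork of shape (num_experts * d_model, emb_dim); never updated."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(emb_dim)] for _ in range(num_experts * d_model)]


def generate_router_weights(hyper: Matrix, embedding: List[float], num_experts: int, d_model: int) -> Matrix:
    """Map the trainable embedding to a (num_experts, d_model) router weight matrix."""
    flat = [sum(h * e for h, e in zip(row, embedding)) for row in hyper]
    return [flat[i * d_model:(i + 1) * d_model] for i in range(num_experts)]


def route_top_k(router: Matrix, token: List[float], k: int) -> List[int]:
    """Indices of the k experts with the largest router logits for this token."""
    logits = [sum(w * x for w, x in zip(row, token)) for row in router]
    return sorted(range(len(router)), key=lambda i: -logits[i])[:k]


# Usage: only `embedding` would receive gradient updates during training (not shown).
hyper = make_frozen_hypernetwork(emb_dim=4, num_experts=3, d_model=2, seed=0)
embedding = [0.1, -0.2, 0.3, 0.05]
router = generate_router_weights(hyper, embedding, num_experts=3, d_model=2)
chosen = route_top_k(router, [1.0, 0.5], k=2)
```

Because the hypernetwork is frozen, the trainable routing parameters here number only `emb_dim` rather than `num_experts * d_model`, which is one way to read the abstract's "balance between training the routers and freezing them".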
-
OptiRoute: A Heuristic-assisted Deep Reinforcement Learning Framework for UAV-UGV Collaborative Route Planning
Authors:
Md Safwan Mondal,
Subramanian Ramasamy,
Pranav Bhounsule
Abstract:
Unmanned aerial vehicles (UAVs) are capable of surveying expansive areas, but their operational range is constrained by limited battery capacity. The deployment of mobile recharging stations using unmanned ground vehicles (UGVs) significantly extends the endurance and effectiveness of UAVs. However, optimizing the routes of both UAVs and UGVs, known as the UAV-UGV cooperative routing problem, poses substantial challenges, particularly with respect to the selection of recharging locations. In this paper, we leverage reinforcement learning (RL) to identify optimal recharging locations while employing constraint programming to determine cooperative routes for the UAV and UGV. Our proposed framework is then benchmarked against a baseline solution that employs Genetic Algorithms (GA) to select rendezvous points. Our findings reveal that RL surpasses GA in reducing overall mission time, minimizing UAV-UGV idle time, and mitigating energy consumption for both the UAV and UGV. These results underscore the efficacy of incorporating heuristics to assist RL, a method we refer to as heuristics-assisted RL, in generating high-quality solutions for intricate routing problems.
Submitted 18 September, 2023;
originally announced September 2023.
-
Cooperative Multi-Agent Planning Framework for Fuel Constrained UAV-UGV Routing Problem
Authors:
Md Safwan Mondal,
Subramanian Ramasamy,
James D. Humann,
Jean-Paul F. Reddinger,
James M. Dotterweich,
Marshal A. Childers,
Pranav A. Bhounsule
Abstract:
Unmanned Aerial Vehicles (UAVs), although adept at aerial surveillance, are often constrained by limited battery capacity. By refueling on slow-moving Unmanned Ground Vehicles (UGVs), their operational endurance can be significantly enhanced. This paper explores the computationally complex problem of cooperative UAV-UGV routing for vast-area surveillance within speed and fuel constraints, presenting a sequential multi-agent planning framework for achieving feasible and sufficiently optimal solutions. In the first step of the framework, we consider the UAV fuel limits and use a minimum set cover algorithm to determine UGV refueling stops, which in turn facilitate UGV route planning. In the second step, we obtain the UAV route through a task allocation technique and an energy-constrained vehicle routing problem with time windows (E-VRPTW) model. The effectiveness of our multi-agent strategy is demonstrated on 30 different task scenarios across 3 different scales. This work offers significant insight into the collaborative advantages of UAV-UGV systems and introduces heuristic approaches to bypass computational challenges and swiftly reach high-quality solutions.
Submitted 6 September, 2023;
originally announced September 2023.
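The refueling-stop selection step can be approximated with the classical greedy heuristic for minimum set cover, sketched below. The greedy rule is a well-known approximation rather than an exact minimum, and the paper may use a different set cover solver. The coverage rule (a stop covers a task point within half the UAV's fuel range, allowing an out-and-back sortie), the coordinates, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]


def coverage(stops: Dict[str, Point], tasks: Dict[str, Point], radius: float) -> Dict[str, Set[str]]:
    """For each candidate stop, the task points a UAV could reach from it."""
    return {s: {t for t, q in tasks.items() if math.dist(p, q) <= radius} for s, p in stops.items()}


def greedy_set_cover(cover: Dict[str, Set[str]], universe: Set[str]) -> List[str]:
    """Repeatedly pick the stop that covers the most still-uncovered task points."""
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        best = max(cover, key=lambda s: len(cover[s] & uncovered))
        gain = cover[best] & uncovered
        if not gain:
            raise ValueError("some task points cannot be reached from any candidate stop")
        chosen.append(best)
        uncovered -= gain
    return chosen


# Usage: hypothetical stops and task points; a UAV fuel range of 8 units
# gives an out-and-back radius of 4.
stops = {"S1": (0.0, 0.0), "S2": (10.0, 0.0), "S3": (5.0, 0.0)}
tasks = {"a": (1.0, 0.0), "b": (9.0, 0.0), "c": (5.0, 1.0), "d": (11.0, 0.0)}
cover = coverage(stops, tasks, radius=8.0 / 2)
print(greedy_set_cover(cover, set(tasks)))  # ['S3', 'S2']
```

The chosen stops would then feed the UGV route planning described in the first step of the framework.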
-
Scalable fabrication of gap-plasmon-based dynamic and chromogenic nanostructures by capillary-interaction driven self-assembly of liquid-metal
Authors:
Renu Raman Sahu,
Alwar Samy Ramasamy,
Santosh Bhonsle S,
Mark Vailshery D C,
Tapajyoti Das Gupta
Abstract:
Dynamically tunable nanoengineered structures for coloration show promising applications in sensing, displays, and communication. However, a key remaining challenge is a manufacturing process that scales to large areas, on the order of tens of cm. For the first time, we report a novel approach for fabricating chromogenic nanostructures that respond to mechanical stimuli by utilizing the fluidic properties of polydimethylsiloxane (PDMS) as a substrate and the interfacial tension of liquid-metal-based plasmonic nanoparticles. Relying on the tunable properties of PDMS and a physical deposition method, our approach is single-step, scalable, and does not rely on high-carbon-footprint lithographic processes. By tuning the oligomer content in PDMS, we show that a variety of structural colors covering a significant gamut in CIE coordinates can be achieved. We develop a model that describes the formation of Ga nanodroplets from the capillary interaction of oligomers in PDMS with Ga. We showcase the capabilities of our processing technique by presenting prototypes of reflective displays, sensors for monitoring body parts, and smart bandages, and by demonstrating the capacity of the nanostructured film to map force in real time. These examples illustrate this technology's broad range of applications, such as large-area displays, devices for human-computer interaction, healthcare, and visual communication.
Submitted 13 April, 2023;
originally announced April 2023.
-
Solving Vehicle Routing Problem for unmanned heterogeneous vehicle systems using Asynchronous Multi-Agent Architecture (A-teams)
Authors:
Subramanian Ramasamy,
Md Safwan Mondal,
Pranav A. Bhounsule
Abstract:
Fast-moving but power-hungry unmanned aerial vehicles (UAVs) can recharge on slow-moving unmanned ground vehicles (UGVs) to survey large areas in an effective and efficient manner. To solve this computationally challenging problem in a reasonable time, we created a two-level optimization heuristic. At the outer level, the UGV route is parameterized by a few free parameters, and at the inner level, the UAV route is solved by formulating and solving a vehicle routing problem with capacity constraints, time windows, and dropped visits. The UGV free parameters need to be optimized judiciously to create high-quality solutions. We explore two methods for tuning the free UGV parameters: (1) a genetic algorithm, and (2) an Asynchronous Multi-agent architecture (A-teams). A-teams uses multiple agents to create, improve, and destroy solutions. The parallel asynchronous architecture enables A-teams to quickly optimize the parameters. Our results on test cases show that A-teams produces solutions similar to those of the genetic algorithm but with a 2-3x speedup.
Submitted 7 March, 2023;
originally announced March 2023.
-
Optimizing Fuel-Constrained UAV-UGV Routes for Large Scale Coverage: Bilevel Planning in Heterogeneous Multi-Agent Systems
Authors:
Md Safwan Mondal,
Subramanian Ramasamy,
Pranav Bhounsule
Abstract:
Fast-moving unmanned aerial vehicles (UAVs) are well suited for aerial surveillance but are limited by their battery capacity. To increase their endurance, UAVs can be refueled on slow-moving unmanned ground vehicles (UGVs). The cooperative routing of a UAV-UGV multi-agent system to survey vast regions within their speed and fuel constraints is a computationally challenging problem, but it can be simplified with heuristics. Here we present multiple heuristics to enable feasible and sufficiently optimal solutions to the problem. Using the UAV fuel limits and the minimum set cover algorithm, the UGV refueling stops are determined. These refueling stops enable the allocation of mission points to the UAV and UGV. A standard traveling salesman formulation and a vehicle routing formulation with time windows, dropped visits, and capacity constraints are used to solve for the UGV and UAV routes, respectively. Experimental validation on a small-scale testbed (http://tiny.cc/8or8vz) underscores the effectiveness of our multi-agent approach.
Submitted 7 July, 2023; v1 submitted 3 March, 2023;
originally announced March 2023.
-
Characterizing Error in Noncommutative Geometric Gait Analysis
Authors:
Capprin Bass,
Suresh Ramasamy,
Ross Hatton
Abstract:
A key problem in robotic locomotion is finding optimal shape changes to effectively displace systems through the world. Variational techniques for gait optimization require estimates of body displacement per gait cycle; however, these estimates introduce error due to omitted higher-order terms. In this paper, we formulate existing estimates for displacement and describe the contribution of low-order terms to these estimates. We additionally describe the magnitude of higher (third) order effects and identify that the choice of body coordinates, gait diameter, and starting phase influence these effects. We demonstrate that variation of such parameters on two example systems (the differential drive car and the Purcell swimmer) effectively manages third-order contributions.
Submitted 21 February, 2022;
originally announced February 2022.
-
Contrastive predictive coding for Anomaly Detection in Multi-variate Time Series Data
Authors:
Theivendiram Pranavan,
Terence Sim,
Arulmurugan Ambikapathi,
Savitha Ramasamy
Abstract:
Anomaly detection in multi-variate time series (MVTS) data is a major challenge, as it requires simultaneous representation of long-term temporal dependencies and correlations across multiple variables. Often, this is addressed by breaking down the complexity and modeling one dependency at a time. In this paper, we propose Time-series Representational Learning through Contrastive Predictive Coding (TRL-CPC) for anomaly detection in MVTS data. First, we jointly optimize an encoder, an auto-regressor, and a non-linear transformation function to effectively learn representations of the MVTS data sets for predicting future trends. Note that the context vectors are representative of the observation window in the MVTS. Next, the latent representations for the succeeding instants, obtained through non-linear transformations of these context vectors, are contrasted with the latent representations of the encoder for the multiple variables such that the density for the positive pair is maximized. Thus, TRL-CPC helps to model the temporal dependencies and the correlations of the parameters for a healthy signal pattern. Finally, the latent representations are fit to a Gaussian scoring function to detect anomalies. Evaluation of the proposed TRL-CPC on three MVTS data sets against SOTA anomaly detection methods shows the superiority of TRL-CPC.
Submitted 7 February, 2022;
originally announced February 2022.
-
PaRT: Parallel Learning Towards Robust and Transparent AI
Authors:
Mahsa Paknezhad,
Hamsawardhini Rengarajan,
Chenghao Yuan,
Sujanya Suresh,
Manas Gupta,
Savitha Ramasamy,
Hwee Kuan Lee
Abstract:
This paper takes a parallel learning approach for robust and transparent AI. A deep neural network is trained in parallel on multiple tasks, where each task is trained only on a subset of the network resources. Each subset consists of network segments, that can be combined and shared across specific tasks. Tasks can share resources with other tasks, while having independent task-related network resources. Therefore, the trained network can share similar representations across various tasks, while also enabling independent task-related representations. The above allows for some crucial outcomes. (1) The parallel nature of our approach negates the issue of catastrophic forgetting. (2) The sharing of segments uses network resources more efficiently. (3) We show that the network does indeed use learned knowledge from some tasks in other tasks, through shared representations. (4) Through examination of individual task-related and shared representations, the model offers transparency in the network and in the relationships across tasks in a multi-task setting. Evaluation of the proposed approach against complex competing approaches such as Continual Learning, Neural Architecture Search, and Multi-task learning shows that it is capable of learning robust representations. This is the first effort to train a DL model on multiple tasks in parallel. Our code is available at https://github.com/MahsaPaknezhad/PaRT
Submitted 23 February, 2022; v1 submitted 24 January, 2022;
originally announced January 2022.
-
Incremental Knowledge Tracing from Multiple Schools
Authors:
Sujanya Suresh,
Savitha Ramasamy,
P. N. Suganthan,
Cheryl Sze Yin Wong
Abstract:
Knowledge tracing is the task of predicting a learner's future performance based on the history of the learner's performance. Current knowledge tracing models are built on an extensive set of data collected from multiple schools. However, it is impossible to pool learners' data from all schools due to data privacy and PDPA policies. Hence, this paper explores the feasibility of building knowledge tracing models while preserving the privacy of learners' data within their respective schools. This study is conducted using part of the ASSISTment 2009 dataset, with data from multiple schools treated as separate tasks in a continual learning framework. The results show that learning sequentially with the Self Attentive Knowledge Tracing (SAKT) algorithm achieves performance comparable to that of pooling all the data together.
Submitted 7 January, 2022;
originally announced January 2022.
-
An Evaluation of Anomaly Detection and Diagnosis in Multivariate Time Series
Authors:
Astha Garg,
Wenyu Zhang,
Jules Samaran,
Savitha Ramasamy,
Chuan-Sheng Foo
Abstract:
Several techniques for multivariate time series anomaly detection have been proposed recently, but a systematic comparison on a common set of datasets and metrics is lacking. This paper presents a systematic and comprehensive evaluation of unsupervised and semi-supervised deep-learning based methods for anomaly detection and diagnosis on multivariate time series data from cyberphysical systems. Unlike previous works, we vary the model and the post-processing of model errors (i.e., the scoring function) independently of each other, through a grid of 10 models and 4 scoring functions, and compare these variants to state-of-the-art methods. In time-series anomaly detection, detecting anomalous events is more important than detecting individual anomalous time-points. Through experiments, we find that the existing evaluation metrics either do not take events into account or cannot distinguish between a good detector and trivial detectors, such as a random or an all-positive detector. To overcome these drawbacks, we propose a new metric, the composite F-score ($Fc_1$), for evaluating time-series anomaly detection.
Our study highlights that dynamic scoring functions work much better than static ones for multivariate time series anomaly detection, and that the choice of scoring function often matters more than the choice of the underlying model. We also find that a simple, channel-wise model, the Univariate Fully-Connected Auto-Encoder with the dynamic Gaussian scoring function, emerges as a winning candidate for both anomaly detection and diagnosis, beating state-of-the-art algorithms.
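The composite F-score can be illustrated with a short sketch. It follows our reading of the paper's definition: the harmonic mean of time-point-wise precision and event-wise recall, where an event counts as detected if any of its time-points is flagged. Treat it as an illustration rather than the authors' reference code. The toy labels show why an all-positive detector, which gets perfect event recall, still scores low.

```python
import numpy as np

def events(labels):
    """Return (start, end) index pairs of contiguous runs of 1s; end is exclusive."""
    padded = np.concatenate([[0], labels, [0]])
    diff = np.diff(padded)
    return list(zip(np.flatnonzero(diff == 1), np.flatnonzero(diff == -1)))

def composite_f1(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    precision_t = tp / max(np.sum(y_pred == 1), 1)  # time-point-wise precision
    evs = events(y_true)
    detected = sum(y_pred[s:e].any() for s, e in evs)
    recall_e = detected / max(len(evs), 1)  # event-wise recall
    if precision_t + recall_e == 0:
        return 0.0
    return 2 * precision_t * recall_e / (precision_t + recall_e)

# Toy series of 100 points with two anomalous events (15 anomalous points in total).
y_true = np.zeros(100, dtype=int)
y_true[10:20] = 1
y_true[60:65] = 1

all_positive = np.ones(100, dtype=int)  # trivial detector: flags everything
good = np.zeros(100, dtype=int)
good[[12, 13, 14, 61]] = 1              # few flags, but at least one inside each event
```

Here the all-positive detector scores 2(0.15)(1)/(0.15 + 1), roughly 0.26, while the sparse detector that hits both events scores 1.0.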
Submitted 23 September, 2021;
originally announced September 2021.
-
Geometric analysis of gaits and optimal control for three-link kinematic swimmers
Authors:
Oren Wiezel,
Suresh Ramasamy,
Nathan Justus,
Yizhar Or,
Ross Hatton
Abstract:
Many robotic systems locomote using gaits: periodic changes of internal shape whose mechanical interaction with the robot's environment generates characteristic net displacements. Prominent examples with two shape variables are the low Reynolds number 3-link "Purcell swimmer" with inputs of 2 joint angles and the "ideal fluid" swimmer. Gait analysis of these systems allows for intelligent decisions to be made about the swimmer's locomotive properties, increasing the potential for robotic autonomy. In this work, we present a comparative analysis of gait optimization using two different methods. The first method is the variational approach of "Pontryagin's maximum principle" (PMP) from optimal control theory. We apply PMP to several variants of 3-link swimmers, with and without bounds on joint angles. The second method is a differential-geometric analysis of the gaits based on the curvature (total Lie bracket) of the local connection for 3-link swimmers. Using optimized body-motion coordinates, contour plots of the curvature in shape space provide a visualization that enables identifying distance-optimal gaits as zero level sets. Combining and comparing the results of the two methods enables a better understanding of changes in the existence, shape and topology of distance-optimal gait trajectories, depending on the swimmers' parameters.
Submitted 24 August, 2023; v1 submitted 14 September, 2021;
originally announced September 2021.
-
The Geometry of Optimal Gaits for Inertia-dominated Kinematic Systems
Authors:
Ross L. Hatton,
Zachary Brock,
Shuoqi Chen,
Howie Choset,
Hossein Faraji,
Ruijie Fu,
Nathan Justus,
Suresh Ramasamy
Abstract:
Isolated mechanical systems -- e.g., those floating in space, in free-fall, or on a frictionless surface -- are able to achieve net rotation by cyclically changing their shape, even if they have no net angular momentum. Similarly, swimmers immersed in "perfect fluids" are able to use cyclic shape changes to both translate and rotate even if the swimmer-fluid system has no net linear or angular momentum. Finally, systems fully constrained by direct nonholonomic constraints (e.g., passive wheels) can push against these constraints to move through the world. Previous work has demonstrated that the net displacement induced by these shape changes corresponds to the amount of *constraint curvature* that the gaits enclose.
To properly assess or optimize the utility of a gait, however, we must also consider the time or resources required to execute it: A gait that produces a small displacement per cycle, but that can be executed in a short time, may produce a faster average velocity than a gait that produces a large displacement per cycle, but takes much longer to complete a cycle at the same average instantaneous effort.
In this paper, we consider two effort-based cost functions for assessing the costs associated with executing these cycles. For each of these cost functions, we demonstrate that fixing the average instantaneous cost to a unit value allows us to transform the effort costs into time-to-execute costs for any given gait cycle. We then illustrate how the interaction between the constraint curvature and these costs leads to characteristic geometries for optimal cycles, in which the gait trajectories resemble elastic hoops distended from within by internal pressures.
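The time-versus-displacement trade-off can be made concrete with a toy calculation. The assumptions are all ours: a two-dimensional shape space with a Euclidean metric, so that unit instantaneous effort means unit speed and the time per cycle equals the gait's path length; a made-up Gaussian curvature field; and only circular gaits centred on its peak. Displacement per cycle is taken as the curvature enclosed by the gait, as described above.

```python
import numpy as np

def curvature(r1, r2):
    """Hypothetical constraint-curvature field: a Gaussian bump at the origin of shape space."""
    return np.exp(-(r1**2 + r2**2))

def displacement_per_cycle(radius, n=200):
    """Integrate the curvature over the disc enclosed by a circular gait (polar midpoint rule)."""
    dr, dth = radius / n, 2 * np.pi / n
    r = (np.arange(n) + 0.5) * dr
    th = (np.arange(n) + 0.5) * dth
    R, TH = np.meshgrid(r, th, indexing="ij")
    # Polar area element is r dr dtheta.
    return np.sum(curvature(R * np.cos(TH), R * np.sin(TH)) * R) * dr * dth

def time_per_cycle(radius):
    """Under the unit-speed assumption, the time to execute the gait equals its path length."""
    return 2 * np.pi * radius

radii = np.linspace(0.05, 3.0, 296)
speeds = np.array([displacement_per_cycle(a) / time_per_cycle(a) for a in radii])
best_radius = radii[np.argmax(speeds)]
```

Small circles enclose little curvature, while large circles take longer than the extra curvature they capture justifies, so the average speed peaks at an intermediate radius.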
Submitted 29 September, 2021; v1 submitted 23 January, 2021;
originally announced February 2021.
-
Knowledge Capture and Replay for Continual Learning
Authors:
Saisubramaniam Gopalakrishnan,
Pranshu Ranjan Singh,
Haytham Fayek,
Savitha Ramasamy,
Arulmurugan Ambikapathi
Abstract:
Deep neural networks have shown promise in several domains, and the learned data (task) specific information is implicitly stored in the network parameters. Extraction and utilization of encoded knowledge representations are vital when data is no longer available in the future, especially in a continual learning scenario. In this work, we introduce *flashcards*, which are visual representations that *capture* the encoded knowledge of a network as a recursive function of predefined random image patterns. In a continual learning scenario, flashcards help to prevent catastrophic forgetting and to consolidate knowledge of all the previous tasks. Flashcards need to be constructed only before learning the subsequent task, and hence are independent of the number of tasks trained before. We demonstrate the efficacy of flashcards in capturing learned knowledge representation (as an alternative to the original dataset) and empirically validate them on a variety of continual learning tasks: reconstruction, denoising, task-incremental learning, and new-instance learning classification, using several heterogeneous benchmark datasets. Experimental evidence indicates that flashcards as a replay strategy (i) are *task agnostic*, (ii) perform better than generative replay, and (iii) are on par with episodic replay without additional memory overhead.
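A minimal sketch of flashcard construction as a recursive function of random patterns, assuming flashcards are obtained by feeding each pattern through the trained network repeatedly, x_{k+1} = f(x_k). The "trained autoencoder" here is a hypothetical linear toy that preserves a 3-dimensional learned subspace and attenuates everything else; the recursion depth and pattern distribution are illustrative, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, k = 16, 3
U, _ = np.linalg.qr(rng.normal(size=(dim, k)))  # orthonormal basis of a hypothetical learned subspace
P = U @ U.T                                      # projector onto that subspace

def trained_autoencoder(x):
    """Toy stand-in for a trained autoencoder: keeps in-subspace content, halves the rest."""
    return x @ P + 0.5 * (x @ (np.eye(dim) - P))

def make_flashcards(model, patterns, n_recursions):
    """Flashcards as a recursive function of random patterns: x_{k+1} = model(x_k)."""
    cards = patterns
    for _ in range(n_recursions):
        cards = model(cards)
    return cards

def off_subspace_fraction(x):
    """Share of the signal norm lying outside the learned subspace."""
    return np.linalg.norm(x @ (np.eye(dim) - P)) / np.linalg.norm(x)

patterns = rng.uniform(-1.0, 1.0, size=(32, dim))  # predefined random patterns
cards = make_flashcards(trained_autoencoder, patterns, n_recursions=8)

# While training the next task, (cards, old_model(cards)) pairs could be replayed as rehearsal targets.
replay_inputs, replay_targets = cards, trained_autoencoder(cards)
```

After a few recursions the cards concentrate on what the toy model has "learned", which is the property that makes them usable as a stand-in for the original data.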
Submitted 29 April, 2021; v1 submitted 12 December, 2020;
originally announced December 2020.
-
Optimal Gaits for Drag-dominated Swimmers with Passive Elastic Joints
Authors:
Suresh Ramasamy,
Ross L. Hatton
Abstract:
In this paper, we identify optimal swimming strategies for drag-dominated swimmers with a passive elastic joint. We use resistive force theory (RFT) to obtain the dynamics of the system. We then use frequency domain analysis to relate the motion of the passive joint to the motion of the actuated joint. We couple this analysis with elements of the geometric framework introduced in our previous work, aimed at identifying useful gaits for systems in drag-dominated environments, to identify speed-maximizing and efficiency-maximizing gaits for drag-dominated swimmers with a passive elastic joint.
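To illustrate what a frequency-domain relation between the two joints looks like, the sketch below assumes a hypothetical linearized, inertia-free passive-joint model, C*dθ_p/dt + K*θ_p = -γ*dθ_a/dt, with made-up coefficients; the actual RFT-derived dynamics in the paper are not reproduced here. The transfer function predicts the passive joint's amplitude for a sinusoidal actuated joint, and an RK4 simulation of the same ODE checks the prediction.

```python
import numpy as np

C, K, GAMMA = 1.0, 2.0, 0.5  # hypothetical damping, joint stiffness and coupling coefficients

def predicted_response(omega, amp):
    """Frequency-domain prediction: amplitude and phase of theta_p when theta_a = amp*sin(omega*t)."""
    H = -GAMMA * 1j * omega / (K + 1j * C * omega)  # transfer function Theta_p / Theta_a
    return amp * abs(H), np.angle(H)

def simulated_amplitude(omega, amp, t_end=40.0, dt=1e-3):
    """Integrate C*dtheta_p/dt + K*theta_p = -GAMMA*dtheta_a/dt with RK4; return steady-state amplitude."""
    def rhs(t, x):
        return (-GAMMA * amp * omega * np.cos(omega * t) - K * x) / C

    n = int(round(t_end / dt))
    x, xs = 0.0, np.empty(n)
    for i in range(n):
        t = i * dt
        k1 = rhs(t, x)
        k2 = rhs(t + dt / 2, x + dt * k1 / 2)
        k3 = rhs(t + dt / 2, x + dt * k2 / 2)
        k4 = rhs(t + dt, x + dt * k3)
        x += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        xs[i] = x
    return np.max(np.abs(xs[n // 2:]))  # discard the first half as transient

amp_pred, phase_pred = predicted_response(omega=3.0, amp=1.0)
amp_sim = simulated_amplitude(omega=3.0, amp=1.0)
```

In this toy model the passive amplitude grows with frequency and saturates at γ/C times the actuated amplitude, which is the kind of relation a gait optimizer can then exploit.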
Submitted 29 September, 2020;
originally announced October 2020.
-
Large-scale Gender/Age Prediction of Tumblr Users
Authors:
Yao Zhan,
Changwei Hu,
Yifan Hu,
Tejaswi Kasturi,
Shanmugam Ramasamy,
Matt Gillingham,
Keith Yamamoto
Abstract:
Tumblr, as a leading content provider and social media platform, attracts 371 million monthly visits, 280 million blogs and 53.3 million daily posts. The popularity of Tumblr provides great opportunities for advertisers to promote their products through sponsored posts. However, targeting specific demographic groups for ads is challenging, since Tumblr does not require user information such as gender and age during registration. Hence, to promote ad targeting, it is essential to predict users' demographics using rich content such as posts, images and social connections. In this paper, we propose graph-based and deep learning models for age and gender prediction, which take into account user activities and content features. For graph-based models, we develop two approaches, network embedding and label propagation, to generate connection features as well as to directly infer users' demographics. For deep learning models, we leverage a convolutional neural network (CNN) and a multilayer perceptron (MLP) to predict users' age and gender. Experimental results on a real Tumblr daily dataset, with hundreds of millions of active users and billions of following relations, demonstrate that our approaches significantly outperform the baseline model, improving accuracy by 81% (relative) for age, and AUC and accuracy by 5% for gender.
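Label propagation can be sketched with the textbook iterative scheme: known labels are clamped, and every other node repeatedly takes the average label distribution of its neighbours. This is a generic version under our own assumptions (an unweighted, undirected follow graph with uniform neighbour weights); the abstract does not specify the paper's exact variant. The six-node graph and its two seed labels are hypothetical.

```python
import numpy as np

def label_propagation(adj, labels, n_classes, n_iter=100):
    """Spread known labels over a graph; labels[i] == -1 marks an unlabelled node."""
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    deg = adj.sum(axis=1, keepdims=True)
    # Row-normalized transition matrix (isolated nodes get an all-zero row).
    T = np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0)
    seeds = labels >= 0
    Y = np.zeros((len(labels), n_classes))
    Y[seeds, labels[seeds]] = 1.0  # one-hot rows for the known labels
    F = Y.copy()
    for _ in range(n_iter):
        F = T @ F               # each node averages its neighbours' label distributions
        F[seeds] = Y[seeds]     # clamp the known labels
    return F.argmax(axis=1), F

# Two triangles joined by a bridge edge (2-3); one seed label in each triangle.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0
labels = np.array([0, -1, -1, -1, -1, 1])
pred, F = label_propagation(adj, labels, n_classes=2)
```

Each unlabelled node ends up with the label of the seed in its own triangle; the soft scores in F (for example 6/7 vs 1/7 for node 1) can also serve as connection features for a downstream model.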
Submitted 2 January, 2020;
originally announced January 2020.
-
Bayesian Recurrent Framework for Missing Data Imputation and Prediction with Clinical Time Series
Authors:
Yang Guo,
Zhengyuan Liu,
Pavitra Krishnswamy,
Savitha Ramasamy
Abstract:
Real-world clinical time series data sets exhibit a high prevalence of missing values. Hence, there is an increasing interest in missing data imputation. Traditional statistical approaches impose constraints on the data-generating process and decouple imputation from prediction. Recent works propose recurrent neural network based approaches for missing data imputation and prediction with time series data. However, they generate deterministic outputs and neglect the inherent uncertainty. In this work, we introduce a unified Bayesian recurrent framework for simultaneous imputation and prediction on time series data sets. We evaluate our approach on two real-world mortality prediction tasks using the MIMIC-III and PhysioNet benchmark datasets. We demonstrate strong performance gains over state-of-the-art (SOTA) methods, and provide strategies to use the resulting probability distributions to better assess reliability of the imputations and predictions.
Submitted 10 January, 2020; v1 submitted 18 November, 2019;
originally announced November 2019.
-
A novel method for extracting interpretable knowledge from a spiking neural classifier with time-varying synaptic weights
Authors:
Abeegithan Jeyasothy,
Suresh Sundaram,
Savitha Ramasamy,
Narasimhan Sundararajan
Abstract:
This paper presents a novel method for information interpretability in an MC-SEFRON classifier. To develop a method to extract the knowledge stored in a trained classifier, the binary-class SEFRON classifier developed earlier is first extended to handle multi-class problems. MC-SEFRON uses a population encoding scheme to encode real-valued input data into spike patterns. MC-SEFRON is trained using the same supervised learning rule as SEFRON. After training, the proposed method extracts the knowledge for a given class stored in the classifier by mapping the weighted postsynaptic potential in the time domain to the feature domain as Feature Strength Functions (FSFs). The set of FSFs corresponding to each output class represents the knowledge extracted from the classifier. This knowledge encoding method is derived to maintain consistency between classification in the time domain and in the feature domain. The correctness of the FSFs is quantitatively measured by using them directly for classification tasks. For a given input, each FSF is sampled at the input value to obtain the corresponding feature strength value (FSV). The aggregated FSVs obtained for each class are then used to determine the output class labels during classification. FSVs are also used to interpret the predictions during the classification task. Using ten UCI datasets and the MNIST dataset, the knowledge extraction method, the interpretation, and the reliability of the FSFs are demonstrated. The studies show that, on average, the difference between the classification accuracies obtained using the FSFs directly and those obtained by MC-SEFRON is only around 0.9% and 0.1% for the UCI datasets and the MNIST dataset, respectively. This shows that the knowledge represented by the FSFs has acceptable reliability and that interpreting classifications using the classifier's knowledge is justified.
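The FSF-based classification step can be sketched as follows. The Gaussian-shaped FSFs, their parameters, and the use of a plain sum to aggregate FSVs are our assumptions for illustration; in MC-SEFRON the FSFs come from mapping the trained weighted postsynaptic potentials to the feature domain.

```python
import numpy as np

def gaussian_fsf(center, width, height):
    """Hypothetical feature strength function: a Gaussian bump over a normalized feature value."""
    return lambda v: height * np.exp(-((v - center) ** 2) / (2 * width**2))

# fsfs[c][j] is the FSF of feature j for class c (two classes, two features; made-up parameters).
fsfs = [
    [gaussian_fsf(0.2, 0.15, 1.0), gaussian_fsf(0.3, 0.2, 1.0)],
    [gaussian_fsf(0.8, 0.15, 1.0), gaussian_fsf(0.7, 0.2, 1.0)],
]

def classify(x, fsfs):
    """Sample every FSF at the input value (the FSVs), sum them per class, return the argmax class."""
    fsv = np.array([[f(v) for f, v in zip(class_fsfs, x)] for class_fsfs in fsfs])
    return int(np.argmax(fsv.sum(axis=1))), fsv
```

The returned FSV matrix doubles as an explanation: each entry shows how strongly one feature supports one class for this input.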
Submitted 28 February, 2019;
originally announced April 2019.
-
Efficient single input-output layer spiking neural classifier with time-varying weight model
Authors:
Abeegithan Jeyasothy,
Savitha Ramasamy,
Suresh Sundaram
Abstract:
This paper presents a supervised learning algorithm, the Synaptic Efficacy Function with Meta-neuron based learning algorithm (SEF-M), for a spiking neural network with a time-varying weight model. For a given pattern, SEF-M uses a learning algorithm derived from the meta-neuron based learning algorithm to determine the change in weight corresponding to each presynaptic spike time. The changes in weights modulate the amplitudes of Gaussian functions centred at the same presynaptic spike times. The sum of these amplitude-modulated Gaussian functions represents the synaptic efficacy function (or time-varying weight model). The performance of SEF-M is evaluated against state-of-the-art spiking neural network learning algorithms on 10 benchmark datasets from the UCI machine learning repository. Performance studies show the superior generalization ability of SEF-M. An ablation study on the time-varying weight model is conducted using the JAFFE dataset. The results of the ablation study indicate that using a time-varying weight model instead of a single weight model improves the classification accuracy by 14%. Thus, it can be inferred that a single input-output layer spiking neural network with a time-varying weight model is computationally more efficient than a multi-layer spiking neural network with a long-term or short-term weight model.
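The synaptic efficacy function described above, a sum of Gaussians centred at presynaptic spike times with amplitudes set by the weight changes, can be written as w(t) = Σ_k Δw_k exp(-(t - t_k)² / (2σ²)). A minimal sketch follows, where the width σ, the spike times and the weight changes are hypothetical values rather than ones learned by SEF-M.

```python
import numpy as np

def synaptic_efficacy(t, spike_times, delta_w, sigma=0.1):
    """Time-varying weight: sum of Gaussians centred at presynaptic spike times, scaled by weight changes."""
    t = np.atleast_1d(t)[:, None]                      # shape (n_times, 1)
    spike_times = np.asarray(spike_times)[None, :]     # shape (1, n_spikes)
    delta_w = np.asarray(delta_w)[None, :]
    return np.sum(delta_w * np.exp(-((t - spike_times) ** 2) / (2 * sigma**2)), axis=1)

spike_times = [0.2, 1.0]  # hypothetical presynaptic spike times
delta_w = [0.5, -0.3]     # hypothetical weight changes from the learning rule
w = synaptic_efficacy(np.array([0.2, 0.6, 1.0]), spike_times, delta_w, sigma=0.05)
```

With well-separated spikes, the weight at each spike time is essentially that spike's Δw, and it decays towards zero in between.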
Submitted 21 March, 2019;
originally announced April 2019.
-
Fast Prototyping a Dialogue Comprehension System for Nurse-Patient Conversations on Symptom Monitoring
Authors:
Zhengyuan Liu,
Hazel Lim,
Nur Farah Ain Binte Suhaimi,
Shao Chuen Tong,
Sharon Ong,
Angela Ng,
Sheldon Lee,
Michael R. Macdonald,
Savitha Ramasamy,
Pavitra Krishnaswamy,
Wai Leng Chow,
Nancy F. Chen
Abstract:
Data for human-human spoken dialogues for research and development are currently very limited in quantity, variety, and sources; such data are even scarcer in healthcare. In this work, we investigate fast prototyping of a dialogue comprehension system by leveraging minimal nurse-to-patient conversations. We propose a framework inspired by nurse-initiated clinical symptom monitoring conversations to construct a simulated human-human dialogue dataset, embodying linguistic characteristics of spoken interactions such as thinking aloud, self-contradiction, and topic drift. We then adopt an established bidirectional attention pointer network on this simulated dataset, achieving an F1 score above 80% on a held-out test set from real-world nurse-to-patient conversations. The ability to automatically comprehend conversations in the healthcare domain by exploiting only limited data has implications for improving clinical workflows through red flag symptom detection and triaging capabilities. We demonstrate the feasibility of efficient and effective extraction, retrieval and comprehension of symptom checking information discussed in multi-turn human-human spoken conversations.
Submitted 5 April, 2019; v1 submitted 8 March, 2019;
originally announced March 2019.
-
Accelerating Photovoltaic Materials Development via High-Throughput Experiments and Machine-Learning-Assisted Diagnosis
Authors:
Shijing Sun,
Noor T. P. Hartono,
Zekun D. Ren,
Felipe Oviedo,
Antonio M. Buscemi,
Mariya Layurova,
De Xin Chen,
Tofunmi Ogunfunmi,
Janak Thapa,
Savitha Ramasamy,
Charles Settens,
Brian L. DeCost,
Aaron Gilad Kusne,
Zhe Liu,
Siyu I. P. Tian,
I. Marius Peters,
Juan-Pablo Correa-Baena,
Tonio Buonassisi
Abstract:
Accelerating the experimental cycle for new materials development is vital for addressing the grand energy challenges of the 21st century. We fabricate and characterize 75 unique halide perovskite-inspired solution-based thin-film materials within a two-month period, with 87% exhibiting band gaps between 1.2 eV and 2.4 eV that are of interest for energy-harvesting applications. This increased throughput is enabled by streamlining experimental workflows, developing a set of precursors amenable to high-throughput synthesis, and developing machine-learning assisted diagnosis. We utilize a deep neural network to classify compounds based on experimental X-ray diffraction data into 0D, 2D, and 3D structures more than 10 times faster than human analysis and with 90% accuracy. We validate our methods using lead-halide perovskites and extend the application to novel lead-free compositions. The wider synthesis window and faster cycle of learning enable three noteworthy scientific findings: (1) we realize four inorganic layered perovskites, $\mathrm{A_3B_2Br_9}$ (A = Cs, Rb; B = Bi, Sb), in thin-film form via one-step liquid deposition; (2) we report a multi-site lead-free alloy series not previously described in the literature, $\mathrm{Cs_3(Bi_{1-x}Sb_x)_2(I_{1-x}Br_x)_9}$; and (3) we reveal the effect on band gap (reduction to <2 eV) and structure upon simultaneous alloying on the B-site and X-site of $\mathrm{Cs_3Bi_2I_9}$ with Sb and Br. This study demonstrates that combining an accelerated experimental cycle of learning and machine-learning based diagnosis represents an important step toward realizing fully-automated laboratories for materials discovery and development.
Submitted 25 November, 2018;
originally announced December 2018.
-
Predicting thermoelectric properties from crystal graphs and material descriptors - first application for functional materials
Authors:
Leo Laugier,
Daniil Bash,
Jose Recatala,
Hong Kuan Ng,
Savitha Ramasamy,
Chuan-Sheng Foo,
Vijay R Chandrasekhar,
Kedar Hippalgaonkar
Abstract:
We introduce the use of Crystal Graph Convolutional Neural Networks (CGCNN), Fully Connected Neural Networks (FCNN) and XGBoost to predict thermoelectric properties. The dataset for the CGCNN is independent of Density Functional Theory (DFT) and relies only on crystal and atomic information, while that for the FCNN is based on a rich attribute list mined from Materialsproject.org. The results show that the optimized FCNN is three layers deep and predicts the scattering-time-independent thermoelectric power factor much better than the CGCNN (or XGBoost), suggesting that bonding and density-of-states descriptors, informed by materials science knowledge obtained partially from DFT, are vital for predicting functional properties.
Submitted 15 November, 2018;
originally announced November 2018.
-
Autonomous Deep Learning: Incremental Learning of Denoising Autoencoder for Evolving Data Streams
Authors:
Mahardhika Pratama,
Andri Ashfahani,
Yew Soon Ong,
Savitha Ramasamy,
Edwin Lughofer
Abstract:
The generative learning phase of the Autoencoder (AE) and its successor, the Denoising Autoencoder (DAE), enhances the flexibility of data stream methods in exploiting unlabelled samples. Nonetheless, the feasibility of the DAE for data stream analytics deserves in-depth study because it has a fixed network capacity, which cannot adapt to rapidly changing environments. An automated construction of a denoising autoencoder, namely the deep evolving denoising autoencoder (DEVDAN), is proposed in this paper. DEVDAN features an open structure in both the generative phase and the discriminative phase, where input features can be automatically added and discarded on the fly. A network significance (NS) method, derived from the bias-variance concept, is formulated in this paper. This method estimates the statistical contribution of the network structure and its hidden units, which signals when to add or prune input features. Furthermore, DEVDAN is free of problem-specific thresholds and works fully in a single-pass learning fashion. The efficacy of DEVDAN is numerically validated using nine non-stationary data stream problems simulated under the prequential test-then-train protocol, where DEVDAN improves classification accuracy over recently published online learning works while offering flexibility in the automatic extraction of robust input features and in adapting to rapidly changing environments.
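The prequential test-then-train protocol scores each incoming sample before the model learns from it, so every prediction is made on unseen data. A minimal per-sample sketch follows; the nearest-centroid learner is a hypothetical stand-in (not DEVDAN), and many streaming studies apply the same idea chunk by chunk rather than sample by sample.

```python
import numpy as np

class OnlineNearestCentroid:
    """Hypothetical single-pass learner used only to illustrate the protocol (not DEVDAN)."""

    def __init__(self, n_classes, dim):
        self.sums = np.zeros((n_classes, dim))
        self.counts = np.zeros(n_classes)

    def predict(self, x):
        seen = self.counts > 0
        if not seen.any():
            return 0  # arbitrary default before any training data has arrived
        centroids = self.sums[seen] / self.counts[seen][:, None]
        nearest = np.argmin(np.linalg.norm(centroids - x, axis=1))
        return int(np.flatnonzero(seen)[nearest])

    def update(self, x, y):
        self.sums[y] += x
        self.counts[y] += 1

def prequential_accuracy(model, stream):
    correct = total = 0
    for x, y in stream:
        correct += int(model.predict(x) == y)  # test on the sample first...
        model.update(x, y)                     # ...then train on it
        total += 1
    return correct / total

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=500)
stream = [(rng.normal(loc=3.0 * y, scale=0.5, size=2), int(y)) for y in labels]
acc = prequential_accuracy(OnlineNearestCentroid(n_classes=2, dim=2), stream)
```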
Submitted 24 September, 2018;
originally announced September 2018.
-
Network Topology Mapping from Partial Virtual Coordinates and Graph Geodesics
Authors:
Anura P. Jayasumana,
Randy Paffenroth,
Gunjan Mahindre,
Sridhar Ramasamy,
Kelum Gajamannage
Abstract:
For many important network types (e.g., sensor networks in complex harsh environments and social networks), physical coordinate systems (e.g., Cartesian) and physical distances (e.g., Euclidean) are either difficult to discern or inapplicable. Accordingly, coordinate systems and characterizations based on hop-distance measurements, such as Topology Preserving Maps (TPMs) and Virtual-Coordinate (VC) systems, are attractive alternatives to Cartesian coordinates for many network algorithms. Herein, we present an approach to recover geometric and topological properties of a network from a small set of distance measurements. In particular, our approach combines shortest path (often called geodesic) recovery concepts and low-rank matrix completion, generalized to the case of hop-distances in graphs. Results for sensor networks embedded in 2-D and 3-D spaces, as well as a social network, indicate that the method can accurately capture the network connectivity with a small set of measurements. TPM generation can now also be based on various context-appropriate measurements or VC systems, as long as they characterize different nodes by distances to small sets of random nodes (instead of a set of global anchors). The proposed method is a significant generalization that allows the topology to be extracted from a random set of graph shortest paths, making it applicable in contexts such as social networks where VC generation may not be possible.
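Hop-distance measurements of the kind VC systems use can be sketched with breadth-first search: each node is described by its hop counts to a few reference nodes. This shows only the measurement step, on a hypothetical 4 x 4 grid network with randomly chosen reference nodes; the geodesic recovery and low-rank completion that the paper builds on top are not reproduced.

```python
from collections import deque

import numpy as np

def hop_distances(adj_list, source):
    """Breadth-first search: hop count from source to every node (-1 if unreachable)."""
    dist = [-1] * len(adj_list)
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj_list[u]:
            if dist[v] < 0:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def virtual_coordinates(adj_list, reference_nodes):
    """Each node's VC vector is its hop distances to the reference nodes (one column per reference)."""
    return np.array([hop_distances(adj_list, r) for r in reference_nodes]).T

# Toy 4 x 4 grid network; node id = row * side + col.
side = 4
adj_list = [[] for _ in range(side * side)]
for r in range(side):
    for c in range(side):
        u = r * side + c
        if c + 1 < side:
            adj_list[u].append(u + 1)
            adj_list[u + 1].append(u)
        if r + 1 < side:
            adj_list[u].append(u + side)
            adj_list[u + side].append(u)

rng = np.random.default_rng(1)
reference_nodes = rng.choice(side * side, size=3, replace=False)  # random, not fixed global anchors
vc = virtual_coordinates(adj_list, reference_nodes)              # shape (16, 3)
```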
Submitted 7 September, 2018;
originally announced September 2018.
-
Online Deep Learning: Growing RBM on the fly
Authors:
Savitha Ramasamy,
Kanagasabai Rajaraman,
Pavitra Krishnaswamy,
Vijay Chandrasekhar
Abstract:
We propose a novel online learning algorithm for Restricted Boltzmann Machines (RBM), namely, the Online Generative Discriminative Restricted Boltzmann Machine (OGD-RBM), that provides the ability to build and adapt the network architecture of RBM according to the statistics of streaming data. The OGD-RBM is trained in two phases: (1) an online generative phase for unsupervised feature representation at the hidden layer and (2) a discriminative phase for classification. The online generative training begins with zero neurons in the hidden layer, adds and updates the neurons to adapt to statistics of streaming data in a single-pass, unsupervised manner, resulting in a feature representation best suited to the data. The discriminative phase is based on stochastic gradient descent and associates the represented features to the class labels. We demonstrate the OGD-RBM on a set of multi-category and binary classification problems for data sets having varying degrees of class-imbalance. We first apply the OGD-RBM algorithm to the multi-class MNIST dataset to characterize the network evolution. We demonstrate that the online generative phase converges to a stable, concise network architecture, wherein individual neurons are inherently discriminative to the class labels despite unsupervised training. We then benchmark OGD-RBM performance against other machine learning, neural network and ClassRBM techniques for credit scoring applications using 3 public non-stationary two-class credit datasets with varying degrees of class-imbalance. We report that OGD-RBM improves accuracy by 2.5-3% over batch learning techniques while requiring at least 24% to 70% fewer neurons and fewer training samples. This online generative training approach can be extended greedily to multiple layers for training Deep Belief Networks in non-stationary data mining applications without the need for a priori fixed architectures.
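The idea of an online generative phase that starts with zero hidden neurons and grows them from streaming data can be illustrated with a toy binary RBM. This is a sketch under loudly stated assumptions. The growth rule (add a neuron whenever a sample's mean-field reconstruction error exceeds a threshold, with weights initialised from that sample) and all names (`GrowingRBM`, `threshold`, `init_scale`, `partial_fit`) are hypothetical and are not the OGD-RBM criterion. Existing neurons are updated with a deterministic mean-field variant of one-step contrastive divergence (CD-1), and the discriminative SGD phase is omitted.

```python
import math


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class GrowingRBM:
    """Toy binary RBM whose hidden layer starts empty and grows online.

    ASSUMPTION: the growth rule below is hypothetical, not the paper's.
    """

    def __init__(self, n_visible, threshold=0.1, lr=0.05, init_scale=2.0):
        self.n_visible = n_visible
        self.threshold = threshold
        self.lr = lr
        self.init_scale = init_scale
        self.W = []                   # one weight row per hidden neuron
        self.c = []                   # hidden biases
        self.b = [0.0] * n_visible    # visible biases

    def hidden_probs(self, v):
        return [_sigmoid(sum(w_i * v_i for w_i, v_i in zip(w, v)) + c_k)
                for w, c_k in zip(self.W, self.c)]

    def visible_probs(self, h):
        return [_sigmoid(self.b[i] + sum(h_k * self.W[k][i] for k, h_k in enumerate(h)))
                for i in range(self.n_visible)]

    def reconstruction_error(self, v):
        v_rec = self.visible_probs(self.hidden_probs(v))
        return sum((a - r) ** 2 for a, r in zip(v, v_rec)) / self.n_visible

    def partial_fit(self, v):
        """Process one streaming sample once; return True if a neuron was added."""
        if self.reconstruction_error(v) > self.threshold:
            # Hypothetical growth step: a new neuron tuned to the poorly explained sample.
            self.W.append([self.init_scale * (2 * x - 1) for x in v])
            self.c.append(0.0)
            return True
        # Otherwise: mean-field CD-1 update (probabilities instead of samples).
        h0 = self.hidden_probs(v)
        v1 = self.visible_probs(h0)
        h1 = self.hidden_probs(v1)
        for k in range(len(self.W)):
            for i in range(self.n_visible):
                self.W[k][i] += self.lr * (h0[k] * v[i] - h1[k] * v1[i])
            self.c[k] += self.lr * (h0[k] - h1[k])
        for i in range(self.n_visible):
            self.b[i] += self.lr * (v[i] - v1[i])
        return False


rbm = GrowingRBM(n_visible=4)
for sample in ([1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1]):
    rbm.partial_fit(sample)
```

With an empty hidden layer every visible unit reconstructs to 0.5, so the first sample triggers growth. A repeat of that sample is then explained well enough to receive only a CD-1 update, and a dissimilar pattern adds a second neuron.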
Submitted 6 March, 2018;
originally announced March 2018.
-
Network Topology Mapping from Partial Virtual Coordinates and Graph Geodesics
Authors:
Anura P. Jayasumana,
Randy Paffenroth,
Sridhar Ramasamy
Abstract:
For many important network types (e.g., sensor networks in complex harsh environments and social networks), physical coordinate systems (e.g., Cartesian) and physical distances (e.g., Euclidean) are either difficult to discern or inappropriate. Accordingly, Topology Preserving Maps (TPMs) derived from a Virtual-Coordinate (VC) system, which represents each node by its distances to a small set of anchors, are an attractive alternative to physical coordinates for many network algorithms. Herein, we present an approach, based on the theory of low-rank matrix completion, to recover geometric properties of a network from only partial information about the VCs of nodes. In particular, our approach is a combination of geodesic recovery concepts and low-rank matrix completion, generalized to the case of hop-distances in graphs. Distortion, evaluated as the change in distance between node pairs, shows that accurate TPMs can be obtained even with 40% to 60% of the coordinates missing at random. TPM generation can now also be based on different context-appropriate VC systems or measurements, as long as they characterize each node by its distances to a small set of random nodes (instead of a global set of anchors). The proposed method is a significant generalization that allows the topology to be extracted from a random set of graph geodesics, making it applicable in contexts such as social networks where VC generation may not be possible.
Submitted 13 September, 2018; v1 submitted 28 December, 2017;
originally announced December 2017.
-
Soap-bubble Optimization of Gaits
Authors:
Suresh Ramasamy,
Ross L. Hatton
Abstract:
In this paper, we present a geometric variational algorithm for optimizing the gaits of kinematic locomoting systems. The dynamics of this algorithm are analogous to the physics of a soap bubble, with the system's Lie bracket supplying an "inflation pressure" that is balanced by a "surface tension" term derived from a Riemannian metric on the system's shape space. We demonstrate this optimizer on a variety of system geometries (including Purcell's swimmer) and for optimization criteria that include maximizing displacement and efficiency of motion for both translation and turning motions.
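The two competing terms in the soap-bubble analogy can be evaluated for a discretized gait. This is a simplified sketch under stated assumptions, not the authors' optimizer. It assumes a 2-D shape space, that the first-order net displacement of a closed gait is the integral of a scalar curvature field over the enclosed region (standing in for the "inflation pressure" the abstract attributes to the Lie bracket), and that the cost is the gait's length under a Riemannian metric on shape space (the "surface tension"). The functions `enclosed_curvature` and `metric_length` and the example fields are hypothetical.

```python
import math


def enclosed_curvature(gait, D):
    """Signed integral of a curvature field D(x, y) over the region enclosed by a
    simple closed polygonal gait (the 'pressure' term, first-order displacement).

    Uses a signed triangle fan from the vertex centroid with one-point (centroid)
    quadrature per triangle, which is exact for fields linear in (x, y).
    Counter-clockwise gaits give a positive sign, clockwise ones a negative sign.
    """
    n = len(gait)
    cx = sum(p[0] for p in gait) / n
    cy = sum(p[1] for p in gait) / n
    total = 0.0
    for i in range(n):
        (x1, y1), (x2, y2) = gait[i], gait[(i + 1) % n]
        signed_area = 0.5 * ((x1 - cx) * (y2 - cy) - (x2 - cx) * (y1 - cy))
        mx, my = (cx + x1 + x2) / 3.0, (cy + y1 + y2) / 3.0
        total += signed_area * D(mx, my)
    return total


def metric_length(gait, metric):
    """Length of the closed gait under metric(x, y) -> 2x2 matrix, evaluated at
    each segment midpoint (the 'surface tension' term)."""
    n = len(gait)
    length = 0.0
    for i in range(n):
        (x1, y1), (x2, y2) = gait[i], gait[(i + 1) % n]
        dx, dy = x2 - x1, y2 - y1
        (a, b), (c, d) = metric((x1 + x2) / 2.0, (y1 + y2) / 2.0)
        length += math.sqrt(a * dx * dx + (b + c) * dx * dy + d * dy * dy)
    return length


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
displacement = enclosed_curvature(square, lambda x, y: 1.0)       # constant curvature
cost = metric_length(square, lambda x, y: ((1.0, 0.0), (0.0, 1.0)))  # Euclidean metric
```

For the unit square with constant curvature 1 and the Euclidean metric, the displacement estimate is 1.0 and the length is 4.0. Reversing the gait flips the sign of the displacement. An optimizer in the spirit of the abstract would push the gait outward where the curvature is large and pull it inward where the metric makes motion expensive.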
Submitted 25 October, 2016; v1 submitted 8 September, 2016;
originally announced September 2016.