Search | arXiv e-print repository

doi 10.1364/JOCN.551188

Interference Detection in Spectrum-Blind Multi-User Optical Spectrum as a Service

Authors: Agastya Raj, Daniel C. Kilper, Marco Ruffini

Abstract: With the growing demand for high-bandwidth, low-latency applications, Optical Spectrum as a Service (OSaaS) is of interest for flexible bandwidth allocation within Elastic Optical Networks (EONs) and Open Line Systems (OLS). While OSaaS facilitates transparent connectivity and resource sharing among users, it raises concerns over potential network vulnerabilities due to shared fiber access and inter-channel interference, such as fiber non-linearity and amplifier-based crosstalk. These challenges are exacerbated in multi-user environments, complicating the identification and localization of service interferences. To reduce system disruptions and system repair costs, it is beneficial to detect and identify such interferences in a timely manner. Addressing these challenges, this paper introduces a Machine Learning (ML) based architecture for network operators to detect and attribute interferences to specific OSaaS users while blind to the users' internal spectrum details. Our methodology leverages available coarse power measurements and operator channel performance data, bypassing the need for internal user information of wide-band shared spectra. Experimental studies conducted on a 190 km optical line system in the Open Ireland testbed, with three OSaaS users, demonstrate the model's capability to accurately classify the source of interferences, achieving a classification accuracy of 90.3%.

Submitted 27 May, 2025; originally announced May 2025.

Comments: This is a preprint of a paper accepted and published in the Journal of Optical Communications and Networking (JOCN). The final published version is available at: https://doi.org/10.1364/JOCN.551188

Journal ref: Journal of Optical Communications and Networking, Vol. 17, Issue 8, pp. C117-C126 (2025)

arXiv:2504.13125 [pdf, other]

LLMs Meet Finance: Fine-Tuning Foundation Models for the Open FinLLM Leaderboard

Authors: Varun Rao, Youran Sun, Mahendra Kumar, Tejas Mutneja, Agastya Mukherjee, Haizhao Yang

Abstract: This paper investigates the application of large language models (LLMs) to financial tasks. We fine-tuned foundation models using the Open FinLLM Leaderboard as a benchmark. Building on Qwen2.5 and Deepseek-R1, we employed techniques including supervised fine-tuning (SFT), direct preference optimization (DPO), and reinforcement learning (RL) to enhance their financial capabilities. The fine-tuned models demonstrated substantial performance gains across a wide range of financial tasks. Moreover, we measured the data scaling law in the financial domain. Our work demonstrates the potential of large language models (LLMs) in financial applications.

Submitted 17 April, 2025; originally announced April 2025.

arXiv:2503.18495 [pdf, other]

Real-Time Streaming Telemetry Based Detection and Mitigation of OOK and Power Interference in Multi-User OSaaS Networks

Authors: Agastya Raj, Devika Dass, Daniel C. Kilper, Marco Ruffini

Abstract: We present a framework to identify and mitigate rogue OOK signals and user-generated power interference in a multi-user Optical-Spectrum-as-a-Service network. Experimental tests on the OpenIreland-testbed achieve up to 89% detection rate within 10 seconds of an interference event.

Submitted 24 March, 2025; originally announced March 2025.

Comments: This paper is a preprint of a paper submitted to OFC 2025

arXiv:2503.17094 [pdf, other]

Transfer Learning for EDFA Gain Modeling: A Semi-Supervised Approach Using Internal Amplifier Features

Authors: Agastya Raj, Dan Kilper, Marco Ruffini

Abstract: The gain spectrum of an Erbium-Doped Fiber Amplifier (EDFA) has a complex dependence on channel loading, pump power, and operating mode, making accurate modeling difficult to achieve. Machine Learning (ML) based modeling methods can achieve high accuracy, but they require comprehensive data collection. We present a novel ML-based Semi-Supervised, Self-Normalizing Neural Network (SS-NN) framework to model the wavelength-dependent gain of EDFAs using minimal data, which achieves a Mean Absolute Error (MAE) of 0.07/0.08 dB for booster/pre-amplifier gain prediction. We further perform Transfer Learning (TL) using a single additional measurement per target-gain setting to transfer this model among 22 EDFAs in Open Ireland and COSMOS testbeds, which achieves an MAE of less than 0.19 dB even when operated across different amplifier types. We show that the SS-NN model achieves high accuracy for gain spectrum prediction with minimal data requirement when compared with current benchmark methods.

Submitted 21 March, 2025; originally announced March 2025.

Comments: This paper is a preprint of a paper accepted to IEEE Future Networks World Forum (FNWF) 2024

arXiv:2503.17079 [pdf, other]

Interference Identification in Multi-User Optical Spectrum as a Service using Convolutional Neural Networks

Authors: Agastya Raj, Zehao Wang, Frank Slyne, Tingjun Chen, Dan Kilper, Marco Ruffini

Abstract: We introduce an ML-based architecture for network operators to detect impairments from specific OSaaS users while blind to the users' internal spectrum details. Experimental studies with three OSaaS users demonstrate the model's capability to accurately classify the source of impairments, achieving a classification accuracy of 94.2%.

Submitted 21 March, 2025; originally announced March 2025.

Comments: This paper is a preprint of a paper accepted to ECOC 2024 and is subject to Institution of Engineering and Technology Copyright. A copy of record will be available at IET Digital Library

arXiv:2503.17072 [pdf, other]

Multi-Span Optical Power Spectrum Evolution Modeling using ML-based Multi-Decoder Attention Framework

Authors: Agastya Raj, Zehao Wang, Frank Slyne, Tingjun Chen, Dan Kilper, Marco Ruffini

Abstract: We implement an ML-based attention framework with component-specific decoders, improving optical power spectrum prediction in multi-span networks. By reducing the need for in-depth training on each component, the framework can be scaled to multi-span topologies with minimal data collection, making it suitable for brown-field scenarios.

Submitted 21 March, 2025; originally announced March 2025.

Comments: This paper is a preprint of a paper accepted in ECOC 2024 and is subject to Institution of Engineering and Technology Copyright. A copy of record will be available at IET Digital Library

arXiv:2412.14315 [pdf, other]

On the Robustness of Spectral Algorithms for Semirandom Stochastic Block Models

Authors: Aditya Bhaskara, Agastya Vibhuti Jha, Michael Kapralov, Naren Sarayu Manoj, Davide Mazzali, Weronika Wrzos-Kaminska

Abstract: In a graph bisection problem, we are given a graph $G$ with two equally-sized unlabeled communities, and the goal is to recover the vertices in these communities. A popular heuristic, known as spectral clustering, is to output an estimated community assignment based on the eigenvector corresponding to the second smallest eigenvalue of the Laplacian of $G$. Spectral algorithms can be shown to provably recover the cluster structure for graphs generated from certain probabilistic models, such as the Stochastic Block Model (SBM). However, spectral clustering is known to be non-robust to model mis-specification. Techniques based on semidefinite programming have been shown to be more robust, but they incur significant computational overheads. In this work, we study the robustness of spectral algorithms against semirandom adversaries. Informally, a semirandom adversary is allowed to ``helpfully'' change the specification of the model in a way that is consistent with the ground-truth solution. Our semirandom adversaries in particular are allowed to add edges inside clusters or increase the probability that an edge appears inside a cluster. Semirandom adversaries are a useful tool to determine the extent to which an algorithm has overfit to statistical assumptions on the input. On the positive side, we identify classes of semirandom adversaries under which spectral bisection using the _unnormalized_ Laplacian is strongly consistent, i.e., it exactly recovers the planted partitioning. On the negative side, we show that in these classes spectral bisection with the _normalized_ Laplacian outputs a partitioning that makes a classification mistake on a constant fraction of the vertices. Finally, we demonstrate numerical experiments that complement our theoretical findings.

Submitted 18 December, 2024; originally announced December 2024.

Comments: 45 pages. NeurIPS 2024
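
The spectral bisection heuristic described in the abstract above (split by the eigenvector for the second smallest eigenvalue of the unnormalized Laplacian) can be illustrated with a minimal sketch. This is not the authors' code: the function name `spectral_bisection`, the shifted power iteration used to find the eigenvector, and the toy graph are all assumptions made for illustration, and the sketch assumes a small connected graph.

```python
import math


def spectral_bisection(n, edges, iters=500):
    """Split vertices 0..n-1 by the sign of an approximate Fiedler vector.

    Hypothetical helper: uses power iteration on (shift * I - L), where
    L = D - A is the unnormalized Laplacian, while projecting out the
    all-ones vector (the eigenvector of L for eigenvalue 0).
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    deg = [len(neighbors) for neighbors in adj]
    shift = 2 * max(deg)  # upper bound on the largest eigenvalue of L

    # Deterministic start vector with zero mean (orthogonal to all-ones).
    x = [i - (n - 1) / 2 for i in range(n)]
    for _ in range(iters):
        # y = (shift * I - L) x = (shift - deg) * x + A x
        y = [(shift - deg[i]) * x[i] + sum(x[j] for j in adj[i]) for i in range(n)]
        mean = sum(y) / n
        y = [value - mean for value in y]  # re-project against all-ones
        norm = math.sqrt(sum(value * value for value in y))
        x = [value / norm for value in y]

    part_a = {i for i in range(n) if x[i] >= 0}
    part_b = set(range(n)) - part_a
    return part_a, part_b


# Toy input (an assumption): two triangles joined by the single edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
part_a, part_b = spectral_bisection(6, toy_edges)
print(sorted(part_a), sorted(part_b))
```

On this toy graph the two triangles end up in different parts. The sketch says nothing about the semirandom setting studied in the paper; it only shows the basic heuristic on a fixed input.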

arXiv:2407.03525 [pdf, ps, other]

UnSeenTimeQA: Time-Sensitive Question-Answering Beyond LLMs' Memorization

Authors: Md Nayem Uddin, Amir Saeidi, Divij Handa, Agastya Seth, Tran Cao Son, Eduardo Blanco, Steven R. Corman, Chitta Baral

Abstract: This paper introduces UnSeenTimeQA, a novel data contamination-free time-sensitive question-answering (TSQA) benchmark. It differs from existing TSQA benchmarks by avoiding web-searchable queries grounded in the real world. We present a series of time-sensitive event scenarios based on synthetically generated facts. It requires large language models (LLMs) to engage in genuine temporal reasoning without depending on the factual knowledge acquired during the pre-training phase. Our data generation framework enables on-demand generation of new samples, mitigating the risk of data leakage. We designed three types of time-sensitive questions to test LLMs' temporal reasoning abilities over sequential and parallel event occurrences. Our evaluation of five LLMs on synthetic fact-based TSQA reveals mixed results: while they perform well on simpler subsets, their overall performance remains inferior compared to real-world fact-based TSQA. Error analysis indicates that LLMs face difficulties in reasoning over long-range event dependencies and parallel events.

Submitted 2 June, 2025; v1 submitted 3 July, 2024; originally announced July 2024.

Comments: Accepted at ACL 2025 (Main)

arXiv:2405.11844 [pdf]

NeRTCAM: CAM-Based CMOS Implementation of Reference Frames for Neuromorphic Processors

Authors: Harideep Nair, William Leyman, Agastya Sampath, Quinn Jacobson, John Paul Shen

Abstract: Neuromorphic architectures mimicking biological neural networks have been proposed as a much more efficient alternative to conventional von Neumann architectures for the exploding compute demands of AI workloads. Recent neuroscience theory on intelligence suggests that Cortical Columns (CCs) are the fundamental compute units in the neocortex and intelligence arises from CC's ability to store, predict and infer information via structured Reference Frames (RFs). Based on this theory, recent works have demonstrated brain-like visual object recognition using software simulation. Our work is the first attempt towards direct CMOS implementation of Reference Frames for building CC-based neuromorphic processors. We propose NeRTCAM (Neuromorphic Reverse Ternary Content Addressable Memory), a CAM-based building block that supports the key operations (store, predict, infer) required to perform inference using RFs. NeRTCAM architecture is presented in detail including its key components. All designs are implemented in SystemVerilog and synthesized in 7nm CMOS, and hardware complexity scaling is evaluated for varying storage sizes. NeRTCAM system for biologically motivated MNIST inference with a storage size of 1024 entries incurs just 0.15 mm^2 area, 400 mW power and 9.18 us critical path latency, demonstrating the feasibility of direct CMOS implementation of CAM-based Reference Frames.

Submitted 20 May, 2024; originally announced May 2024.

Comments: Accepted and Presented at Neuro-Inspired Computational Elements (NICE) Conference, La Jolla, CA. 2024

arXiv:2405.09755 [pdf, other]

Collision Avoidance Metric for 3D Camera Evaluation

Authors: Vage Taamazyan, Alberto Dall'olio, Agastya Kalra

Abstract: 3D cameras have emerged as a critical source of information for applications in robotics and autonomous driving. These cameras provide robots with the ability to capture and utilize point clouds, enabling them to navigate their surroundings and avoid collisions with other objects. However, current standard camera evaluation metrics often fail to consider the specific application context. These metrics typically focus on measures like Chamfer distance (CD) or Earth Mover's Distance (EMD), which may not directly translate to performance in real-world scenarios. To address this limitation, we propose a novel metric for point cloud evaluation, specifically designed to assess the suitability of 3D cameras for the critical task of collision avoidance. This metric incorporates application-specific considerations and provides a more accurate measure of a camera's effectiveness in ensuring safe robot navigation. The source code is available at https://github.com/intrinsic-ai/collision-avoidance-metric.

Submitted 8 July, 2024; v1 submitted 15 May, 2024; originally announced May 2024.
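
For readers unfamiliar with the baseline mentioned in the abstract above, the following is a minimal sketch of Chamfer distance between two small point clouds. It follows one common convention (mean nearest-neighbour Euclidean distance, summed over both directions); other papers use squared distances or different normalization. The function name and sample points are assumptions, and this does not implement the collision-avoidance metric the paper proposes.

```python
import math


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance using unsquared Euclidean distances."""

    def mean_nearest(src, dst):
        # Average distance from each source point to its nearest target point.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return mean_nearest(points_a, points_b) + mean_nearest(points_b, points_a)


# Hypothetical data: a two-point cloud and a copy shifted by 1 along z.
cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]

print(chamfer_distance(cloud, cloud))
print(chamfer_distance(cloud, shifted))
```

The brute-force nearest-neighbour search is quadratic in the number of points, which is fine for a toy example but would normally be replaced by a spatial index for real point clouds.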

arXiv:2404.01049 [pdf, other]

A Novel Sector-Based Algorithm for an Optimized Star-Galaxy Classification

Authors: Anumanchi Agastya Sai Ram Likhit, Divyansh Tripathi, Akshay Agarwal

Abstract: This paper introduces a novel sector-based methodology for star-galaxy classification, leveraging the latest Sloan Digital Sky Survey data (SDSS-DR18). By strategically segmenting the sky into sectors aligned with SDSS observational patterns and employing a dedicated convolutional neural network (CNN), we achieve state-of-the-art performance for star-galaxy classification. Our preliminary results demonstrate a promising pathway for efficient and precise astronomical analysis, especially in real-time observational settings.

Submitted 1 April, 2024; originally announced April 2024.

Journal ref: The Second Tiny Papers Track at ICLR 2024

arXiv:2401.00287 [pdf, other]

The Art of Defending: A Systematic Evaluation and Analysis of LLM Defense Strategies on Safety and Over-Defensiveness

Authors: Neeraj Varshney, Pavel Dolin, Agastya Seth, Chitta Baral

Abstract: As Large Language Models (LLMs) play an increasingly pivotal role in natural language processing applications, their safety concerns become critical areas of NLP research. This paper presents Safety and Over-Defensiveness Evaluation (SODE) benchmark: a collection of diverse safe and unsafe prompts with carefully designed evaluation methods that facilitate systematic evaluation, comparison, and analysis over 'safety' and 'over-defensiveness.' With SODE, we study a variety of LLM defense strategies over multiple state-of-the-art LLMs, which reveals several interesting and important findings, such as (a) the widely popular 'self-checking' techniques indeed improve the safety against unsafe inputs, but this comes at the cost of extreme over-defensiveness on the safe inputs, (b) providing a safety instruction along with in-context exemplars (of both safe and unsafe inputs) consistently improves safety and also mitigates undue over-defensiveness of the models, (c) providing contextual knowledge easily breaks the safety guardrails and makes the models more vulnerable to generating unsafe responses. Overall, our work reveals numerous such critical findings that we believe will pave the way and facilitate further research in improving the safety of LLMs.

Submitted 30 December, 2023; originally announced January 2024.

arXiv:2308.02233 [pdf, other]

Self-Normalizing Neural Network, Enabling One Shot Transfer Learning for Modeling EDFA Wavelength Dependent Gain

Authors: Agastya Raj, Zehao Wang, Frank Slyne, Tingjun Chen, Dan Kilper, Marco Ruffini

Abstract: We present a novel ML framework for modeling the wavelength-dependent gain of multiple EDFAs, based on semi-supervised, self-normalizing neural networks, enabling one-shot transfer learning. Our experiments on 22 EDFAs in Open Ireland and COSMOS testbeds show high-accuracy transfer-learning even when operated across different amplifier types.

Submitted 21 October, 2023; v1 submitted 4 August, 2023; originally announced August 2023.

Comments: This paper is a preprint of a paper submitted to ECOC 2023 and is subject to Institution of Engineering and Technology Copyright. If accepted, the copy of record will be available at IET Digital Library

arXiv:2207.12724 [pdf]

An Automated News Bias Classifier Using Caenorhabditis Elegans Inspired Recursive Feedback Network Architecture

Authors: Agastya Sridharan, Natarajan S

Abstract: Traditional approaches to classify the political bias of news articles have failed to generate accurate, generalizable results. Existing networks premised on CNNs and DNNs lack a model to identify and extrapolate subtle indicators of bias like word choice, context, and presentation. In this paper, we propose a network architecture that achieves human-level accuracy in assigning bias classifications to articles. The underlying model is based on a novel Mesh Neural Network (MNN); this structure enables feedback and feedforward synaptic connections between any two neurons in the mesh. The MNN contains six network configurations that utilize Bernoulli-based random sampling, pre-trained DNNs, and a network modelled after the C. Elegans nematode. The model is trained on over ten thousand articles scraped from AllSides.com which are labelled to indicate political bias. The parameters of the network are then evolved using a genetic algorithm suited to the feedback neural structure. Finally, the best performing model is applied to five popular news sources in the United States over a fifty-day trial to quantify political biases in the articles they display. We hope our project can spur research into biological solutions for NLP tasks and provide accurate tools for citizens to understand subtle biases in the articles they consume.

Submitted 26 July, 2022; originally announced July 2022.

Comments: The paper is under review for AACL-IJCNLP

arXiv:2112.07499 [pdf, other]

Reconfiguring Shortest Paths in Graphs

Authors: Kshitij Gajjar, Agastya Vibhuti Jha, Manish Kumar, Abhiruk Lahiri

Abstract: Reconfiguring two shortest paths in a graph means modifying one shortest path to the other by changing one vertex at a time so that all the intermediate paths are also shortest paths. This problem has several natural applications, namely: (a) revamping road networks, (b) rerouting data packets in a synchronous multiprocessing setting, (c) the shipping container stowage problem, and (d) the train marshalling problem. When modelled as graph problems, (a) is the most general case while (b), (c) and (d) are restrictions to different graph classes. We show that (a) is intractable, even for relaxed variants of the problem. For (b), (c) and (d), we present efficient algorithms to solve the respective problems. We also generalize the problem to when at most $k$ (for a fixed integer $k\geq 2$) contiguous vertices on a shortest path can be changed at a time.

Submitted 14 December, 2021; originally announced December 2021.

Comments: 28 pages, 14 figures. To be presented at AAAI 2022

MSC Class: 68Q25; 05C85; 68T99 ACM Class: F.2.2
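
The reconfiguration rule stated in the abstract above (change one vertex at a time, with every intermediate path still a shortest path) can be made concrete with a small checker. This is a sketch under stated assumptions: it handles unweighted, undirected graphs given as a dictionary of neighbour sets, and the function names and toy graph are hypothetical. It only verifies a given sequence; it does not implement the paper's algorithms for finding one.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def is_shortest_path(adj, path):
    """True if path follows edges and its length equals the BFS distance."""
    if any(b not in adj[a] for a, b in zip(path, path[1:])):
        return False
    return bfs_distances(adj, path[0]).get(path[-1]) == len(path) - 1


def is_valid_reconfiguration(adj, sequence):
    """Check that consecutive shortest paths differ in exactly one vertex."""
    start, end = sequence[0][0], sequence[0][-1]
    for path in sequence:
        if path[0] != start or path[-1] != end or not is_shortest_path(adj, path):
            return False
    for p, q in zip(sequence, sequence[1:]):
        if len(p) != len(q) or sum(a != b for a, b in zip(p, q)) != 1:
            return False
    return True


# Hypothetical layered graph with source 0 and target 5.
adj = {0: {1, 2}, 1: {0, 3, 4}, 2: {0, 4}, 3: {1, 5}, 4: {1, 2, 5}, 5: {3, 4}}
stepwise = [[0, 1, 3, 5], [0, 1, 4, 5], [0, 2, 4, 5]]
direct = [[0, 1, 3, 5], [0, 2, 4, 5]]
print(is_valid_reconfiguration(adj, stepwise), is_valid_reconfiguration(adj, direct))
```

In the toy graph, the stepwise sequence is accepted because each step swaps a single vertex, while the direct jump is rejected because it changes two vertices at once.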

arXiv:2109.13488 [pdf, other]

Towards Rotation Invariance in Object Detection

Authors: Agastya Kalra, Guy Stoppi, Bradley Brown, Rishav Agarwal, Achuta Kadambi

Abstract: Rotation augmentations generally improve a model's invariance/equivariance to rotation - except in object detection. In object detection the shape is not known, therefore rotation creates a label ambiguity. We show that the de-facto method for bounding box label rotation, the Largest Box Method, creates very large labels, leading to poor performance and in many cases worse performance than using no rotation at all. We propose a new method of rotation augmentation that can be implemented in a few lines of code. First, we create a differentiable approximation of label accuracy and show that axis-aligning the bounding box around an ellipse is optimal. We then introduce Rotation Uncertainty (RU) Loss, allowing the model to adapt to the uncertainty of the labels. On five different datasets (including COCO, PascalVOC, and Transparent Object Bin Picking), this approach improves the rotational invariance of both one-stage and two-stage architectures when measured with AP, AP50, and AP75. The code is available at https://github.com/akasha-imaging/ICCV2021.

Submitted 30 September, 2021; v1 submitted 28 September, 2021; originally announced September 2021.

Comments: Accepted ICCV 2021
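
To illustrate why the choice of rotated label matters, the sketch below compares two ways of computing an axis-aligned box after rotating a labelled box about its centre. It is an interpretation, not the authors' released code: `largest_box` encloses the four rotated corners, while `ellipse_box` assumes the ellipse in question is the one inscribed in the original box and encloses that ellipse after rotation. Both function names are hypothetical.

```python
import math


def largest_box(width, height, angle_deg):
    """Axis-aligned size enclosing the rotated rectangle's four corners."""
    t = math.radians(angle_deg)
    c, s = abs(math.cos(t)), abs(math.sin(t))
    return width * c + height * s, width * s + height * c


def ellipse_box(width, height, angle_deg):
    """Axis-aligned size enclosing the rotated inscribed ellipse (assumed)."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    # Extent of a rotated ellipse: sqrt((w*cos)^2 + (h*sin)^2), and similarly for height.
    return math.hypot(width * c, height * s), math.hypot(width * s, height * c)


for angle in (0, 15, 45, 90):
    print(angle, largest_box(4.0, 2.0, angle), ellipse_box(4.0, 2.0, angle))
```

Under these assumptions the two boxes coincide at 0 and 90 degrees, and the ellipse-based box is smaller in between, which is consistent with the abstract's remark that the Largest Box Method produces very large labels.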

arXiv:2108.05484 [pdf, other]

Self-supervised Contrastive Learning for Irrigation Detection in Satellite Imagery

Authors: Chitra Agastya, Sirak Ghebremusse, Ian Anderson, Colorado Reed, Hossein Vahabi, Alberto Todeschini

Abstract: Climate change has caused reductions in river runoffs and aquifer recharge resulting in an increasingly unsustainable crop water demand from reduced freshwater availability. Achieving food security while deploying water in a sustainable manner will continue to be a major challenge necessitating careful monitoring and tracking of agricultural water usage. Historically, monitoring water usage has been a slow and expensive manual process with many imperfections and abuses. Machine learning and remote sensing developments have increased the ability to automatically monitor irrigation patterns, but existing techniques often require curated and labelled irrigation data, which are expensive and time consuming to obtain and may not exist for impactful areas such as developing countries. In this paper, we explore an end-to-end real world application of irrigation detection with uncurated and unlabeled satellite imagery. We apply state-of-the-art self-supervised deep learning techniques to optical remote sensing data, and find that we are able to detect irrigation with up to nine times better precision, 90% better recall and 40% more generalization ability than the traditional supervised learning methods.

Submitted 11 August, 2021; originally announced August 2021.

arXiv:2103.02843 [pdf]

doi 10.1098/rsfs.2021.0018

Pandemic Drugs at Pandemic Speed: Infrastructure for Accelerating COVID-19 Drug Discovery with Hybrid Machine Learning- and Physics-based Simulations on High Performance Computers

Authors: Agastya P. Bhati, Shunzhou Wan, Dario Alfè, Austin R. Clyde, Mathis Bode, Li Tan, Mikhail Titov, Andre Merzky, Matteo Turilli, Shantenu Jha, Roger R. Highfield, Walter Rocchia, Nicola Scafuri, Sauro Succi, Dieter Kranzlmüller, Gerald Mathias, David Wifling, Yann Donon, Alberto Di Meglio, Sofia Vallecorsa, Heng Ma, Anda Trifan, Arvind Ramanathan, Tom Brettin, Alexander Partin, et al. (4 additional authors not shown)

Abstract: The race to meet the challenges of the global pandemic has served as a reminder that the existing drug discovery process is expensive, inefficient and slow. There is a major bottleneck screening the vast number of potential small molecules to shortlist lead compounds for antiviral drug development. New opportunities to accelerate drug discovery lie at the interface between machine learning methods, in this case developed for linear accelerators, and physics-based methods. The two in silico methods each have their own advantages and limitations which, interestingly, complement each other. Here, we present an innovative infrastructural development that combines both approaches to accelerate drug discovery. The scale of the potential resulting workflow is such that it is dependent on supercomputing to achieve extremely high throughput. We have demonstrated the viability of this workflow for the study of inhibitors for four COVID-19 target proteins and our ability to perform the required large-scale calculations to identify lead antiviral compounds through repurposing on a variety of supercomputers.

Submitted 4 September, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

Journal ref: Interface Focus. 2021. 11 (6): 20210018

arXiv:2010.10517 [pdf, other]

Scalable HPC and AI Infrastructure for COVID-19 Therapeutics

Authors: Hyungro Lee, Andre Merzky, Li Tan, Mikhail Titov, Matteo Turilli, Dario Alfe, Agastya Bhati, Alex Brace, Austin Clyde, Peter Coveney, Heng Ma, Arvind Ramanathan, Rick Stevens, Anda Trifan, Hubertus Van Dam, Shunzhou Wan, Sean Wilkinson, Shantenu Jha

Abstract: COVID-19 has claimed more than 1 million lives and resulted in over 40 million infections. There is an urgent need to identify drugs that can inhibit SARS-CoV-2. In response, the DOE recently established the Medical Therapeutics project as part of the National Virtual Biotechnology Laboratory, and tasked it with creating the computational infrastructure and methods necessary to advance therapeutics development. We discuss innovations in computational infrastructure and methods that are accelerating and advancing drug design. Specifically, we describe several methods that integrate artificial intelligence and simulation-based approaches, and the design of computational infrastructure to support these methods at scale. We discuss their implementation and characterize their performance, and highlight science advances that these capabilities have enabled.

Submitted 20 October, 2020; originally announced October 2020.

arXiv:2010.06574 [pdf, other]

IMPECCABLE: Integrated Modeling PipelinE for COVID Cure by Assessing Better LEads

Authors: Aymen Al Saadi, Dario Alfe, Yadu Babuji, Agastya Bhati, Ben Blaiszik, Thomas Brettin, Kyle Chard, Ryan Chard, Peter Coveney, Anda Trifan, Alex Brace, Austin Clyde, Ian Foster, Tom Gibbs, Shantenu Jha, Kristopher Keipert, Thorsten Kurth, Dieter Kranzlmüller, Hyungro Lee, Zhuozhao Li, Heng Ma, Andre Merzky, Gerald Mathias, Alexander Partin, Junqi Yin, et al. (11 additional authors not shown)

Abstract: The drug discovery process currently employed in the pharmaceutical industry typically requires about 10 years and $2-3 billion to deliver one new drug. This is both too expensive and too slow, especially in emergencies like the COVID-19 pandemic. In silico methodologies need to be improved to better select lead compounds that can proceed to later stages of the drug discovery protocol, accelerating the entire process. No single methodological approach can achieve the necessary accuracy with the required efficiency. Here we describe multiple algorithmic innovations to overcome this fundamental limitation, along with the development and deployment of computational infrastructure at scale that integrates multiple artificial intelligence and simulation-based approaches. Three measures of performance are: (i) throughput, the number of ligands per unit time; (ii) scientific performance, the number of effective ligands sampled per unit time; and (iii) peak performance, in flop/s. The capabilities outlined here have been used in production for several months as the workhorse of the computational infrastructure to support the capabilities of the US-DOE National Virtual Biotechnology Laboratory in combination with resources from the EU Centre of Excellence in Computational Biomedicine.

Submitted 13 October, 2020; originally announced October 2020.

arXiv:2001.03194 [pdf, other]

MatrixNets: A New Scale and Aspect Ratio Aware Architecture for Object Detection

Authors: Abdullah Rashwan, Rishav Agarwal, Agastya Kalra, Pascal Poupart

Abstract: We present MatrixNets (xNets), a new deep architecture for object detection. xNets map objects with similar sizes and aspect ratios into many specialized layers, allowing xNets to provide a scale and aspect ratio aware architecture. We leverage xNets to enhance single-stage object detection frameworks. First, we apply xNets on anchor-based object detection, for which we predict object centers and regress the top-left and bottom-right corners. Second, we use MatrixNets for corner-based object detection by predicting top-left and bottom-right corners. Each corner predicts the center location of the object. We also enhance corner-based detection by replacing the embedding layer with center regression. Our final architecture achieves mAP of 47.8 on MS COCO, which is higher than its CornerNet counterpart by +5.6 mAP while also closing the gap between single-stage and two-stage detectors. The code is available at https://github.com/arashwan/matrixnet.

Submitted 9 January, 2020; originally announced January 2020.

Comments: This is the full paper for arXiv:1908.04646 with more applications, experiments, and ablation study

arXiv:1908.04646 [pdf, other]

Matrix Nets: A New Deep Architecture for Object Detection

Authors: Abdullah Rashwan, Agastya Kalra, Pascal Poupart

Abstract: We present Matrix Nets (xNets), a new deep architecture for object detection. xNets map objects with different sizes and aspect ratios into layers where the sizes and the aspect ratios of the objects within their layers are nearly uniform. Hence, xNets provide a scale and aspect ratio aware architecture. We leverage xNets to enhance key-points based object detection. Our architecture achieves mAP of 47.8 on MS COCO, which is higher than any other single-shot detector while using half the number of parameters and training 3x faster than the next best architecture.

Submitted 14 August, 2019; v1 submitted 13 August, 2019; originally announced August 2019.

Comments: Short paper, stay tuned for the full paper!

arXiv:1904.07435 [pdf, other]

Photofeeler-D3: A Neural Network with Voter Modeling for Dating Photo Impression Prediction

Authors: Agastya Kalra, Ben Peterson

Abstract: In just a few years, online dating has become the dominant way that young people meet to date, making the deceptively error-prone task of picking good dating profile photos vital to a generation's ability to form romantic connections. Until now, artificial intelligence approaches to Dating Photo Impression Prediction (DPIP) have been very inaccurate, unadaptable to real-world application, and have only taken into account a subject's physical attractiveness. To that effect, we propose Photofeeler-D3 - the first convolutional neural network as accurate as 10 human votes for how smart, trustworthy, and attractive the subject appears in highly variable dating photos. Our "attractive" output is also applicable to Facial Beauty Prediction (FBP), making Photofeeler-D3 state-of-the-art for both DPIP and FBP. We achieve this by leveraging Photofeeler's Dating Dataset (PDD) with over 1 million images and tens of millions of votes, our novel technique of voter modeling, and cutting-edge computer vision techniques.

Submitted 10 May, 2019; v1 submitted 15 April, 2019; originally announced April 2019.

Comments: 10 pages, 3 figures, 5 tables

arXiv:1701.05265 [pdf, other]

Online Structure Learning for Sum-Product Networks with Gaussian Leaves

Authors: Wilson Hsu, Agastya Kalra, Pascal Poupart

Abstract: Sum-product networks have recently emerged as an attractive representation due to their dual view as a special type of deep neural network with clear semantics and a special type of probabilistic graphical model for which inference is always tractable. Those properties follow from some conditions (i.e., completeness and decomposability) that must be respected by the structure of the network. As a result, it is not easy to specify a valid sum-product network by hand and therefore structure learning techniques are typically used in practice. This paper describes the first online structure learning technique for continuous SPNs with Gaussian leaves. We also introduce an accompanying new parameter learning technique.

Submitted 18 January, 2017; originally announced January 2017.

arXiv:1512.02194 [pdf, other]

doi 10.1016/j.cpc.2016.05.020

FabSim: facilitating computational research through automation on large-scale and distributed e-infrastructures

Authors: Derek Groen, Agastya Bhati, James Suter, James Hetherington, Stefan Zasada, Peter Coveney

Abstract: We present FabSim, a toolkit developed to simplify a range of computational tasks for researchers in diverse disciplines. FabSim is flexible, adaptable, and allows users to perform a wide range of tasks with ease. It also provides a systematic way to automate the use of resources, including HPC and distributed resources, and to make tasks easier to repeat by recording contextual information. To demonstrate this, we present three use cases where FabSim has enhanced our research productivity. These include simulating cerebrovascular bloodflow, modelling clay-polymer nanocomposites across multiple scales, and calculating ligand-protein binding affinities.

Submitted 7 December, 2015; originally announced December 2015.

Comments: 29 pages, 8 figures, 2 tables, submitted

Showing 1–25 of 25 results for author: Agastya