-
Cognitive Weave: Synthesizing Abstracted Knowledge with a Spatio-Temporal Resonance Graph
Authors:
Akash Vishwakarma,
Hojin Lee,
Mohith Suresh,
Priyam Shankar Sharma,
Rahul Vishwakarma,
Sparsh Gupta,
Yuvraj Anupam Chauhan
Abstract:
The emergence of capable large language model (LLM) based agents necessitates memory architectures that transcend mere data storage, enabling continuous learning, nuanced reasoning, and dynamic adaptation. Current memory systems often grapple with fundamental limitations in structural flexibility, temporal awareness, and the ability to synthesize higher-level insights from raw interaction data. This paper introduces Cognitive Weave, a novel memory framework centered around a multi-layered spatio-temporal resonance graph (STRG). This graph manages information as semantically rich insight particles (IPs), which are dynamically enriched with resonance keys, signifiers, and situational imprints via a dedicated semantic oracle interface (SOI). These IPs are interconnected through typed relational strands, forming an evolving knowledge tapestry. A key component of Cognitive Weave is the cognitive refinement process, an autonomous mechanism that includes the synthesis of insight aggregates (IAs): condensed, higher-level knowledge structures derived from identified clusters of related IPs. We present comprehensive experimental results demonstrating Cognitive Weave's marked enhancement over existing approaches in long-horizon planning tasks, evolving question-answering scenarios, and multi-session dialogue coherence. The system achieves a notable 34% average improvement in task completion rates and a 42% reduction in mean query latency when compared to state-of-the-art baselines. Furthermore, this paper explores the ethical considerations inherent in such advanced memory systems, discusses the implications for long-term memory in LLMs, and outlines promising future research trajectories.
Submitted 9 June, 2025;
originally announced June 2025.
-
Billions at Stake: How Self-Citation Adjusted Metrics Can Transform Equitable Research Funding
Authors:
Rahul Vishwakarma,
Sinchan Banerjee
Abstract:
Citation metrics serve as the cornerstone of scholarly impact evaluation despite their well-documented vulnerability to inflation through self-citation practices. This paper introduces the Self-Citation Adjusted Index (SCAI), a sophisticated metric designed to recalibrate citation counts by accounting for discipline-specific self-citation patterns. Through comprehensive analysis of 5,000 researcher profiles across diverse disciplines, we demonstrate that excessive self-citation inflates traditional metrics by 10-20%, potentially misdirecting billions in research funding. Recent studies confirm that self-citation patterns exhibit significant gender disparities, with men self-citing up to 70% more frequently than women, exacerbating existing inequalities in academic recognition. Our open-source implementation provides comprehensive tools for calculating SCAI and related metrics, offering a more equitable assessment of research impact that reduces the gender citation gap by approximately 8.5%. This work contributes to the paradigm shift toward transparent, nuanced, and equitable research evaluation methodologies in academia, with direct implications for funding allocation decisions that collectively amount to over $100 billion annually in the United States alone.
Submitted 11 May, 2025; v1 submitted 25 April, 2025;
originally announced April 2025.
-
Statistical Guarantees in Synthetic Data through Conformal Adversarial Generation
Authors:
Rahul Vishwakarma,
Shrey Dharmendra Modi,
Vishwanath Seshagiri
Abstract:
The generation of high-quality synthetic data presents significant challenges in machine learning research, particularly regarding statistical fidelity and uncertainty quantification. Existing generative models produce compelling synthetic samples but lack rigorous statistical guarantees about their relation to the underlying data distribution, limiting their applicability in critical domains requiring robust error bounds. We address this fundamental limitation by presenting a novel framework that incorporates conformal prediction methodologies into Generative Adversarial Networks (GANs). By integrating multiple conformal prediction paradigms including Inductive Conformal Prediction (ICP), Mondrian Conformal Prediction, Cross-Conformal Prediction, and Venn-Abers Predictors, we establish distribution-free uncertainty quantification in generated samples. This approach, termed Conformalized GAN (cGAN), demonstrates enhanced calibration properties while maintaining the generative power of traditional GANs, producing synthetic data with provable statistical guarantees. We provide rigorous mathematical proofs establishing finite-sample validity guarantees and asymptotic efficiency properties, enabling the reliable application of synthetic data in high-stakes domains including healthcare, finance, and autonomous systems.
Submitted 11 May, 2025; v1 submitted 23 April, 2025;
originally announced April 2025.
-
PieGlyph: An R package for creating axis invariant pie-glyphs for 2d plots
Authors:
Rishabh Vishwakarma,
Caroline Brophy,
Catherine Hurley
Abstract:
Effective visualisation of multidimensional data is crucial for generating insights. Glyph-based visualisations, which encode data dimensions onto multiple visual channels such as colour, shape, and size, provide an effective means of representing complex datasets. Pie-chart glyphs (pie-glyphs) are one such approach, where multiple data attributes are mapped to slices within a pie chart. This paper introduces the PieGlyph R package, which enables users to overlay any 2D plot with axis-invariant pie-glyphs, offering a compact and intuitive representation of multidimensional data. Unlike existing R packages such as scatterpie or ggforce, PieGlyph generates pie-glyphs independently of the plot axes by employing a nested coordinate system, ensuring they remain circular regardless of changes to the underlying coordinate system. This enhances interpretability, particularly when visualising spatial data, as users can select the most appropriate map projection without distorting the glyphs' shape. Pie-glyphs are also particularly well-suited for visualising compositional data, where there is a natural sum-to-one constraint on the data attributes. PieGlyph is developed under the Grammar of Graphics paradigm using the ggplot2 framework and supports the generation of interactive pie-glyphs through the ggiraph package. Designed to integrate seamlessly with all features and extensions offered by ggplot2 and ggiraph, PieGlyph provides users with full flexibility in customising every aspect of the visualisation. This paper outlines the conceptual framework of PieGlyph, compares it with existing alternatives, and demonstrates its applications through example visualisations.
Submitted 5 March, 2025;
originally announced March 2025.
-
Uncertainty-Aware Hardware Trojan Detection Using Multimodal Deep Learning
Authors:
Rahul Vishwakarma,
Amin Rezaei
Abstract:
The risk of hardware Trojans being inserted at various stages of chip production has increased in a zero-trust fabless era. To counter this, various machine learning solutions have been developed for the detection of hardware Trojans. While most of the focus has been on either a statistical or deep learning approach, the limited number of Trojan-infected benchmarks affects the detection accuracy and restricts the possibility of detecting zero-day Trojans. To close the gap, we first employ generative adversarial networks to amplify our data in two alternative representation modalities, graph and tabular, ensuring that the dataset is distributed in a representative manner. Further, we propose a multimodal deep learning approach to detect hardware Trojans and evaluate the results from both early fusion and late fusion strategies. We also estimate the uncertainty quantification metrics of each prediction for risk-aware decision-making. The outcomes not only confirm the efficacy of our proposed hardware Trojan detection method but also open a new door for future studies employing multimodality and uncertainty quantification to address other hardware security challenges.
Submitted 23 January, 2024; v1 submitted 15 January, 2024;
originally announced January 2024.
-
Enhancing Neural Theorem Proving through Data Augmentation and Dynamic Sampling Method
Authors:
Rahul Vishwakarma,
Subhankar Mishra
Abstract:
Theorem proving is a fundamental task in mathematics. With the advent of large language models (LLMs) and interactive theorem provers (ITPs) like Lean, there has been growing interest in integrating LLMs and ITPs to automate theorem proving. In this approach, the LLM generates proof steps (tactics), and the ITP checks the applicability of the tactics at the current goal. The two systems work together to complete the proof. In this paper, we introduce DS-Prover, a novel dynamic sampling method for theorem proving. This method dynamically determines the number of tactics to apply to expand the current goal, taking into account the remaining time compared to the total allocated time for proving a theorem. This makes the proof search process more efficient by adjusting the balance between exploration and exploitation as time passes. We also augment the training dataset by decomposing simplification and rewrite tactics with multiple premises into tactics with single premises. This gives the model more examples to learn from and helps it to predict the tactics with premises more accurately. We perform our experiments using the Mathlib dataset of the Lean theorem prover and report the performance on two standard datasets, MiniF2F and ProofNet. Our methods achieve significant performance gains on both datasets. We achieved a state-of-the-art performance (Pass@1) of 14.2% on the ProofNet dataset and a performance of 29.8% on MiniF2F, slightly surpassing the best-reported Pass@1 of 29.6% using Lean.
Submitted 15 February, 2024; v1 submitted 20 December, 2023;
originally announced December 2023.
-
IndoorGNN: A Graph Neural Network based approach for Indoor Localization using WiFi RSSI
Authors:
Rahul Vishwakarma,
Rucha Bhalchandra Joshi,
Subhankar Mishra
Abstract:
Indoor localization is the process of determining the location of a person or object inside a building. Potential usage of indoor localization includes navigation, personalization, safety and security, and asset tracking. Commonly used technologies for indoor localization include WiFi, Bluetooth, RFID, and Ultra-wideband. Among these, WiFi's Received Signal Strength Indicator (RSSI)-based localization is preferred because of widely available WiFi Access Points (APs). We have two main contributions. First, we develop our method, 'IndoorGNN' which involves using a Graph Neural Network (GNN) based algorithm in a supervised manner to classify a specific location into a particular region based on the RSSI values collected at that location. Most of the ML algorithms that perform this classification require a large number of labeled data points (RSSI vectors with location information). Collecting such data points is a labor-intensive and time-consuming task. To overcome this challenge, as our second contribution, we demonstrate the performance of IndoorGNN on the restricted dataset. It shows a comparable prediction accuracy to that of the complete dataset. We performed experiments on the UJIIndoorLoc and MNAV datasets, which are real-world standard indoor localization datasets. Our experiments show that IndoorGNN gives better location prediction accuracies when compared with state-of-the-art existing conventional as well as GNN-based methods for this same task. It continues to outperform these algorithms even with restricted datasets. It is noteworthy that its performance does not decrease a lot with a decrease in the number of available data points. Our method can be utilized for navigation and wayfinding in complex indoor environments, asset tracking and building management, enhancing mobile applications with location-based services, and improving safety and security during emergencies.
Submitted 11 December, 2023;
originally announced December 2023.
-
Risk-Aware and Explainable Framework for Ensuring Guaranteed Coverage in Evolving Hardware Trojan Detection
Authors:
Rahul Vishwakarma,
Amin Rezaei
Abstract:
As the semiconductor industry has shifted to a fabless paradigm, the risk of hardware Trojans being inserted at various stages of production has also increased. Recently, there has been a growing trend toward the use of machine learning solutions to detect hardware Trojans more effectively, with a focus on the accuracy of the model as an evaluation metric. However, in a high-risk and sensitive domain, we cannot accept even a small misclassification. Additionally, it is unrealistic to expect an ideal model, especially when Trojans evolve over time. Therefore, we need metrics to assess the trustworthiness of detected Trojans and a mechanism to simulate unseen ones. In this paper, we generate evolving hardware Trojans using our proposed novel conformalized generative adversarial networks and offer an efficient approach to detecting them based on a non-invasive algorithm-agnostic statistical inference framework that leverages the Mondrian conformal predictor. The method acts like a wrapper over any of the machine learning models and produces set predictions along with uncertainty quantification for each new detected Trojan for more robust decision-making. In the case of a NULL set, a novel method to reject the decision by providing a calibrated explainability is discussed. The proposed approach has been validated on both synthetic and real chip-level benchmarks and proven to pave the way for researchers looking to find informed machine learning solutions to hardware security problems.
Submitted 13 October, 2023;
originally announced December 2023.
-
Enterprise Disk Drive Scrubbing Based on Mondrian Conformal Predictors
Authors:
Rahul Vishwakarma,
Jinha Hwang,
Soundouss Messoudi,
Ava Hedayatipour
Abstract:
Disk scrubbing is a process aimed at resolving read errors on disks by reading data from the disk. However, scrubbing the entire storage array at once can adversely impact system performance, particularly during periods of high input/output operations. Additionally, the continuous reading of data from disks when scrubbing can result in wear and tear, especially on larger capacity disks, due to the significant time and energy consumption involved. To address these issues, we propose a selective disk scrubbing method that enhances the overall reliability and power efficiency in data centers. Our method employs a Machine Learning model based on Mondrian Conformal prediction to identify specific disks for scrubbing, by proactively predicting the health status of each disk in the storage pool, forecasting n-days in advance, and using an open-source dataset. For disks predicted as non-healthy, we mark them for replacement without further action. For healthy drives, we create a set and quantify their relative health across the entire storage pool based on the predictor's confidence. This enables us to prioritize selective scrubbing for drives with established scrubbing frequency based on the scrub cycle. The method we propose provides an efficient and dependable solution for managing enterprise disk drives. By scrubbing just 22.7% of the total storage disks, we can achieve optimized energy consumption and reduce the carbon footprint of the data center.
Submitted 1 June, 2023;
originally announced June 2023.
-
Attacks on Continuous Chaos Communication and Remedies for Resource Limited Devices
Authors:
Rahul Vishwakarma,
Ravi Monani,
Amin Rezaei,
Hossein Sayadi,
Mehrdad Aliasgari,
Ava Hedayatipour
Abstract:
The global wearable market is anticipated to grow at a considerable rate in the coming years, and communication is a fundamental block in any wearable device. In communication, encryption methods are being used with the aid of microcontrollers or software implementations, which are power-consuming and incorporate complex hardware implementation. Internet of Things (IoT) devices are considered resource-constrained devices that are expected to operate with low computational power and resource utilization criteria. At the same time, recent research has shown that IoT devices are highly vulnerable to emerging security threats, which elevates the need for low-power and small-size hardware-based security countermeasures. Chaotic encryption is a method of data encryption that utilizes chaotic systems and non-linear dynamics to generate secure encryption keys. It aims to provide high-level security by creating encryption keys that are sensitive to initial conditions and difficult to predict, making it challenging for unauthorized parties to intercept and decode encrypted data. Since the discovery of chaotic equations, there have been various encryption applications associated with them. In this paper, we comprehensively analyze the physical and encryption attacks on continuous chaotic systems in resource-constrained devices and their potential remedies. To this aim, we introduce different categories of attacks on chaotic encryption. Our experiments focus on chaotic equations implemented using Chua's equation, leverage circuit architectures, and provide simulation proof of remedies for different attacks. These remedies are provided to block the attackers from stealing users' information (e.g., a pulse message) with negligible cost to the power and area of the design.
Submitted 26 May, 2023;
originally announced May 2023.
-
Hunting Group Clues with Transformers for Social Group Activity Recognition
Authors:
Masato Tamura,
Rahul Vishwakarma,
Ravigopal Vennelakanti
Abstract:
This paper presents a novel framework for social group activity recognition. As an expanded task of group activity recognition, social group activity recognition requires recognizing multiple sub-group activities and identifying group members. Most existing methods tackle both tasks by refining region features and then summarizing them into activity features. Such heuristic feature design renders the effectiveness of features susceptible to incomplete person localization and disregards the importance of scene contexts. Furthermore, region features are sub-optimal to identify group members because the features may be dominated by those of people in the regions and have different semantics. To overcome these drawbacks, we propose to leverage attention modules in transformers to generate effective social group features. Our method is designed in such a way that the attention modules identify and then aggregate features relevant to social group activities, generating an effective feature for each social group. Group member information is embedded into the features and thus accessed by feed-forward networks. The outputs of feed-forward networks represent groups so concisely that group members can be identified with simple Hungarian matching between groups and individuals. Experimental results show that our method outperforms state-of-the-art methods on the Volleyball and Collective Activity datasets.
Submitted 11 July, 2022;
originally announced July 2022.
-
CNN Model & Tuning for Global Road Damage Detection
Authors:
Rahul Vishwakarma,
Ravigopal Vennelakanti
Abstract:
This paper provides a report on our solution, including model selection, tuning strategy and results obtained, for the Global Road Damage Detection Challenge. This Big Data Cup Challenge was held as a part of the IEEE International Conference on Big Data 2020. We assess single and multi-stage network architectures for object detection and provide a benchmark using popular state-of-the-art open-source PyTorch frameworks like Detectron2 and Yolov5. Data preparation for the provided Road Damage training dataset, captured using smartphone cameras in the Czech Republic, India and Japan, is discussed. We studied the effect of training on a per-country basis with respect to a single generalizable model. We briefly describe the tuning strategy for the experiments conducted on two-stage Faster R-CNN with Deep Residual Network (Resnet) and Feature Pyramid Network (FPN) backbone. Additionally, we compare this to a one-stage Yolov5 model with Cross Stage Partial Network (CSPNet) backbone. We show a mean F1 score of 0.542 on Test2 and 0.536 on Test1 datasets using a multi-stage Faster R-CNN model, with Resnet-50 and Resnet-101 backbones respectively. This shows the generalizability of the Resnet-50 model when compared to its more complex counterparts. Experiments were conducted using Google Colab with a K80 GPU and a Linux PC with a 1080Ti, an NVIDIA consumer-grade GPU. PyTorch-based Detectron2 code to preprocess, train, test and submit the Avg F1 score is made available at https://github.com/vishwakarmarhl/rdd2020
Submitted 17 March, 2021;
originally announced March 2021.
-
Exploring the role and nature of interactions between Institutes in a Local Affiliation Network
Authors:
Chakresh Kumar Singh,
Ravi Vishwakarma,
Shivakumar Jolad
Abstract:
In this work, we have studied the collaboration and citation network between Indian Institutes from publications in American Physical Society (APS) journals between 1970 and 2013. We investigate the role of geographic proximity on the network structure and find that it is the characteristics of the Institution, rather than the geographic distance, that play a dominant role in collaboration networks. We find that Institutions with better federal funding dominate the network topology and play a crucial role in overall research output. We find that the citation flow across different categories of institutions is strongly linked to the collaborations between them. We have estimated the knowledge flow in and out of Institutions and identified the top knowledge sources and sinks.
Submitted 4 October, 2018;
originally announced October 2018.