Showing 1–28 of 28 results for author: Aakur, S

  1. arXiv:2506.17516  [pdf, ps, other]

    cs.RO cs.CV

    EASE: Embodied Active Event Perception via Self-Supervised Energy Minimization

    Authors: Zhou Chen, Sanjoy Kundu, Harsimran S. Baweja, Sathyanarayanan N. Aakur

    Abstract: Active event perception, the ability to dynamically detect, track, and summarize events in real time, is essential for embodied intelligence in tasks such as human-AI collaboration, assistive robotics, and autonomous navigation. However, existing approaches often depend on predefined action spaces, annotated datasets, and extrinsic rewards, limiting their adaptability and scalability in dynamic, r…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: Accepted to IEEE Robotics and Automation Letters, 2025

  2. arXiv:2506.05651  [pdf, other]

    cs.CV

    Hallucinate, Ground, Repeat: A Framework for Generalized Visual Relationship Detection

    Authors: Shanmukha Vellamcheti, Sanjoy Kundu, Sathyanarayanan N. Aakur

    Abstract: Understanding relationships between objects is central to visual intelligence, with applications in embodied AI, assistive systems, and scene understanding. Yet, most visual relationship detection (VRD) models rely on a fixed predicate set, limiting their generalization to novel interactions. A key challenge is the inability to visually ground semantically plausible, but unannotated, relationships…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: 22 pages, 9 figures, 5 tables

  3. arXiv:2505.22858  [pdf, ps, other]

    cs.CV

    A Probabilistic Jump-Diffusion Framework for Open-World Egocentric Activity Recognition

    Authors: Sanjoy Kundu, Shanmukha Vellamcheti, Sathyanarayanan N. Aakur

    Abstract: Open-world egocentric activity recognition poses a fundamental challenge due to its unconstrained nature, requiring models to infer unseen activities from an expansive, partially observed search space. We introduce ProbRes, a Probabilistic Residual search framework based on jump-diffusion that efficiently navigates this space by balancing prior-guided exploration with likelihood-driven exploitatio…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: Extended abstract of arXiv:2504.03948 for CVPR 2025 EgoVis Workshop

  4. arXiv:2504.03948  [pdf, other]

    cs.CV

    ProbRes: Probabilistic Jump Diffusion for Open-World Egocentric Activity Recognition

    Authors: Sanjoy Kundu, Shanmukha Vellamchetti, Sathyanarayanan N. Aakur

    Abstract: Open-world egocentric activity recognition poses a fundamental challenge due to its unconstrained nature, requiring models to infer unseen activities from an expansive, partially observed search space. We introduce ProbRes, a Probabilistic Residual search framework based on jump-diffusion that efficiently navigates this space by balancing prior-guided exploration with likelihood-driven exploitatio…

    Submitted 4 April, 2025; originally announced April 2025.

    Comments: 17 pages, 6 figures, 3 tables. Under review

  5. arXiv:2502.19607   

    cs.CL

    Revisiting Word Embeddings in the LLM Era

    Authors: Yash Mahajan, Matthew Freestone, Sathyanarayanan Aakur, Santu Karmaker

    Abstract: Large Language Models (LLMs) have recently shown remarkable advancement in various NLP tasks. As such, a popular trend has emerged lately where NLP researchers extract word/sentence/document embeddings from these large decoder-only models and use them for various inference tasks with promising results. However, it is still unclear whether the performance improvement of LLM-induced embeddings is me…

    Submitted 1 March, 2025; v1 submitted 26 February, 2025; originally announced February 2025.

    Comments: This work was intended as a replacement of the older version, arXiv:2402.11094, and any subsequent updates will appear there

  6. arXiv:2406.14472  [pdf, other]

    cs.CV

    Self-supervised Multi-actor Social Activity Understanding in Streaming Videos

    Authors: Shubham Trehan, Sathyanarayanan N. Aakur

    Abstract: This work addresses the problem of Social Activity Recognition (SAR), a critical component in real-world tasks like surveillance and assistive robotics. Unlike traditional event understanding approaches, SAR necessitates modeling individual actors' appearance and motions and contextualizing them within their social interactions. Traditional action localization methods fall short due to their singl…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 16 pages, 2 figures, 4 pages

  7. arXiv:2406.14456  [pdf, other]

    cs.LG cs.CV

    Capturing Temporal Components for Time Series Classification

    Authors: Venkata Ragavendra Vavilthota, Ranjith Ramanathan, Sathyanarayanan N. Aakur

    Abstract: Analyzing sequential data is crucial in many domains, particularly due to the abundance of data collected from the Internet of Things paradigm. Time series classification, the task of categorizing sequential data, has gained prominence, with machine learning approaches demonstrating remarkable performance on public benchmark datasets. However, progress has primarily been in designing architectures…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 15 pages, 2 figures, 4 tables

  8. arXiv:2406.05722  [pdf, other]

    cs.CV

    ALGO: Object-Grounded Visual Commonsense Reasoning for Open-World Egocentric Action Recognition

    Authors: Sanjoy Kundu, Shubham Trehan, Sathyanarayanan N. Aakur

    Abstract: Learning to infer labels in an open world, i.e., in an environment where the target "labels" are unknown, is an important characteristic for achieving autonomy. Foundation models pre-trained on enormous amounts of data have shown remarkable generalization skills through prompting, particularly in zero-shot inference. However, their performance is restricted to the correctness of the target label's…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Extended abstract of arXiv:2305.16602 for CVPR EgoVis Workshop

  9. arXiv:2402.11094  [pdf, other]

    cs.CL

    Revisiting Word Embeddings in the LLM Era

    Authors: Yash Mahajan, Matthew Freestone, Naman Bansal, Sathyanarayanan Aakur, Shubhra Kanti Karmaker Santu

    Abstract: Large Language Models (LLMs) have recently shown remarkable advancement in various NLP tasks. As such, a popular trend has emerged lately where NLP researchers extract word/sentence/document embeddings from these large decoder-only models and use them for various inference tasks with promising results. However, it is still unclear whether the performance improvement of LLM-induced embeddings is me…

    Submitted 1 March, 2025; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: This is an updated version of the older version: arXiv:2402.11094. We accidentally submitted this article as a new submission (arXiv:2502.19607), which we have requested to withdraw. This version has 30 pages and 22 figures

    ACM Class: I.2.7

  10. arXiv:2401.13219  [pdf, other]

    q-bio.GN cs.AI cs.LG

    TEPI: Taxonomy-aware Embedding and Pseudo-Imaging for Scarcely-labeled Zero-shot Genome Classification

    Authors: Sathyanarayanan Aakur, Vishalini R. Laguduva, Priyadharsini Ramamurthy, Akhilesh Ramachandran

    Abstract: A species' genetic code or genome encodes valuable evolutionary, biological, and phylogenetic information that aids in species recognition, taxonomic classification, and understanding genetic predispositions like drug resistance and virulence. However, the vast number of potential species poses significant challenges in developing a general-purpose whole genome classification tool. Traditional bio…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: Accepted to IEEE JBHI

  11. arXiv:2309.10210  [pdf, other]

    eess.IV cs.CV

    ProtoKD: Learning from Extremely Scarce Data for Parasite Ova Recognition

    Authors: Shubham Trehan, Udhav Ramachandran, Ruth Scimeca, Sathyanarayanan N. Aakur

    Abstract: Developing reliable computational frameworks for early parasite detection, particularly at the ova (or egg) stage is crucial for advancing healthcare and effectively managing potential public health crises. While deep learning has significantly assisted human workers in various tasks, its application and diagnostics has been constrained by the need for extensive datasets. The ability to learn from…

    Submitted 18 September, 2023; originally announced September 2023.

    Comments: To Appear at IEEE ICMLA 2023

  12. arXiv:2308.06869  [pdf, other]

    cs.CV stat.ML

    Shape-Graph Matching Network (SGM-net): Registration for Statistical Shape Analysis

    Authors: Shenyuan Liang, Mauricio Pamplona Segundo, Sathyanarayanan N. Aakur, Sudeep Sarkar, Anuj Srivastava

    Abstract: This paper focuses on the statistical analysis of shapes of data objects called shape graphs, a set of nodes connected by articulated curves with arbitrary shapes. A critical need here is a constrained registration of points (nodes to nodes, edges to edges) across objects. This, in turn, requires optimization over the permutation group, made challenging by differences in nodes (in terms of numbers…

    Submitted 13 August, 2023; originally announced August 2023.

  13. arXiv:2305.16602  [pdf, other]

    cs.CV

    Discovering Novel Actions from Open World Egocentric Videos with Object-Grounded Visual Commonsense Reasoning

    Authors: Sanjoy Kundu, Shubham Trehan, Sathyanarayanan N. Aakur

    Abstract: Learning to infer labels in an open world, i.e., in an environment where the target "labels" are unknown, is an important characteristic for achieving autonomy. Foundation models, pre-trained on enormous amounts of data, have shown remarkable generalization skills through prompting, particularly in zero-shot inference. However, their performance is restricted to the correctness of the target lab…

    Submitted 3 May, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: 25 Pages, 4 figures, 3 tables

  14. arXiv:2212.00015  [pdf, other]

    cs.LG q-bio.GN

    Scalable Pathogen Detection from Next Generation DNA Sequencing with Deep Learning

    Authors: Sai Narayanan, Sathyanarayanan N. Aakur, Priyadharsini Ramamurthy, Arunkumar Bagavathi, Vishalini Ramnath, Akhilesh Ramachandran

    Abstract: Next-generation sequencing technologies have enhanced the scope of Internet-of-Things (IoT) to include genomics for personalized medicine through the increased availability of an abundance of genome data collected from heterogeneous sources at a reduced cost. Given the sheer magnitude of the collected data and the significant challenges offered by the presence of highly similar genomic structure a…

    Submitted 29 November, 2022; originally announced December 2022.

    Comments: Expanded version of arXiv:2111.08001. Under review in a journal

  15. arXiv:2211.16636  [pdf, other]

    cs.CV

    Iterative Scene Graph Generation with Generative Transformers

    Authors: Sanjoy Kundu, Sathyanarayanan N. Aakur

    Abstract: Scene graphs provide a rich, structured representation of a scene by encoding the entities (objects) and their spatial relationships in a graphical format. This representation has proven useful in several tasks, such as question answering, captioning, and even object detection, to name a few. Current approaches take a generation-by-classification approach where the scene graph is generated through…

    Submitted 29 November, 2022; originally announced November 2022.

    Comments: 10 pages, 4 figures, 4 tables

  16. arXiv:2203.17109  [pdf, other]

    cs.AI

    A Rich Recipe Representation as Plan to Support Expressive Multi Modal Queries on Recipe Content and Preparation Process

    Authors: Vishal Pallagani, Priyadharsini Ramamurthy, Vedant Khandelwal, Revathy Venkataramanan, Kausik Lakkaraju, Sathyanarayanan N. Aakur, Biplav Srivastava

    Abstract: Food is not only a basic human necessity but also a key factor driving a society's health and economic well-being. As a result, the cooking domain is a popular use-case to demonstrate decision-support (AI) capabilities in service of benefits like precision health with tools ranging from information retrieval interfaces to task-oriented chatbots. An AI here should understand concepts in the food do…

    Submitted 31 March, 2022; originally announced March 2022.

  17. arXiv:2111.08001  [pdf, other]

    q-bio.GN cs.AI cs.LG

    Metagenome2Vec: Building Contextualized Representations for Scalable Metagenome Analysis

    Authors: Sathyanarayanan N. Aakur, Vineela Indla, Vennela Indla, Sai Narayanan, Arunkumar Bagavathi, Vishalini Laguduva Ramnath, Akhilesh Ramachandran

    Abstract: Advances in next-generation metagenome sequencing have the potential to revolutionize the point-of-care diagnosis of novel pathogen infections, which could help prevent potential widespread transmission of diseases. Given the high volume of metagenome sequences, there is a need for scalable frameworks to analyze and segment metagenome sequences from clinical samples, which can be highly imbalanced…

    Submitted 9 November, 2021; originally announced November 2021.

    Comments: To appear in DMBIH Workshop at ICDM 2021

  18. arXiv:2111.05448  [pdf, other]

    cs.CV

    Towards Active Vision for Action Localization with Reactive Control and Predictive Learning

    Authors: Shubham Trehan, Sathyanarayanan N. Aakur

    Abstract: Visual event perception tasks such as action localization have primarily focused on supervised learning settings under a static observer, i.e., the camera is static and cannot be controlled by an algorithm. They are often restricted by the quality, quantity, and diversity of annotated training data and do not often generalize to out-of-domain samples. In this work, we tackle the problem o…

    Submitted 9 November, 2021; originally announced November 2021.

    Comments: To appear at WACV 2022

  19. arXiv:2107.09883  [pdf, other]

    cs.LG q-bio.GN

    MG-NET: Leveraging Pseudo-Imaging for Multi-Modal Metagenome Analysis

    Authors: Sathyanarayanan N. Aakur, Sai Narayanan, Vineela Indla, Arunkumar Bagavathi, Vishalini Laguduva Ramnath, Akhilesh Ramachandran

    Abstract: The emergence of novel pathogens and zoonotic diseases like the SARS-CoV-2 have underlined the need for developing novel diagnosis and intervention pipelines that can learn rapidly from small amounts of labeled data. Combined with technological advances in next-generation sequencing, metagenome-based diagnostic tools hold much promise to revolutionize rapid point-of-care diagnosis. However, there…

    Submitted 21 July, 2021; originally announced July 2021.

    Comments: To appear in MICCAI 2021

  20. arXiv:2104.14131  [pdf, other]

    cs.CV

    Actor-centered Representations for Action Localization in Streaming Videos

    Authors: Sathyanarayanan N. Aakur, Sudeep Sarkar

    Abstract: Event perception tasks such as recognizing and localizing actions in streaming videos are essential for scaling to real-world application contexts. We tackle the problem of learning actor-centered representations through the notion of continual hierarchical predictive learning to localize actions in streaming videos without the need for training labels and outlines for the objects in the video. We…

    Submitted 29 November, 2022; v1 submitted 29 April, 2021; originally announced April 2021.

    Comments: ECCV 2022

  21. arXiv:2009.07470  [pdf, other]

    cs.CV cs.AI

    Knowledge Guided Learning: Towards Open Domain Egocentric Action Recognition with Zero Supervision

    Authors: Sathyanarayanan N. Aakur, Sanjoy Kundu, Nikhil Gunti

    Abstract: Advances in deep learning have enabled the development of models that have exhibited a remarkable tendency to recognize and even localize actions in videos. However, they tend to experience errors when faced with scenes or examples beyond their initial training environment. Hence, they fail to adapt to new domains without significant retraining with large amounts of annotated data. In this paper,…

    Submitted 11 March, 2022; v1 submitted 16 September, 2020; originally announced September 2020.

    Comments: Pattern Recognition Letters

  22. arXiv:2007.12791  [pdf, other]

    cs.LG q-bio.QM

    Genome Sequence Classification for Animal Diagnostics with Graph Representations and Deep Neural Networks

    Authors: Sai Narayanan, Akhilesh Ramachandran, Sathyanarayanan N. Aakur, Arunkumar Bagavathi

    Abstract: Bovine Respiratory Disease Complex (BRDC) is a complex respiratory disease in cattle with multiple etiologies, including bacterial and viral. It is estimated that mortality, morbidity, therapy, and quarantine resulting from BRDC account for significant losses in the cattle industry. Early detection and management of BRDC are crucial in mitigating economic losses. Current animal disease diagnostics…

    Submitted 24 July, 2020; originally announced July 2020.

  23. arXiv:2003.12185  [pdf, other]

    cs.CV

    Action Localization through Continual Predictive Learning

    Authors: Sathyanarayanan N. Aakur, Sudeep Sarkar

    Abstract: The problem of action recognition involves locating the action in the video, both over time and spatially in the image. The dominant current approaches use supervised learning to solve this problem, and require large amounts of annotated training data, in the form of frame-level bounding box annotations around the region of interest. In this paper, we present a new approach based on continual lear…

    Submitted 26 March, 2020; originally announced March 2020.

    Comments: 18 pages, 4 figures and 3 tables

  24. arXiv:2001.11580  [pdf, other]

    cs.CV

    Unsupervised Gaze Prediction in Egocentric Videos by Energy-based Surprise Modeling

    Authors: Sathyanarayanan N. Aakur, Arunkumar Bagavathi

    Abstract: Egocentric perception has grown rapidly with the advent of immersive computing devices. Human gaze prediction is an important problem in analyzing egocentric videos and has primarily been tackled through either saliency-based modeling or highly supervised learning. We quantitatively analyze the generalization capabilities of supervised, deep learning models on the egocentric gaze prediction task o…

    Submitted 29 April, 2021; v1 submitted 30 January, 2020; originally announced January 2020.

    Comments: To appear at VISAP2021. More details: https://saakur.github.io/Projects/GazePrediction/

  25. arXiv:1909.07891  [pdf, other]

    cs.CR

    Machine Learning based IoT Edge Node Security Attack and Countermeasures

    Authors: Vishalini R. Laguduva, Sheikh Ariful Islam, Sathyanarayanan Aakur, Srinivas Katkoori, Robert Karam

    Abstract: Advances in technology have enabled tremendous progress in the development of a highly connected ecosystem of ubiquitous computing devices collectively called the Internet of Things (IoT). Ensuring the security of IoT devices is a high priority due to the sensitive nature of the collected data. Physically Unclonable Functions (PUFs) have emerged as critical hardware primitive for ensuring the secu…

    Submitted 17 September, 2019; originally announced September 2019.

    Comments: Accepted in ISVLSI 2019

  26. arXiv:1909.03099  [pdf, other]

    cs.CL cs.AI

    Abductive Reasoning as Self-Supervision for Common Sense Question Answering

    Authors: Sathyanarayanan N. Aakur, Sudeep Sarkar

    Abstract: Question answering has seen significant advances in recent times, especially with the introduction of increasingly bigger transformer-based models pre-trained on massive amounts of data. While achieving impressive results on many benchmarks, their performances appear to be proportional to the amount of training data available in the target domain. In this work, we explore the ability of current qu…

    Submitted 12 September, 2019; v1 submitted 6 September, 2019; originally announced September 2019.

    Comments: 8 Pages, 4 figures, 4 tables

  27. arXiv:1811.04869  [pdf, other]

    cs.CV

    A Perceptual Prediction Framework for Self Supervised Event Segmentation

    Authors: Sathyanarayanan N. Aakur, Sudeep Sarkar

    Abstract: Temporal segmentation of long videos is an important problem, that has largely been tackled through supervised learning, often requiring large amounts of annotated training data. In this paper, we tackle the problem of self-supervised temporal segmentation of long videos that alleviate the need for any supervision. We introduce a self-supervised, predictive learning framework that draws inspiratio…

    Submitted 5 April, 2019; v1 submitted 12 November, 2018; originally announced November 2018.

    Comments: CVPR 2019 Camera Ready

  28. arXiv:1708.03725  [pdf, other]

    cs.CV

    Going Deeper with Semantics: Video Activity Interpretation using Semantic Contextualization

    Authors: Sathyanarayanan N. Aakur, Fillipe DM de Souza, Sudeep Sarkar

    Abstract: A deeper understanding of video activities extends beyond recognition of underlying concepts such as actions and objects: constructing deep semantic representations requires reasoning about the semantic relationships among these concepts, often beyond what is directly observed in the data. To this end, we propose an energy minimization framework that leverages large-scale commonsense knowledge bas…

    Submitted 15 November, 2018; v1 submitted 11 August, 2017; originally announced August 2017.

    Comments: Accepted to WACV 2019