Showing 1–50 of 128 results for author: Tripathi, S

  1. arXiv:2506.23464  [pdf, ps, other]

    cs.AI

    The Confidence Paradox: Can LLM Know When It's Wrong

    Authors: Sahil Tripathi, Md Tabrez Nafis, Imran Hussain, Jiechao Gao

    Abstract: Document Visual Question Answering (DocVQA) systems are increasingly deployed in real world applications, yet they remain ethically opaque, often producing overconfident answers to ambiguous questions or failing to communicate uncertainty in a trustworthy manner. This misalignment between model confidence and actual knowledge poses significant risks, particularly in domains requiring ethical accoun…

    Submitted 29 June, 2025; originally announced June 2025.

  2. arXiv:2506.05787  [pdf, ps, other]

    cs.CV

    EASG-Bench: Video Q&A Benchmark with Egocentric Action Scene Graphs

    Authors: Ivan Rodin, Tz-Ying Wu, Kyle Min, Sharath Nittur Sridhar, Antonino Furnari, Subarna Tripathi, Giovanni Maria Farinella

    Abstract: We introduce EASG-Bench, a question-answering benchmark for egocentric videos where the question-answering pairs are created from spatio-temporally grounded dynamic scene graphs capturing intricate relationships among actors, actions, and objects. We propose a systematic evaluation framework and evaluate several language-only and video large language models (video-LLMs) on this benchmark. We obser…

    Submitted 6 June, 2025; originally announced June 2025.

  3. arXiv:2506.03170  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    PALADIN : Robust Neural Fingerprinting for Text-to-Image Diffusion Models

    Authors: Murthy L, Subarna Tripathi

    Abstract: The risk of misusing text-to-image generative models for malicious uses, especially due to the open-source development of such models, has become a serious concern. As a risk mitigation strategy, attributing generative models with neural fingerprinting is emerging as a popular technique. There has been a plethora of recent work that aim for addressing neural fingerprinting. A trade-off between the…

    Submitted 28 May, 2025; originally announced June 2025.

  4. arXiv:2506.01102  [pdf, ps, other]

    cs.CV

    Keystep Recognition using Graph Neural Networks

    Authors: Julia Lee Romero, Kyle Min, Subarna Tripathi, Morteza Karimzadeh

    Abstract: We pose keystep recognition as a node classification task, and propose a flexible graph-learning framework for fine-grained keystep recognition that is able to effectively leverage long-term dependencies in egocentric videos. Our approach, termed GLEVR, consists of constructing a graph where each video clip of the egocentric video corresponds to a node. The constructed graphs are sparse and comput…

    Submitted 1 June, 2025; originally announced June 2025.

  5. arXiv:2505.24090  [pdf, other]

    cs.DB cs.AI

    Searching Clinical Data Using Generative AI

    Authors: Karan Hanswadkar, Anika Kanchi, Shivani Tripathi, Shi Qiao, Rony Chatterjee, Alekh Jindal

    Abstract: Artificial Intelligence (AI) is making a major impact on healthcare, particularly through its application in natural language processing (NLP) and predictive analytics. The healthcare sector has increasingly adopted AI for tasks such as clinical data analysis and medical code assignment. However, searching for clinical information in large and often unorganized datasets remains a manual and error-…

    Submitted 29 May, 2025; originally announced May 2025.

  6. arXiv:2505.23942  [pdf, ps, other]

    cs.LG cs.AI

    SG-Blend: Learning an Interpolation Between Improved Swish and GELU for Robust Neural Representations

    Authors: Gaurav Sarkar, Jay Gala, Subarna Tripathi

    Abstract: The design of activation functions remains a pivotal component in optimizing deep neural networks. While prevailing choices like Swish and GELU demonstrate considerable efficacy, they often exhibit domain-specific optima. This work introduces SG-Blend, a novel activation function that blends our proposed SSwish, a first-order symmetric variant of Swish and the established GELU through dynamic inte…

    Submitted 29 May, 2025; originally announced May 2025.

  7. arXiv:2504.17695  [pdf, other]

    cs.CV

    PICO: Reconstructing 3D People In Contact with Objects

    Authors: Alpár Cseke, Shashank Tripathi, Sai Kumar Dwivedi, Arjun Lakshmipathy, Agniv Chatterjee, Michael J. Black, Dimitrios Tzionas

    Abstract: Recovering 3D Human-Object Interaction (HOI) from single color images is challenging due to depth ambiguities, occlusions, and the huge variation in object shape and appearance. Thus, past work requires controlled settings such as known object shapes and contacts, and tackles only limited object classes. Instead, we need methods that generalize to natural images and novel object classes. We tackle…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: Accepted in CVPR'25. Project Page: https://pico.is.tue.mpg.de

  8. arXiv:2504.05303  [pdf, other]

    cs.CV

    InteractVLM: 3D Interaction Reasoning from 2D Foundational Models

    Authors: Sai Kumar Dwivedi, Dimitrije Antić, Shashank Tripathi, Omid Taheri, Cordelia Schmid, Michael J. Black, Dimitrios Tzionas

    Abstract: We introduce InteractVLM, a novel method to estimate 3D contact points on human bodies and objects from single in-the-wild images, enabling accurate human-object joint reconstruction in 3D. This is challenging due to occlusions, depth ambiguities, and widely varying object shapes. Existing methods rely on 3D contact annotations collected via expensive motion-capture systems or tedious manual label…

    Submitted 7 April, 2025; originally announced April 2025.

    Comments: CVPR 2025

  9. arXiv:2502.14360  [pdf]

    cs.CV

    Weed Detection using Convolutional Neural Network

    Authors: Santosh Kumar Tripathi, Shivendra Pratap Singh, Devansh Sharma, Harshavardhan U Patekar

    Abstract: In this paper we use convolutional neural networks (CNNs) for weed detection in agricultural land. We specifically investigate the application of two CNN layer types, Conv2d and dilated Conv2d, for weed detection in crop fields. The suggested method extracts features from the input photos using pre-trained models, which are subsequently adjusted for weed detection. The findings of the experiment,…

    Submitted 20 February, 2025; originally announced February 2025.

  10. arXiv:2501.04121  [pdf, other]

    cs.CV

    Graph-Based Multimodal and Multi-view Alignment for Keystep Recognition

    Authors: Julia Lee Romero, Kyle Min, Subarna Tripathi, Morteza Karimzadeh

    Abstract: Egocentric videos capture scenes from a wearer's viewpoint, resulting in dynamic backgrounds, frequent motion, and occlusions, posing challenges to accurate keystep recognition. We propose a flexible graph-learning framework for fine-grained keystep recognition that is able to effectively leverage long-term dependencies in egocentric videos, and leverage alignment between egocentric and exocentric…

    Submitted 7 January, 2025; originally announced January 2025.

    Comments: 9 pages, 6 figures

  11. arXiv:2412.13935  [pdf, other]

    cs.LG cs.AI

    Spatio-Temporal Forecasting of PM2.5 via Spatial-Diffusion guided Encoder-Decoder Architecture

    Authors: Malay Pandey, Vaishali Jain, Nimit Godhani, Sachchida Nand Tripathi, Piyush Rai

    Abstract: In many problem settings that require spatio-temporal forecasting, the values in the time-series not only exhibit spatio-temporal correlations but are also influenced by spatial diffusion across locations. One such example is forecasting the concentration of fine particulate matter (PM2.5) in the atmosphere which is influenced by many complex factors, the most important ones being diffusion due to…

    Submitted 18 December, 2024; originally announced December 2024.

    Comments: 9 pages, 4 figures, International Conference on Data Science and Management of Data (CODS-COMAD), IIT Jodhpur, 2024

  12. arXiv:2410.17043  [pdf, other]

    cs.LG cs.NI

    Optimizing Mixture-of-Experts Inference Time Combining Model Deployment and Communication Scheduling

    Authors: Jialong Li, Shreyansh Tripathi, Lakshay Rastogi, Yiming Lei, Rui Pan, Yiting Xia

    Abstract: As machine learning models scale in size and complexity, their computational requirements become a significant barrier. Mixture-of-Experts (MoE) models alleviate this issue by selectively activating relevant experts. Despite this, MoE models are hindered by high communication overhead from all-to-all operations, low GPU utilization due to the synchronous communication constraint, and complications…

    Submitted 22 October, 2024; originally announced October 2024.

  13. arXiv:2409.16178  [pdf, other]

    cs.CV

    SDFit: 3D Object Pose and Shape by Fitting a Morphable SDF to a Single Image

    Authors: Dimitrije Antić, Georgios Paschalidis, Shashank Tripathi, Theo Gevers, Sai Kumar Dwivedi, Dimitrios Tzionas

    Abstract: Recovering 3D object pose and shape from a single image is a challenging and highly ill-posed problem. This is due to strong (self-)occlusions, depth ambiguities, the vast intra- and inter-class shape variance, and lack of 3D ground truth for natural images. While existing methods train deep networks on synthetic datasets to predict 3D shapes, they often struggle to generalize to real-world scenar…

    Submitted 10 March, 2025; v1 submitted 24 September, 2024; originally announced September 2024.

    Comments: 12 pages, 10 figures, 5 tables

  14. arXiv:2409.06010  [pdf, other]

    cs.NI eess.SY

    When Learning Meets Dynamics: Distributed User Connectivity Maximization in UAV-Based Communication Networks

    Authors: Bowei Li, Saugat Tripathi, Salman Hosain, Ran Zhang, Jiang Xie, Miao Wang

    Abstract: Distributed management over Unmanned Aerial Vehicle (UAV) based communication networks (UCNs) has attracted increasing research attention. In this work, we study a distributed user connectivity maximization problem in a UCN. The work features a horizontal study over different levels of information exchange during the distributed iteration and a consideration of dynamics in UAV set and user distrib…

    Submitted 9 September, 2024; originally announced September 2024.

    Comments: 12 pages, 12 figures, journal draft

  15. arXiv:2409.03944  [pdf, other]

    cs.CV cs.AI

    HUMOS: Human Motion Model Conditioned on Body Shape

    Authors: Shashank Tripathi, Omid Taheri, Christoph Lassner, Michael J. Black, Daniel Holden, Carsten Stoll

    Abstract: Generating realistic human motion is essential for many computer vision and graphics applications. The wide variety of human body shapes and sizes greatly impacts how people move. However, most existing motion models ignore these differences, relying on a standardized, average body. This leads to uniform motion across different body types, where movements don't match their physical characteristics…

    Submitted 3 April, 2025; v1 submitted 5 September, 2024; originally announced September 2024.

    Comments: Accepted in ECCV'24. Project page: https://CarstenEpic.github.io/humos/

  16. arXiv:2407.19520  [pdf, other]

    cs.CV cs.LG

    Ego-VPA: Egocentric Video Understanding with Parameter-efficient Adaptation

    Authors: Tz-Ying Wu, Kyle Min, Subarna Tripathi, Nuno Vasconcelos

    Abstract: Video understanding typically requires fine-tuning the large backbone when adapting to new domains. In this paper, we leverage the egocentric video foundation models (Ego-VFMs) based on video-language pre-training and propose a parameter-efficient adaptation for egocentric video tasks, namely Ego-VPA. It employs a local sparse approximation for each video frame/text feature using the basis prompts…

    Submitted 26 February, 2025; v1 submitted 28 July, 2024; originally announced July 2024.

    Comments: Accepted to WACV 2025

  17. arXiv:2406.09462  [pdf, other]

    cs.CV cs.AI

    SViTT-Ego: A Sparse Video-Text Transformer for Egocentric Video

    Authors: Hector A. Valdez, Kyle Min, Subarna Tripathi

    Abstract: Pretraining egocentric vision-language models has become essential to improving downstream egocentric video-text tasks. These egocentric foundation models commonly use the transformer architecture. The memory footprint of these models during pretraining can be substantial. Therefore, we pretrain SViTT-Ego, the first sparse egocentric video-text transformer model integrating edge and node sparsific…

    Submitted 12 June, 2024; originally announced June 2024.

  18. A PCA based Keypoint Tracking Approach to Automated Facial Expressions Encoding

    Authors: Shivansh Chandra Tripathi, Rahul Garg

    Abstract: The Facial Action Coding System (FACS) for studying facial expressions is manual and requires significant effort and expertise. This paper explores the use of automated techniques to generate Action Units (AUs) for studying facial expressions. We propose an unsupervised approach based on Principal Component Analysis (PCA) and facial keypoint tracking to generate data-driven AUs called PCA AUs usin…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: This preprint has not undergone peer review or any post-submission improvements or corrections. The Version of Record of this contribution is published in [LNCS,volume 14301], and is available online at https://doi.org/10.1007/978-3-031-45170-6_85

  19. arXiv:2406.05434  [pdf, other]

    cs.CV cs.HC

    Unsupervised learning of Data-driven Facial Expression Coding System (DFECS) using keypoint tracking

    Authors: Shivansh Chandra Tripathi, Rahul Garg

    Abstract: The development of existing facial coding systems, such as the Facial Action Coding System (FACS), relied on manual examination of facial expression videos for defining Action Units (AUs). To overcome the labor-intensive nature of this process, we propose the unsupervised learning of an automated facial coding system by leveraging computer-vision-based facial keypoint tracking. In this novel facia…

    Submitted 8 June, 2024; originally announced June 2024.

  20. arXiv:2406.02631  [pdf, other]

    cs.CV

    Contrastive Language Video Time Pre-training

    Authors: Hengyue Liu, Kyle Min, Hector A. Valdez, Subarna Tripathi

    Abstract: We introduce LAVITI, a novel approach to learning language, video, and temporal representations in long-form videos via contrastive learning. Different from pre-training on video-text pairs like EgoVLP, LAVITI aims to align language, video, and temporal features by extracting meaningful moments in untrimmed videos. Our model employs a set of learnable moment queries to decode clip-level visual, la…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: CVPR EgoVis Workshop 2024 extended abstract

  21. D-VRE: From a Jupyter-enabled Private Research Environment to Decentralized Collaborative Research Ecosystem

    Authors: Yuandou Wang, Sheejan Tripathi, Siamak Farshidi, Zhiming Zhao

    Abstract: Today, scientific research is increasingly data-centric and compute-intensive, relying on data and models across distributed sources. However, it still faces challenges in the traditional cooperation mode, due to the high storage and computing cost, geo-location barriers, and local confidentiality regulations. The Jupyter environment has recently emerged and evolved as a vital virtual research env…

    Submitted 26 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: We revised the manuscript draft and submitted the revised manuscript to the journal Blockchain: Research and Applications

  22. arXiv:2404.10539  [pdf, other]

    cs.CV cs.AI

    VideoSAGE: Video Summarization with Graph Representation Learning

    Authors: Jose M. Rojas Chaves, Subarna Tripathi

    Abstract: We propose a graph-based representation learning framework for video summarization. First, we convert an input video to a graph where nodes correspond to each of the video frames. Then, we impose sparsity on the graph by connecting only those pairs of nodes that are within a specified temporal distance. We then formulate the video summarization task as a binary node classification problem, precise…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: text overlap with arXiv:2207.07783

  23. Loss Regularizing Robotic Terrain Classification

    Authors: Shakti Deo Kumar, Sudhanshu Tripathi, Krishna Ujjwal, Sarvada Sakshi Jha, Suddhasil De

    Abstract: Locomotion mechanics of legged robots are suitable when pacing through difficult terrains. Recognising terrains for such robots are important to fully yoke the versatility of their movements. Consequently, robotic terrain classification becomes significant to classify terrains in real time with high accuracy. The conventional classifiers suffer from overfitting problem, low accuracy problem, high…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Preliminary draft of the work published in IEEE conference 2023

  24. arXiv:2403.00788  [pdf]

    cs.CL cs.AI cs.HC cs.LG

    PRECISE Framework: GPT-based Text For Improved Readability, Reliability, and Understandability of Radiology Reports For Patient-Centered Care

    Authors: Satvik Tripathi, Liam Mutter, Meghana Muppuri, Suhani Dheer, Emiliano Garza-Frias, Komal Awan, Aakash Jha, Michael Dezube, Azadeh Tabari, Christopher P. Bridge, Dania Daye

    Abstract: This study introduces and evaluates the PRECISE framework, utilizing OpenAI's GPT-4 to enhance patient engagement by providing clearer and more accessible chest X-ray reports at a sixth-grade reading level. The framework was tested on 500 reports, demonstrating significant improvements in readability, reliability, and understandability. Statistical analyses confirmed the effectiveness of the PRECI…

    Submitted 19 February, 2024; originally announced March 2024.

  25. arXiv:2312.05432  [pdf, other]

    cs.LG math.OC

    Fusing Multiple Algorithms for Heterogeneous Online Learning

    Authors: Darshan Gadginmath, Shivanshu Tripathi, Fabio Pasqualetti

    Abstract: This study addresses the challenge of online learning in contexts where agents accumulate disparate data, face resource constraints, and use different local algorithms. This paper introduces the Switched Online Learning Algorithm (SOLA), designed to solve the heterogeneous online learning problem by amalgamating updates from diverse agents through a dynamic switching mechanism contingent upon thei…

    Submitted 8 December, 2023; originally announced December 2023.

    Comments: 13 pages, 3 figures

  26. arXiv:2312.03391  [pdf, other]

    cs.CV

    Action Scene Graphs for Long-Form Understanding of Egocentric Videos

    Authors: Ivan Rodin, Antonino Furnari, Kyle Min, Subarna Tripathi, Giovanni Maria Farinella

    Abstract: We present Egocentric Action Scene Graphs (EASGs), a new representation for long-form understanding of egocentric videos. EASGs extend standard manually-annotated representations of egocentric videos, such as verb-noun action labels, by providing a temporally evolving graph-based description of the actions performed by the camera wearer, including interacted objects, their relationships, and how a…

    Submitted 6 December, 2023; originally announced December 2023.

  27. arXiv:2311.10476  [pdf, other]

    cs.CV

    FRCSyn Challenge at WACV 2024: Face Recognition Challenge in the Era of Synthetic Data

    Authors: Pietro Melzi, Ruben Tolosana, Ruben Vera-Rodriguez, Minchul Kim, Christian Rathgeb, Xiaoming Liu, Ivan DeAndres-Tame, Aythami Morales, Julian Fierrez, Javier Ortega-Garcia, Weisong Zhao, Xiangyu Zhu, Zheyu Yan, Xiao-Yu Zhang, Jinlin Wu, Zhen Lei, Suvidha Tripathi, Mahak Kothari, Md Haider Zama, Debayan Deb, Bernardo Biesseck, Pedro Vidal, Roger Granada, Guilherme Fickel, Gustavo Führ, et al. (22 additional authors not shown)

    Abstract: Despite the widespread adoption of face recognition technology around the world, and its remarkable performance on current benchmarks, there are still several challenges that must be covered in more detail. This paper offers an overview of the Face Recognition Challenge in the Era of Synthetic Data (FRCSyn) organized at WACV 2024. This is the first international challenge aiming to explore the use…

    Submitted 17 November, 2023; originally announced November 2023.

    Comments: 10 pages, 1 figure, WACV 2024 Workshops

  28. arXiv:2310.02753  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    MUNCH: Modelling Unique 'N Controllable Heads

    Authors: Debayan Deb, Suvidha Tripathi, Pranit Puri

    Abstract: The automated generation of 3D human heads has been an intriguing and challenging task for computer vision researchers. Prevailing methods synthesize realistic avatars but with limited control over the diversity and quality of rendered outputs and suffer from limited correlation between shape and texture of the character. We propose a method that offers quality, diversity, control, and realism alo…

    Submitted 4 October, 2023; originally announced October 2023.

  29. arXiv:2309.15273  [pdf, other]

    cs.CV

    DECO: Dense Estimation of 3D Human-Scene Contact In The Wild

    Authors: Shashank Tripathi, Agniv Chatterjee, Jean-Claude Passy, Hongwei Yi, Dimitrios Tzionas, Michael J. Black

    Abstract: Understanding how humans use physical contact to interact with the world is key to enabling human-centric artificial intelligence. While inferring 3D contact is crucial for modeling realistic and physically-plausible human-object interactions, existing methods either focus on 2D, consider body joints rather than the surface, use coarse 3D body regions, or do not generalize to in-the-wild images. I…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: Accepted as Oral in ICCV'23. Project page: https://deco.is.tue.mpg.de

  30. arXiv:2307.16195  [pdf, ps, other]

    cs.AR

    Implementation of Fast and Power Efficient SEC-DAEC and SEC-DAEC-TAEC Codecs on FPGA

    Authors: Sayan Tripathi, Jhilam Jana, Jaydeb Bhaumik

    Abstract: The reliability of memory devices is affected by radiation induced soft errors. Multiple cell upsets (MCUs) caused by radiation corrupt data stored in multiple cells within memories. Error correction codes (ECCs) are typically used to mitigate the effects of MCUs. Single error correction-double error detection (SEC-DED) codes are not the right choice against MCUs, but are more suitable for protect…

    Submitted 30 July, 2023; originally announced July 2023.

    Comments: 9 pages, 2 figures, 2 tables

  31. Emotional Speech-Driven Animation with Content-Emotion Disentanglement

    Authors: Radek Daněček, Kiran Chhatre, Shashank Tripathi, Yandong Wen, Michael J. Black, Timo Bolkart

    Abstract: To be widely adopted, 3D facial avatars must be animated easily, realistically, and directly from speech signals. While the best recent methods generate 3D animations that are synchronized with the input audio, they largely ignore the impact of emotions on facial expressions. Realistic facial animation requires lip-sync together with the natural expression of emotion. To that end, we propose EMOTE…

    Submitted 26 September, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: SIGGRAPH Asia 2023 Conference Paper

  32. arXiv:2306.05689  [pdf, other]

    cs.CV

    Single-Stage Visual Relationship Learning using Conditional Queries

    Authors: Alakh Desai, Tz-Ying Wu, Subarna Tripathi, Nuno Vasconcelos

    Abstract: Research in scene graph generation (SGG) usually considers two-stage models, that is, detecting a set of entities, followed by combining them and labeling all possible relationships. While showing promising results, the pipeline structure induces large parameter and computation overhead, and typically hinders end-to-end optimizations. To address this, recent research attempts to train single-stage…

    Submitted 9 June, 2023; originally announced June 2023.

    Comments: Accepted to NeurIPS 2022

  33. arXiv:2306.01652  [pdf, other]

    cs.IT eess.SP

    On the Coverage of Cognitive mmWave Networks with Directional Sensing and Communication

    Authors: Shuchi Tripathi, Abhishek K. Gupta, SaiDhiraj Amuru

    Abstract: Millimeter-waves' propagation characteristics create prospects for spatial and temporal spectrum sharing in a variety of contexts, including cognitive spectrum sharing (CSS). However, CSS along with omnidirectional sensing, is not efficient at mmWave frequencies due to their directional nature of transmission, as this limits secondary networks' ability to access the spectrum. This inspired us to c…

    Submitted 2 June, 2023; originally announced June 2023.

    Comments: 30 pages, 12 figures

  34. arXiv:2304.11827  [pdf]

    cs.CR cs.NI

    Safe and Secure Smart Home using Cisco Packet Tracer

    Authors: Shivansh Walia, Tejas Iyer, Shubham Tripathi, Akshith Vanaparthy

    Abstract: This project presents an implementation and designing of safe, secure and smart home with enhanced levels of security features which uses IoT-based technology. We got our motivation for this project after learning about movement of west towards smart homes and designs. This galvanized us to engage in this work as we wanted for homeowners to have a greater control over their in-house environment wh…

    Submitted 24 April, 2023; originally announced April 2023.

    Comments: 11 pages

  35. arXiv:2304.08809  [pdf, other]

    cs.CV

    SViTT: Temporal Learning of Sparse Video-Text Transformers

    Authors: Yi Li, Kyle Min, Subarna Tripathi, Nuno Vasconcelos

    Abstract: Do video-text transformers learn to model temporal relationships across frames? Despite their immense capacity and the abundance of multimodal training data, recent work has revealed the strong tendency of video-text models towards frame-based spatial representations, while temporal reasoning remains largely unsolved. In this work, we identify several key challenges in temporal learning of video-t…

    Submitted 18 April, 2023; originally announced April 2023.

    Comments: CVPR 2023

  36. arXiv:2304.00733  [pdf, other]

    cs.CV

    Unbiased Scene Graph Generation in Videos

    Authors: Sayak Nag, Kyle Min, Subarna Tripathi, Amit K. Roy Chowdhury

    Abstract: The task of dynamic scene graph generation (SGG) from videos is complicated and challenging due to the inherent dynamics of a scene, temporal fluctuation of model predictions, and the long-tailed distribution of the visual relationships in addition to the already existing challenges in image-based SGG. Existing methods for dynamic SGG have primarily focused on capturing spatio-temporal context usi…

    Submitted 29 June, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

    Comments: Published in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023

  37. arXiv:2303.18246  [pdf, other]

    cs.CV cs.AI cs.GR

    3D Human Pose Estimation via Intuitive Physics

    Authors: Shashank Tripathi, Lea Müller, Chun-Hao P. Huang, Omid Taheri, Michael J. Black, Dimitrios Tzionas

    Abstract: Estimating 3D humans from images often produces implausible bodies that lean, float, or penetrate the floor. Such methods ignore the fact that bodies are typically supported by the scene. A physics engine can be used to enforce physical plausibility, but these are not differentiable, rely on unrealistic proxy bodies, and are difficult to integrate into existing optimization and learning frameworks…

    Submitted 24 July, 2023; v1 submitted 31 March, 2023; originally announced March 2023.

    Comments: Accepted in CVPR'23. Project page: https://ipman.is.tue.mpg.de

  38. arXiv:2303.17499  [pdf, other]

    cs.CR

    Fuzzified advanced robust hashes for identification of digital and physical objects

    Authors: Shashank Tripathi, Volker Skwarek

    Abstract: With the rising numbers for IoT objects, it is becoming easier to penetrate counterfeit objects into the mainstream market by adversaries. Such infiltration of bogus products can be addressed with third-party-verifiable identification. Generally, state-of-the-art identification schemes do not guarantee that an identifier e.g. barcodes or RFID itself cannot be forged. This paper introduces identifi…

    Submitted 30 March, 2023; originally announced March 2023.

    Comments: 9 pages, 6 figures, 3 tables

    ACM Class: E.3; E.4; H.1

  39. arXiv:2212.04360  [pdf, other]

    cs.CV cs.GR

    MIME: Human-Aware 3D Scene Generation

    Authors: Hongwei Yi, Chun-Hao P. Huang, Shashank Tripathi, Lea Hering, Justus Thies, Michael J. Black

    Abstract: Generating realistic 3D worlds occupied by moving humans has many applications in games, architecture, and synthetic data creation. But generating such scenes is expensive and labor intensive. Recent work generates human poses and motions given a 3D scene. Here, we take the opposite approach and generate 3D indoor scenes given 3D human motion. Such motions can come from archival motion capture or…

    Submitted 8 December, 2022; originally announced December 2022.

    Comments: Project Page: https://mime.is.tue.mpg.de

  40. arXiv:2211.04442  [pdf, other]

    cs.LG

    Algorithmic Bias in Machine Learning Based Delirium Prediction

    Authors: Sandhya Tripathi, Bradley A Fritz, Michael S Avidan, Yixin Chen, Christopher R King

    Abstract: Although prediction models for delirium, a commonly occurring condition during general hospitalization or post-surgery, have not gained huge popularity, their algorithmic bias evaluation is crucial due to the existing association between social determinants of health and delirium risk. In this context, using MIMIC-III and another academic hospital dataset, we present some initial experimental evid…

    Submitted 26 November, 2022; v1 submitted 8 November, 2022; originally announced November 2022.

    Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2022, November 28th, 2022, New Orleans, United States & Virtual, http://www.ml4h.cc, 14 pages

  41. arXiv:2210.15923  [pdf, other]

    cs.LG

    DELFI: Deep Mixture Models for Long-term Air Quality Forecasting in the Delhi National Capital Region

    Authors: Naishadh Parmar, Raunak Shah, Tushar Goswamy, Vatsalya Tandon, Ravi Sahu, Ronak Sutaria, Purushottam Kar, Sachchida Nand Tripathi

    Abstract: The identification and control of human factors in climate change is a rapidly growing concern and robust, real-time air-quality monitoring and forecasting plays a critical role in allowing effective policy formulation and implementation. This paper presents DELFI, a novel deep learning-based mixture model to make effective long-term predictions of Particulate Matter (PM) 2.5 concentrations. A key…

    Submitted 28 October, 2022; originally announced October 2022.

    Comments: 6 pages

  42. arXiv:2210.10130  [pdf, other]

    cs.CV

    PERI: Part Aware Emotion Recognition In The Wild

    Authors: Akshita Mittel, Shashank Tripathi

    Abstract: Emotion recognition aims to interpret the emotional states of a person based on various inputs including audio, visual, and textual cues. This paper focuses on emotion recognition using visual features. To leverage the correlation between facial expression and the emotional state of a person, pioneering methods rely primarily on facial features. However, facial features are often unreliable in nat…

    Submitted 18 October, 2022; originally announced October 2022.

    Comments: Accepted at ECCVW 2022

  43. arXiv:2210.00521  [pdf, other]

    cs.LG eess.SP

    Leveraging unsupervised data and domain adaptation for deep regression in low-cost sensor calibration

    Authors: Swapnil Dey, Vipul Arora, Sachchida Nand Tripathi

    Abstract: Air quality monitoring is becoming an essential task with rising awareness about air quality. Low cost air quality sensors are easy to deploy but are not as reliable as the costly and bulky reference monitors. The low quality sensors can be calibrated against the reference monitors with the help of deep learning. In this paper, we translate the task of sensor calibration into a semi-supervised dom…

    Submitted 2 October, 2022; originally announced October 2022.

    Comments: submitted to IEEE Trans. on Neural Networks and Learning Systems as a regular article

  44. arXiv:2208.01953  [pdf, ps, other]

    cs.DS

    Maximum Minimal Feedback Vertex Set: A Parameterized Perspective

    Authors: Ajinkya Gaikwad, Hitendra Kumar, Soumen Maity, Saket Saurabh, Shuvam Kant Tripathi

    Abstract: In this paper we study a maximization version of the classical Feedback Vertex Set (FVS) problem, namely, the Max Min FVS problem, in the realm of parameterized complexity. In this problem, given an undirected graph $G$, a positive integer $k$, the question is to check whether $G$ has a minimal feedback vertex set of size at least $k$. We obtain following results for Max Min FVS. 1) We first des…

    Submitted 3 August, 2022; originally announced August 2022.

  45. arXiv:2207.07783  [pdf, other]

    cs.CV

    Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection

    Authors: Kyle Min, Sourya Roy, Subarna Tripathi, Tanaya Guha, Somdeb Majumdar

    Abstract: Active speaker detection (ASD) in videos with multiple speakers is a challenging task as it requires learning effective audiovisual features and spatial-temporal correlations over long temporal windows. In this paper, we present SPELL, a novel spatial-temporal graph learning framework that can solve complex tasks such as ASD. To this end, each person in a video frame is first encoded in a unique n…

    Submitted 12 October, 2022; v1 submitted 15 July, 2022; originally announced July 2022.

    Comments: ECCV 2022 camera ready (Supplementary videos: on ECVA soon). This paper supersedes arXiv:2112.01479

  46. arXiv:2207.03536  [pdf, other]

    cs.DB cs.LG

    Deep Learning to Jointly Schema Match, Impute, and Transform Databases

    Authors: Sandhya Tripathi, Bradley A. Fritz, Mohamed Abdelhack, Michael S. Avidan, Yixin Chen, Christopher R. King

    Abstract: An applied problem facing all areas of data science is harmonizing data sources. Joining data from multiple origins with unmapped and only partially overlapping features is a prerequisite to developing and testing robust, generalizable algorithms, especially in health care. We approach this issue in the common but difficult case of numeric features such as nearly Gaussian and binary features, wher…

    Submitted 22 June, 2022; originally announced July 2022.

  47. arXiv:2205.08440  [pdf, other]

    cs.CR cs.DC cs.MA cs.SE

    Moving Smart Contracts -- A Privacy Preserving Method for Off-Chain Data Trust

    Authors: Simon Tschirner, Shashank Shekher Tripathi, Mathias Roeper, Markus M. Becker, Volker Skwarek

    Abstract: Blockchains provide environments where parties can interact transparently and securely peer-to-peer without needing a trusted third party. Parties can trust the integrity and correctness of transactions and the verifiable execution of binary code on the blockchain (smart contracts) inside the system. Including information from outside of the blockchain remains challenging. A challenge is data priv…

    Submitted 18 May, 2022; v1 submitted 17 May, 2022; originally announced May 2022.

    Comments: 10 pages, 6 figures

    ACM Class: C.2.4; E.2; E.3

  48. arXiv:2204.08695  [pdf, other]

    cs.SE

    Automated Application Processing

    Authors: Eshita Sharma, Keshav Gupta, Lubaina Machinewala, Samaksh Dhingra, Shrey Tripathi, Shreyas V S, Sujit Kumar Chakrabarti

    Abstract: Recruitment in large organisations often involves interviewing a large number of candidates. The process is resource intensive and complex. Therefore, it is important to carry it out efficiently and effectively. Planning the selection process consists of several problems, each of which maps to one or the other well-known computing problem. Research that looks at each of these problems in isolation…

    Submitted 19 April, 2022; originally announced April 2022.

  49. arXiv:2204.07066  [pdf, other]

    cs.NE cs.LG eess.SP

    EvoSTS Forecasting: Evolutionary Sparse Time-Series Forecasting

    Authors: Ethan Jacob Moyer, Alisha Isabelle Augustin, Satvik Tripathi, Ansh Aashish Dholakia, Andy Nguyen, Isamu Mclean Isozaki, Daniel Schwartz, Edward Kim

    Abstract: In this work, we highlight our novel evolutionary sparse time-series forecasting algorithm also known as EvoSTS. The algorithm attempts to evolutionary prioritize weights of Long Short-Term Memory (LSTM) Network that best minimize the reconstruction loss of a predicted signal using a learned sparse coded dictionary. In each generation of our evolutionary algorithm, a set number of children with th…

    Submitted 14 April, 2022; originally announced April 2022.

    Comments: 5 pages, 2 figures, 2 tables

  50. arXiv:2204.01918  [pdf, other]

    cs.CV

    Text Spotting Transformers

    Authors: Xiang Zhang, Yongwen Su, Subarna Tripathi, Zhuowen Tu

    Abstract: In this paper, we present TExt Spotting TRansformers (TESTR), a generic end-to-end text spotting framework using Transformers for text detection and recognition in the wild. TESTR builds upon a single encoder and dual decoders for the joint text-box control point regression and character recognition. Other than most existing literature, our method is free from Region-of-Interest operations and heu…

    Submitted 4 April, 2022; originally announced April 2022.

    Comments: Accepted to CVPR 2022