Showing 1–8 of 8 results for author: Mahadev, R

  1. arXiv:2505.21454  [pdf, ps, other]

    cs.CV

    Visual Product Graph: Bridging Visual Products And Composite Images For End-to-End Style Recommendations

    Authors: Yue Li Du, Ben Alexander, Mikhail Antonenka, Rohan Mahadev, Hao-yu Wu, Dmitry Kislyuk

    Abstract: Retrieving semantically similar but visually distinct contents has been a critical capability in visual search systems. In this work, we aim to tackle this problem with Visual Product Graph (VPG), leveraging high-performance infrastructure for storage and state-of-the-art computer vision models for image understanding. VPG is built to be an online real-time retrieval system that enables navigation…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: 10 pages, 10 figures

  2. arXiv:2107.09211  [pdf, other]

    cs.CV

    Understanding Gender and Racial Disparities in Image Recognition Models

    Authors: Rohan Mahadev, Anindya Chakravarti

    Abstract: Large-scale image classification models trained on popular datasets such as ImageNet have been shown to have a distributional skew, which leads to disparities in prediction accuracy across different subsections of population demographics. Many approaches have been proposed to address this distributional skew using methods that alter the model before, during, and after training. We investigate one…

    Submitted 19 July, 2021; originally announced July 2021.

  3. arXiv:1912.11659  [pdf, other]

    cs.CV

    Improving Visual Recognition using Ambient Sound for Supervision

    Authors: Rohan Mahadev, Hongyu Lu

    Abstract: Our brains combine vision and hearing to create a more elaborate interpretation of the world. When the visual input is insufficient, a rich panoply of sounds can be used to describe our surroundings. Since more than 1,000 hours of videos are uploaded to the internet every day, it is arduous, if not impossible, to manually annotate these videos. Therefore, incorporating audio along with visual data…

    Submitted 25 December, 2019; originally announced December 2019.

    Comments: 8 pages, 8 figures

  4. arXiv:1901.01153  [pdf, other]

    cs.CV

    Demystifying Multi-Faceted Video Summarization: Tradeoff Between Diversity, Representation, Coverage and Importance

    Authors: Vishal Kaushal, Rishabh Iyer, Khoshrav Doctor, Anurag Sahoo, Pratik Dubal, Suraj Kothawade, Rohan Mahadev, Kunal Dargan, Ganesh Ramakrishnan

    Abstract: This paper addresses automatic summarization of videos in a unified manner. In particular, we propose a framework for multi-faceted summarization for extractive, query base and entity summarization (summarization at the level of entities like objects, scenes, humans and faces in the video). We investigate several summarization models which capture notions of diversity, coverage, representation and…

    Submitted 3 January, 2019; originally announced January 2019.

    Comments: Accepted to WACV 2019. arXiv admin note: substantial text overlap with arXiv:1704.01466, arXiv:1809.08846

  5. arXiv:1901.01151  [pdf, other]

    cs.CV

    Learning From Less Data: A Unified Data Subset Selection and Active Learning Framework for Computer Vision

    Authors: Vishal Kaushal, Rishabh Iyer, Suraj Kothawade, Rohan Mahadev, Khoshrav Doctor, Ganesh Ramakrishnan

    Abstract: State-of-the-art computer vision techniques based on supervised machine learning are in general data hungry. Their data curation poses the challenges of expensive human labeling, inadequate computing resources and longer experiment turnaround times. Training data subset selection and active learning techniques have been proposed as possible solutions to these challenges. A special class of subset se…

    Submitted 3 January, 2019; originally announced January 2019.

    Comments: Accepted to WACV 2019. arXiv admin note: substantial text overlap with arXiv:1805.11191

  6. arXiv:1809.08846  [pdf, other]

    cs.CV cs.LG

    Vis-DSS: An Open-Source Toolkit for Visual Data Selection and Summarization

    Authors: Rishabh Iyer, Pratik Dubal, Kunal Dargan, Suraj Kothawade, Rohan Mahadev, Vishal Kaushal

    Abstract: With increasing amounts of visual data being created in the form of videos and images, visual data selection and summarization are becoming increasingly important problems. We present Vis-DSS, an open-source toolkit for Visual Data Selection and Summarization. Vis-DSS implements a framework of models for summarization and data subset selection using submodular functions, which are becoming increasingly p…

    Submitted 24 September, 2018; originally announced September 2018.

    Comments: Vis-DSS is available at https://github.com/rishabhk108/vis-dss

  7. arXiv:1805.10604  [pdf, other]

    cs.CV cs.DM

    Deployment of Customized Deep Learning based Video Analytics On Surveillance Cameras

    Authors: Pratik Dubal, Rohan Mahadev, Suraj Kothawade, Kunal Dargan, Rishabh Iyer

    Abstract: This paper demonstrates the effectiveness of our customized deep learning based video analytics system in various applications focused on security, safety, customer analytics and process compliance. We describe our video analytics system, comprising Search, Summarize, Statistics and real-time alerting, and outline its building blocks. These building blocks include object detection, tracking, fac…

    Submitted 27 June, 2018; v1 submitted 27 May, 2018; originally announced May 2018.

    Comments: Added Equal Contribution footnote

  8. arXiv:1611.04010   

    cs.CL

    Multi-Language Identification Using Convolutional Recurrent Neural Network

    Authors: Vrishabh Ajay Lakhani, Rohan Mahadev

    Abstract: Language Identification, being an important aspect of Automatic Speaker Recognition, has seen many changes and new approaches to improve performance over the last decade. We compare the performance of using the audio spectrum in the log scale and using polyphonic sound sequences from raw audio samples to train the neural network and to classify speech as either English or Spanish. To achieve this, we…

    Submitted 18 May, 2017; v1 submitted 12 November, 2016; originally announced November 2016.

    Comments: Further experiments were performed on the model using the LibriVox speech dataset, and it was found that a Time Distributed CRNN model performed better and represented our initial ideas about the speaker recognition task better. The dataset contains speech in three languages: English, Spanish and Czech. A report on our findings along with experimental results will be published soon.