Dataset Augmentation by Mixing Visual Concepts

Authors: Abdullah Al Rahat, Hemanth Venkateswara

Abstract: This paper proposes a dataset augmentation method by fine-tuning pre-trained diffusion models. Generating images using a pre-trained diffusion model with textual conditioning often results in domain discrepancy between real data and generated images. We propose a fine-tuning approach where we adapt the diffusion model by conditioning it with real images and novel text embeddings. We introduce a unique procedure called Mixing Visual Concepts (MVC) where we create novel text embeddings from image captions. The MVC enables us to generate multiple images which are diverse and yet similar to the real data, enabling us to perform effective dataset augmentation. We perform comprehensive qualitative and quantitative evaluations with the proposed dataset augmentation approach, showcasing both coarse-grained and fine-grained changes in generated images. Our approach outperforms state-of-the-art augmentation techniques on benchmark classification tasks.

Submitted 19 December, 2024; originally announced December 2024.

Comments: Accepted at WACV 2025 main conference

arXiv:2402.06809 [pdf, other]

Domain Adaptation Using Pseudo Labels

Authors: Sachin Chhabra, Hemanth Venkateswara, Baoxin Li

Abstract: In the absence of labeled target data, unsupervised domain adaptation approaches seek to align the marginal distributions of the source and target domains in order to train a classifier for the target. Unsupervised domain alignment procedures are category-agnostic and end up misaligning the categories. We address this problem by deploying a pretrained network to determine accurate labels for the target domain using a multi-stage pseudo-label refinement procedure. The filters are based on the confidence, distance (conformity), and consistency of the pseudo labels. Our results on multiple datasets demonstrate the effectiveness of our simple procedure in comparison with complex state-of-the-art techniques.

Submitted 11 March, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

Comments: 8 pages + 3 pages of references
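
The abstract above names three pseudo-label filters (confidence, conformity, consistency) without giving their definitions in this listing. The following is a minimal sketch of how such a filter chain might look, not the authors' procedure. Everything concrete here is an assumption made for illustration: the function name `filter_pseudo_labels`, the 0.9 confidence threshold, the use of nearest class centroid as the conformity test, and agreement between two prediction passes as the consistency test.

```python
def argmax(values):
    """Index of the largest entry in a list."""
    return max(range(len(values)), key=values.__getitem__)


def filter_pseudo_labels(probs, features, probs_second_view, centroids,
                         min_confidence=0.9):
    """Return (sample_index, pseudo_label) pairs that pass all three filters.

    probs / probs_second_view: per-sample class probabilities from two passes.
    features: per-sample feature vectors.
    centroids: one feature-space centroid per class (hypothetical input).
    """
    kept = []
    for i, (p, f, q) in enumerate(zip(probs, features, probs_second_view)):
        label = argmax(p)
        # Confidence: the predicted class must be predicted strongly enough.
        if p[label] < min_confidence:
            continue
        # Conformity (assumed form): the nearest class centroid must agree.
        distances = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in centroids]
        if distances.index(min(distances)) != label:
            continue
        # Consistency (assumed form): a second pass must predict the same class.
        if argmax(q) != label:
            continue
        kept.append((i, label))
    return kept
```

Applying the filters in sequence means a sample is kept only if every test agrees on its label, which trades pseudo-label coverage for precision.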

arXiv:2212.01590 [pdf, other]

Domain-Invariant Feature Alignment Using Variational Inference For Partial Domain Adaptation

Authors: Sandipan Choudhuri, Suli Adeniye, Arunabha Sen, Hemanth Venkateswara

Abstract: The standard closed-set domain adaptation approaches seek to mitigate distribution discrepancies between two domains under the constraint of both sharing identical label sets. However, in realistic scenarios, finding an optimal source domain with identical label space is a challenging task. Partial domain adaptation alleviates this problem of procuring a labeled dataset with identical label space assumptions and addresses a more practical scenario where the source label set subsumes the target label set. This, however, presents a few additional obstacles during adaptation. Samples with categories private to the source domain thwart relevant knowledge transfer and degrade model performance. In this work, we try to address these issues by coupling variational information and adversarial learning with a pseudo-labeling technique to enforce class distribution alignment and minimize the transfer of superfluous information from the source samples. The experimental findings in numerous cross-domain classification tasks demonstrate that the proposed technique delivers accuracy superior or comparable to existing methods.

Submitted 3 December, 2022; originally announced December 2022.

Comments: Accepted in the 56th Asilomar Conference on Signals, Systems, and Computers, 2022

arXiv:2210.15722 [pdf, other]

PatchRot: A Self-Supervised Technique for Training Vision Transformers

Authors: Sachin Chhabra, Prabal Bijoy Dutta, Hemanth Venkateswara, Baoxin Li

Abstract: Vision transformers require a huge amount of labeled data to outperform convolutional neural networks. However, labeling a huge dataset is a very expensive process. Self-supervised learning techniques alleviate this problem by learning features similar to supervised learning in an unsupervised way. In this paper, we propose a self-supervised technique, PatchRot, that is crafted for vision transformers. PatchRot rotates images and image patches and trains the network to predict the rotation angles. The network learns to extract both global and local features from an image. Our extensive experiments on different datasets showcase that PatchRot training learns rich features which outperform supervised learning and the compared baseline.

Submitted 27 October, 2022; originally announced October 2022.

Comments: NeurIPS Workshop on Vision Transformers: Theory and Applications (VTTA)
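
The rotation pretext task described in the abstract above can be sketched as a data-preparation step. This is a rough illustration under stated assumptions, not the PatchRot implementation: it assumes square images, rotations restricted to multiples of 90 degrees, and that image-level and patch-level rotations are applied to the same sample. The helper names `rot90` and `make_patchrot_sample` are hypothetical.

```python
import random


def rot90(grid, k):
    """Rotate a square 2-D list counter-clockwise by k * 90 degrees (returns a copy)."""
    for _ in range(k % 4):
        # Transposing and then reversing the row order is one quarter turn.
        grid = [list(row) for row in zip(*grid)][::-1]
    return [list(row) for row in grid]


def make_patchrot_sample(image, patch_size, rng):
    """Rotate the whole image, then each patch, and return the rotation labels.

    Returns (rotated_image, image_label, patch_labels), where every label is
    in {0, 1, 2, 3} and stands for a rotation of label * 90 degrees.
    """
    n = len(image)
    if n % patch_size or any(len(row) != n for row in image):
        raise ValueError("expected a square image divisible into square patches")
    image_label = rng.randrange(4)
    out = rot90(image, image_label)
    patch_labels = []
    for top in range(0, n, patch_size):
        row_labels = []
        for left in range(0, n, patch_size):
            k = rng.randrange(4)
            patch = [row[left:left + patch_size] for row in out[top:top + patch_size]]
            patch = rot90(patch, k)
            # Write the rotated patch back into its original location.
            for i in range(patch_size):
                out[top + i][left:left + patch_size] = patch[i]
            row_labels.append(k)
        patch_labels.append(row_labels)
    return out, image_label, patch_labels
```

A network trained on such samples would receive `rotated_image` as input and `image_label` plus `patch_labels` as classification targets, so no human annotation is needed.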

arXiv:2207.08145 [pdf]

doi 10.47852/bonviewJCCE2202324

Coupling Adversarial Learning with Selective Voting Strategy for Distribution Alignment in Partial Domain Adaptation

Authors: Sandipan Choudhuri, Hemanth Venkateswara, Arunabha Sen

Abstract: In contrast to a standard closed-set domain adaptation task, the partial domain adaptation setup caters to a realistic scenario by relaxing the identical label set assumption. The fact that the source label set subsumes the target label set, however, introduces a few additional obstacles, as training on private source category samples thwarts relevant knowledge transfer and misleads the classification process. To mitigate these issues, we devise a mechanism for strategic selection of highly-confident target samples essential for the estimation of class-importance weights. Furthermore, we capture class-discriminative and domain-invariant features by coupling the process of achieving compact and distinct class distributions with an adversarial objective. Experimental findings over numerous cross-domain classification tasks demonstrate the potential of the proposed technique to deliver accuracy superior or comparable to existing methods.

Submitted 17 July, 2022; originally announced July 2022.

Journal ref: Journal of Computational and Cognitive Engineering. Volume 1, Issue 4, 2022

arXiv:2201.10711 [pdf, other]

Sparsity Regularization For Cold-Start Recommendation

Authors: Aksheshkumar Ajaykumar Shah, Hemanth Venkateswara

Abstract: Recently, Generative Adversarial Networks (GANs) have been applied to the problem of Cold-Start Recommendation, but the training performance of these models is hampered by the extreme sparsity in warm user purchase behavior. In this paper we introduce a novel representation for user-vectors by combining user demographics and user preferences, making the model a hybrid system which uses Collaborative Filtering and Content Based Recommendation. Our system models user purchase behavior using weighted user-product preferences (explicit feedback) rather than binary user-product interactions (implicit feedback). Using this we develop a novel sparse adversarial model, SRLGAN, for Cold-Start Recommendation leveraging the sparse user-purchase behavior which ensures training stability and avoids over-fitting on warm users. We evaluate the SRLGAN on two popular datasets and demonstrate state-of-the-art results.

Submitted 28 January, 2022; v1 submitted 25 January, 2022; originally announced January 2022.

arXiv:2101.02275 [pdf, other]

Partial Domain Adaptation Using Selective Representation Learning For Class-Weight Computation

Authors: Sandipan Choudhuri, Riti Paul, Arunabha Sen, Baoxin Li, Hemanth Venkateswara

Abstract: The generalization power of deep-learning models is dependent on richly labelled data. This supervision using large-scale annotated information is restrictive in most real-world scenarios where data collection and annotation involve huge cost. Various domain adaptation techniques exist in the literature that bridge this distribution discrepancy. However, a majority of these models require the label sets of both the domains to be identical. To tackle a more practical and challenging scenario, we formulate the problem statement from a partial domain adaptation perspective, where the source label set is a superset of the target label set. Driven by the motivation that image styles are private to each domain, in this work, we develop a method that identifies outlier classes exclusively from image content information and trains a label classifier exclusively on class-content from source images. Additionally, elimination of negative transfer of samples from classes private to the source domain is achieved by transforming the soft class-level weights into two clusters, 0 (outlier source classes) and 1 (shared classes), by maximizing the between-cluster variance between them.

Submitted 6 January, 2021; originally announced January 2021.
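
The last step in the abstract above, splitting soft class-level weights into two clusters by maximizing between-cluster variance, resembles one-dimensional Otsu-style thresholding. A minimal sketch of that idea follows; it is an illustration under that assumption, not the paper's code, and the function name `split_class_weights` and the midpoint candidate thresholds are choices made here.

```python
def split_class_weights(weights):
    """Binarize soft class weights by maximizing between-cluster variance.

    Returns (threshold, labels), where labels[i] is 1 for the high-weight
    cluster (treated here as shared classes) and 0 for the low-weight
    cluster (treated here as outlier source classes).
    """
    values = sorted(set(weights))
    if len(values) < 2:
        raise ValueError("need at least two distinct weights to form two clusters")
    n = len(weights)
    best_threshold, best_score = None, -1.0
    # Candidate thresholds: midpoints between consecutive distinct weights.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0
        low = [w for w in weights if w <= t]
        high = [w for w in weights if w > t]
        p_low, p_high = len(low) / n, len(high) / n
        mean_low = sum(low) / len(low)
        mean_high = sum(high) / len(high)
        # Between-cluster variance of a two-cluster split.
        score = p_low * p_high * (mean_low - mean_high) ** 2
        if score > best_score:
            best_threshold, best_score = t, score
    labels = [1 if w > best_threshold else 0 for w in weights]
    return best_threshold, labels
```

Because the threshold is chosen from the data, no hand-tuned cutoff on the class weights is required; the split adapts to however the weights happen to separate.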

arXiv:2007.09549 [pdf, other]

Leveraging Seen and Unseen Semantic Relationships for Generative Zero-Shot Learning

Authors: Maunil R Vyas, Hemanth Venkateswara, Sethuraman Panchanathan

Abstract: Zero-shot learning (ZSL) addresses the unseen class recognition problem by leveraging semantic information to transfer knowledge from seen classes to unseen classes. Generative models synthesize the unseen visual features and convert ZSL into a classical supervised learning problem. These generative models are trained using the seen classes and are expected to implicitly transfer the knowledge from seen to unseen classes. However, their performance is stymied by overfitting, which leads to substandard performance on Generalized Zero-Shot Learning (GZSL). To address this concern, we propose the novel LsrGAN, a generative model that Leverages the Semantic Relationship between seen and unseen categories and explicitly performs knowledge transfer by incorporating a novel Semantic Regularized Loss (SR-Loss). The SR-Loss guides the LsrGAN to generate visual features that mirror the semantic relationships between seen and unseen classes. Experiments on seven benchmark datasets, including the challenging Wikipedia text-based CUB and NABirds splits, and attribute-based AWA, CUB, and SUN, demonstrate the superiority of the LsrGAN compared to previous state-of-the-art approaches under both ZSL and GZSL. Code is available at https://github.com/Maunil/LsrGAN

Submitted 18 July, 2020; originally announced July 2020.

Comments: 19 pages, to appear in ECCV 2020

arXiv:2001.01824 [pdf, other]

Foveated Haptic Gaze

Authors: Bijan Fakhri, Troy McDaniel, Heni Ben Amor, Hemanth Venkateswara, Abhik Chowdhury, Sethuraman Panchanathan

Abstract: As digital worlds become ubiquitous via video games, simulations, virtual and augmented reality, people with disabilities who cannot access those worlds are becoming increasingly disenfranchised. More often than not the design of these environments focuses on vision, making them inaccessible in whole or in part to people with visual impairments. Accessible games and visual aids have been developed but their lack of prevalence or unintuitive interfaces make them impractical for daily use. To address this gap, we present Foveated Haptic Gaze, a method for conveying visual information via haptics that is intuitive and designed for interacting with real-time 3-dimensional environments. To validate our approach we developed a prototype of the system along with a simplified first-person shooter game. Lastly we present encouraging user study results of both sighted and blind participants using our system to play the game with no visual feedback.

Submitted 21 January, 2020; v1 submitted 6 January, 2020; originally announced January 2020.

Comments: Accepted to ICSM 2019. For a demonstration of Foveated Haptic Gaze, see https://youtu.be/Xp7B8UqtVFw

arXiv:1907.01098 [pdf, other]

doi 10.1007/978-3-030-43887-6_50

Representation, Exploration and Recommendation of Music Playlists

Authors: Piyush Papreja, Hemanth Venkateswara, Sethuraman Panchanathan

Abstract: Playlists have become a significant part of our listening experience because of digital cloud-based services such as Spotify, Pandora, and Apple Music. Owing to the meteoric rise in the usage of playlists, recommending playlists is crucial to music services today. Although there has been a lot of work done in playlist prediction, the area of playlist representation hasn't received that level of attention. Over the last few years, sequence-to-sequence models, especially in the field of natural language processing, have shown the effectiveness of learned embeddings in capturing the semantic characteristics of sequences. We can apply similar concepts to music to learn fixed-length representations for playlists and use those representations for downstream tasks such as playlist discovery, browsing, and recommendation. In this work, we formulate the problem of learning a fixed-length playlist representation in an unsupervised manner, using sequence-to-sequence (Seq2seq) models, interpreting playlists as sentences and songs as words. We compare our model with two other encoding architectures for baseline comparison. We evaluate our work using the suite of tasks commonly used for assessing sentence embeddings, along with a few additional tasks pertaining to music, and a recommendation task to study the traits captured by the playlist embeddings and their effectiveness for the purpose of music recommendation.

Submitted 1 July, 2019; originally announced July 2019.

arXiv:1706.07535 [pdf, ps, other]

Efficient Approximate Solutions to Mutual Information Based Global Feature Selection

Authors: Hemanth Venkateswara, Prasanth Lade, Binbin Lin, Jieping Ye, Sethuraman Panchanathan

Abstract: Mutual Information (MI) is often used for feature selection when developing classifier models. Estimating the MI for a subset of features is often intractable. We demonstrate that, under the assumptions of conditional independence, MI between a subset of features can be expressed as the Conditional Mutual Information (CMI) between pairs of features. But selecting features with the highest CMI turns out to be a hard combinatorial problem. In this work, we have applied two unique global methods, Truncated Power Method (TPower) and Low Rank Bilinear Approximation (LowRank), to solve the feature selection problem. These algorithms provide very good approximations to the NP-hard CMI based feature selection problem. We experimentally demonstrate the effectiveness of these procedures across multiple datasets and compare them with existing MI based global and iterative feature selection procedures.
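
To make the Truncated Power Method mentioned in the abstract above more concrete, here is a generic sketch of truncated power iteration for picking k indices from a pairwise score matrix. It is an illustration under assumptions, not the paper's algorithm as published: it assumes the pairwise scores are already arranged in a symmetric positive semidefinite matrix (how the paper builds that matrix from CMI values is not stated in this listing), and the function name, uniform initialization, and stopping tolerance are choices made here.

```python
import math


def truncated_power_method(matrix, k, iterations=100, tol=1e-10):
    """Approximate a k-sparse dominant eigenvector of a symmetric matrix.

    Each step multiplies by the matrix, keeps only the k entries of largest
    magnitude, and renormalizes. Returns (support, vector), where support is
    the sorted list of selected indices.
    """
    n = len(matrix)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the matrix dimension")
    x = [1.0 / math.sqrt(n)] * n  # uniform start (an arbitrary choice)
    support = []
    for _ in range(iterations):
        y = [sum(a * b for a, b in zip(row, x)) for row in matrix]
        # Truncation: keep the k entries with the largest absolute value.
        ranked = sorted(range(n), key=lambda i: abs(y[i]), reverse=True)
        keep = set(ranked[:k])
        y = [y[i] if i in keep else 0.0 for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            break
        x_new = [v / norm for v in y]
        new_support = sorted(keep)
        change = max(abs(a - b) for a, b in zip(x, x_new))
        converged = new_support == support and change < tol
        x, support = x_new, new_support
        if converged:
            break
    return support, x
```

In a feature-selection setting, the returned `support` would be read as the chosen feature subset. Like ordinary power iteration, the result can depend on the starting vector, so this gives an approximation rather than a guaranteed optimum.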