Search | arXiv e-print repository

SODAWideNet++: Combining Attention and Convolutions for Salient Object Detection

Authors: Rohit Venkata Sai Dulam, Chandra Kambhamettu

Abstract: Salient Object Detection (SOD) has traditionally relied on feature refinement modules that utilize the features of an ImageNet pre-trained backbone. However, this approach limits the possibility of pre-training the entire network because of the distinct nature of SOD and image classification. Additionally, the architecture of these backbones, originally built for image classification, is sub-optimal for a dense prediction task like SOD. To address these issues, we propose a novel encoder-decoder-style neural network called SODAWideNet++ that is designed explicitly for SOD. Inspired by the ability of vision transformers to attain a global receptive field from the initial stages, we introduce the Attention Guided Long Range Feature Extraction (AGLRFE) module, which combines large dilated convolutions and self-attention. Specifically, we use attention features to guide long-range information extracted by multiple dilated convolutions, thus taking advantage of the inductive biases of a convolution operation and the input dependency brought by self-attention. In contrast to the current paradigm of ImageNet pre-training, we pre-train the proposed model end-to-end on 118K annotated images from the COCO semantic segmentation dataset, whose annotations we binarize. Further, we supervise the background predictions along with the foreground to push our model to generate accurate saliency predictions. SODAWideNet++ performs competitively on five different datasets while containing only 35% of the trainable parameters of state-of-the-art models.
The code and pre-computed saliency maps are provided at https://github.com/VimsLab/SODAWideNetPlusPlus.
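The abstract describes AGLRFE only at a high level, so the following is a minimal, hypothetical sketch of the general idea of attention-guided dilated convolutions, not the authors' module. It assumes a 1D sequence of scalars instead of 2D feature maps, fixed averaging kernels instead of learned weights, identity query/key/value projections, and a sigmoid gate on the attention output that multiplies the summed dilated branches; all function names are illustrative.

```python
import math
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Zero-padded, length-preserving 1D dilated cross-correlation (odd kernel size)."""
    center = (len(kernel) - 1) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - center) * dilation  # kernel taps are `dilation` positions apart
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def self_attention1d(x: Sequence[float]) -> List[float]:
    """Single-head self-attention over scalar tokens with identity projections (q = k = v = x)."""
    out = []
    for q in x:
        scores = [q * k for k in x]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append(sum(w / total * v for w, v in zip(weights, x)))
    return out


def attention_guided_dilated(
    x: Sequence[float],
    dilations: Sequence[int] = (1, 2, 4),
    kernel: Sequence[float] = (1 / 3, 1 / 3, 1 / 3),
) -> List[float]:
    """Hypothetical AGLRFE-like block: a sigmoid of the attention output gates the summed dilated branches."""
    gate = [1.0 / (1.0 + math.exp(-a)) for a in self_attention1d(x)]
    branches = [dilated_conv1d(x, kernel, d) for d in dilations]
    fused = [sum(vals) for vals in zip(*branches)]  # sum over dilation rates per position
    return [g * f for g, f in zip(gate, fused)]
```

Multiplying by an attention-derived gate makes the contribution of each position's long-range (dilated) features depend on the input, which is the pairing of convolutional inductive bias and input dependency that the abstract highlights.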

Submitted 29 August, 2024; originally announced August 2024.

Comments: Accepted at ICPR 2024
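The pre-training data and the extra background supervision described above can be illustrated with a small sketch. It assumes that class ID 0 marks background in a COCO semantic label map and that every other class counts as salient, that the network emits separate foreground and background probability maps, and that both are trained with binary cross-entropy; the abstract does not specify these details, and the helper names are hypothetical.

```python
import math
from typing import List


def binarize_semantic_mask(label_map: List[List[int]], background_id: int = 0) -> List[List[int]]:
    """Collapse a per-pixel class-ID map into a binary mask: background -> 0, any class -> 1."""
    return [[0 if label == background_id else 1 for label in row] for row in label_map]


def mean_bce(pred: List[float], target: List[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over flattened pixels; probabilities are clamped to avoid log(0)."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(target)


def foreground_background_loss(
    fg_pred: List[float], bg_pred: List[float], mask: List[List[int]]
) -> float:
    """Supervise a foreground map with the mask and a background map with its complement."""
    fg_target = [v for row in mask for v in row]  # row-major flattening
    bg_target = [1 - v for v in fg_target]
    return mean_bce(fg_pred, fg_target) + mean_bce(bg_pred, bg_target)
```

Supervising the background map with the inverted mask gives the model a second, complementary signal about where the salient object is not, which is one plausible reading of "supervise the background predictions along with the foreground".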

arXiv:2311.04828 [pdf, other]

SODAWideNet -- Salient Object Detection with an Attention augmented Wide Encoder Decoder network without ImageNet pre-training

Authors: Rohit Venkata Sai Dulam, Chandra Kambhamettu

Abstract: Developing a new Salient Object Detection (SOD) model involves selecting an ImageNet pre-trained backbone and creating novel feature refinement modules to use backbone features. However, adding new components to a pre-trained backbone requires retraining the whole network on the ImageNet dataset, which takes significant time. Hence, we explore developing a neural network from scratch, trained directly on SOD without ImageNet pre-training. Such a formulation offers full autonomy to design task-specific components. To that end, we propose SODAWideNet, an encoder-decoder-style network for Salient Object Detection. We deviate from the commonly practiced paradigm of narrow and deep convolutional models to a wide and shallow architecture, resulting in a parameter-efficient deep neural network. To achieve a shallower network, we increase the receptive field from the beginning of the network using a combination of dilated convolutions and self-attention. Accordingly, we propose the Multi Receptive Field Feature Aggregation Module (MRFFAM), which uses dilated convolutions to efficiently obtain discriminative features from distant regions at higher resolutions. Next, we propose Multi-Scale Attention (MSA), which creates a feature pyramid and efficiently computes attention across multiple resolutions to extract global features from larger feature maps. Finally, we propose two variants, SODAWideNet-S (3.03M) and SODAWideNet (9.03M), that achieve competitive performance against state-of-the-art models on five datasets.
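The receptive-field argument behind a wide and shallow design follows from a standard formula: a kernel of size k with dilation d spans d(k - 1) + 1 input positions, and stride-1 layers stacked in sequence each add their span minus one. The sketch below computes both; the layer configurations used with it are illustrative assumptions, not MRFFAM's actual settings.

```python
from typing import Iterable, Tuple


def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Span of a dilated kernel: k + (k - 1)(d - 1) = d(k - 1) + 1."""
    return dilation * (kernel_size - 1) + 1


def stacked_receptive_field(layers: Iterable[Tuple[int, int]]) -> int:
    """Receptive field of stride-1 conv layers applied in sequence, given (kernel_size, dilation) pairs."""
    rf = 1
    for k, d in layers:
        rf += effective_kernel(k, d) - 1  # each layer widens the field by its span minus one
    return rf
```

For instance, three undilated 3x3 layers see 7 pixels along each axis, while dilation rates 1, 2, and 4 reach 15 with the same depth and parameter count, which is why dilation lets a shallow network gather context from distant regions.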

Submitted 8 November, 2023; v1 submitted 8 November, 2023; originally announced November 2023.

Comments: Accepted at ISVC'23
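The efficiency motivation for computing attention on a feature pyramid, as in MSA above, can be checked with simple arithmetic: full self-attention over an H x W map compares (HW)^2 query-key pairs, so each halving of resolution cuts that count by a factor of 16. The 64x64 map and the downsampling factors 2, 4, and 8 are hypothetical, since the abstract does not give MSA's configuration.

```python
from typing import List, Tuple


def attention_pairs(height: int, width: int) -> int:
    """Number of query-key pairs for full self-attention over an H x W feature map."""
    tokens = height * width
    return tokens * tokens


def pyramid_attention_pairs(height: int, width: int, scales: List[int]) -> List[Tuple[int, int]]:
    """(downsampling factor, pair count) for each pyramid level."""
    return [(s, attention_pairs(height // s, width // s)) for s in scales]
```

With these settings the three coarser levels together need about 6.7% of the pairs required by a single full-resolution attention map (1,118,208 versus 16,777,216).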

arXiv:2212.12213 [pdf, ps, other]

Finetuning for Sarcasm Detection with a Pruned Dataset

Authors: Ishita Goyal, Priyank Bhandia, Sanjana Dulam

Abstract: Sarcasm is a form of irony that involves saying or writing the opposite of what one really means, often in a humorous or mocking way. It is often used to mock someone or something, or to be humorous or amusing. Sarcasm is usually conveyed through tone of voice, facial expressions, or other forms of nonverbal communication, but it can also be indicated by certain words or phrases that are typically associated with irony or humor. Sarcasm detection is difficult because it relies on context and nonverbal cues. It can also be culturally specific, subjective, and ambiguous. In this work, we fine-tune the RoBERTa-based sarcasm detection model presented in Abaskohi et al. [2022] to come within 0.02 F1 of the state of the art (Hercog et al. [2022]) on the iSarcasm dataset (Oprea and Magdy [2019]). This performance is achieved by augmenting iSarcasm with a pruned version of the Self Annotated Reddit Corpus (SARC) (Khodak et al. [2017]). Our pruned version is 100 times smaller than the subset of SARC used to train the state-of-the-art model.
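The result above is stated as a gap in F1 score. For a binary classifier, F1 on the positive class is the harmonic mean of precision and recall, which simplifies to 2TP / (2TP + FP + FN). A minimal sketch, assuming the sarcastic label is encoded as 1 and treated as the positive class:

```python
from typing import Sequence


def f1_from_labels(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for the positive class, computed as 2TP / (2TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom  # define F1 as 0 when there are no positives at all
```

Because F1 lies between 0 and 1, a gap of 0.02 is an absolute difference on that scale.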

Submitted 23 December, 2022; originally announced December 2022.

Comments: 5 pages, 3 tables

arXiv:2002.01575 [pdf, other]

Seeing through the smoke: a world-wide comparative study of e-cigarette flavors, brands and markets using data from Reddit and Twitter

Authors: Rohit Venkata Sai Dulam, Meghana Murthy, Jiebo Luo

Abstract: The growing popularity of e-cigarettes, an alternative to cigarettes, has motivated us to study trends in brands, flavors, and online market activity using posts from Reddit and Twitter. The main motivation for this worldwide study is to highlight the effect that laws and regulations have on the usage and availability of different flavors and brands of vapes in different countries. Data has been obtained from subreddits belonging to e-cigarette communities from Australia, Canada, Europe, and the UK. Extensive data cleaning and rigorous text mining yield varying results for different countries. Reddit and Twitter also yield different results, since they provide different atmospheres to their users.
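The abstract does not describe its text mining pipeline, so the following is only a hypothetical sketch of one common step: counting case-insensitive, whole-word mentions of flavor terms per region. The term list and function name are illustrative assumptions, not the study's lexicon.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Sequence, Tuple

# Hypothetical flavor lexicon; the study's actual term lists are not given in the abstract.
FLAVOR_TERMS = ("menthol", "mint", "mango", "tobacco", "strawberry")


def count_flavor_mentions(
    posts: Iterable[Tuple[str, str]], terms: Sequence[str] = FLAVOR_TERMS
) -> Dict[str, Counter]:
    """Count whole-word, case-insensitive term mentions per region from (region, text) pairs."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    counts: Dict[str, Counter] = defaultdict(Counter)
    for region, text in posts:
        for match in pattern.findall(text):  # one capture group, so findall returns the matched term
            counts[region][match.lower()] += 1
    return dict(counts)
```

Word boundaries keep related words such as "mentholated" from being counted as "menthol"; a real pipeline would also need to handle spelling variants and brand names.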

Submitted 4 February, 2020; originally announced February 2020.

Comments: 7 pages, 11 figures

Showing 1–4 of 4 results for author: Dulam, S