Search | arXiv e-print repository

Benchmarking Image Similarity Metrics for Novel View Synthesis Applications

Authors: Charith Wickrema, Sara Leary, Shivangi Sarkar, Mark Giglio, Eric Bianchi, Eliza Mace, Michael Twardowski

Abstract: Traditional image similarity metrics are ineffective at evaluating the similarity between a real image of a scene and an artificially generated version of that viewpoint [6, 9, 13, 14]. Our research evaluates the effectiveness of a new, perceptual-based similarity metric, DreamSim [2], and three popular image similarity metrics: Structural Similarity (SSIM), Peak Signal-to-Noise Ratio (PSNR), and Learned Perceptual Image Patch Similarity (LPIPS) [18, 19] in novel view synthesis (NVS) applications. We create a corpus of artificially corrupted images to quantify the sensitivity and discriminative power of each of the image similarity metrics. These tests reveal that traditional metrics are unable to effectively differentiate between images with minor pixel-level changes and those with substantial corruption, whereas DreamSim is more robust to minor defects and can effectively evaluate the high-level similarity of the image. Additionally, our results demonstrate that DreamSim provides a more effective and useful evaluation of render quality, especially for evaluating NVS renders in real-world use cases where slight rendering corruptions are common, but do not affect image utility for human tasks.

Submitted 14 June, 2025; originally announced June 2025.
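
For readers unfamiliar with PSNR, one of the traditional metrics named in the abstract above, the following is a minimal sketch of its standard definition, 10 * log10(MAX^2 / MSE). It is not the paper's evaluation code. The function name `psnr`, the nested-list grayscale image representation, and the 8-bit `max_value` default are assumptions made for illustration.

```python
import math


def psnr(reference, rendered, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized grayscale images.

    Images are nested lists of pixel intensities (an illustrative assumption).
    """
    flat_ref = [pixel for row in reference for pixel in row]
    flat_ren = [pixel for row in rendered for pixel in row]
    if len(flat_ref) != len(flat_ren) or not flat_ref:
        raise ValueError("images must be non-empty and have the same number of pixels")

    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_ren)) / len(flat_ref)
    if mse == 0:
        return math.inf  # identical images

    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because the score depends only on per-pixel differences, a render that is shifted by a pixel can score as poorly as one that is heavily corrupted, which is consistent with the limitation the abstract describes.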

arXiv:2502.08021 [pdf, other]

Model Selection for Off-policy Evaluation: New Algorithms and Experimental Protocol

Authors: Pai Liu, Lingfeng Zhao, Shivangi Agarwal, Jinghan Liu, Audrey Huang, Philip Amortila, Nan Jiang

Abstract: Holdout validation and hyperparameter tuning from data is a long-standing problem in offline reinforcement learning (RL). A standard framework is to use off-policy evaluation (OPE) methods to evaluate and select the policies, but OPE either incurs exponential variance (e.g., importance sampling) or has hyperparameters of its own (e.g., FQE and model-based). In this work we focus on hyperparameter tuning for OPE itself, which is even more under-investigated. Concretely, we select among candidate value functions ("model-free") or dynamics ("model-based") to best assess the performance of a target policy. We develop: (1) new model-free and model-based selectors with theoretical guarantees, and (2) a new experimental protocol for empirically evaluating them. Compared to the model-free protocol in prior works, our new protocol allows for more stable generation and better control of candidate value functions in an optimization-free manner, and evaluation of model-free and model-based methods alike. We exemplify the protocol on Gym-Hopper, and find that our new model-free selector, LSTD-Tournament, demonstrates promising empirical performance.

Submitted 15 May, 2025; v1 submitted 11 February, 2025; originally announced February 2025.

arXiv:2501.07072 [pdf, other]

Label Calibration in Source Free Domain Adaptation

Authors: Shivangi Rai, Rini Smita Thakur, Kunal Jangid, Vinod K Kurmi

Abstract: Source-free domain adaptation (SFDA) utilizes a pre-trained source model with unlabeled target data. Self-supervised SFDA techniques generate pseudolabels from the pre-trained source model, but these pseudolabels often contain noise due to domain discrepancies between the source and target domains. Traditional self-supervised SFDA techniques rely on deterministic model predictions using the softmax function, leading to unreliable pseudolabels. In this work, we propose to introduce predictive uncertainty and softmax calibration for pseudolabel refinement using evidential deep learning. The Dirichlet prior is placed over the output of the target network to capture uncertainty using evidence with a single forward pass. Furthermore, softmax calibration solves the translation invariance problem to assist in learning with noisy labels. We incorporate a combination of evidential deep learning loss and information maximization loss with calibrated softmax in both prior and non-prior target knowledge SFDA settings. Extensive experimental analysis shows that our method outperforms other state-of-the-art methods on benchmark datasets.

Submitted 13 January, 2025; originally announced January 2025.

Comments: Accepted in IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025

arXiv:2412.15545 [pdf, other]

Climate Policy Elites' Twitter Interactions across Nine Countries

Authors: Ted Hsuan Yun Chen, Arttu Malkamäki, Ali Faqeeh, Esa Palosaari, Anniina Kotkaniemi, Laura Funke, Cáit Gleeson, James Goodman, Antti Gronow, Marlene Kammerer, Myanna Lahsen, Alexandre Marques, Petr Ocelik, Shivangi Seth, Mark Stoddart, Martin Svozil, Pradip Swarnakar, Matthew Trull, Paul Wagner, Yixi Yang, Mikko Kivelä, Tuomas Ylä-Anttila

Abstract: We identified the Twitter accounts of 941 climate change policy actors across nine countries, and collected their activities from 2017 to 2022, totalling 48 million activities from 17,700 accounts at different organizational levels. There is considerable temporal and cross-national variation in how prominent climate-related activities were, but all national policy systems generally responded to climate-related events, such as climate protests, in a similar manner. Examining patterns of interaction within and across countries, we find that these national policy systems rarely directly interact with one another, but are connected through consistently engaging with the same content produced by accounts of international organizations, climate activists, and researchers.

Submitted 19 December, 2024; originally announced December 2024.

Comments: working paper, 16 pages, 6 figures

arXiv:2411.18675 [pdf, other]

GaussianSpeech: Audio-Driven Gaussian Avatars

Authors: Shivangi Aneja, Artem Sevastopolsky, Tobias Kirschstein, Justus Thies, Angela Dai, Matthias Nießner

Abstract: We introduce GaussianSpeech, a novel approach that synthesizes high-fidelity animation sequences of photo-realistic, personalized 3D human head avatars from spoken audio. To capture the expressive, detailed nature of human heads, including skin furrowing and finer-scale facial movements, we propose to couple speech signal with 3D Gaussian splatting to create realistic, temporally coherent motion sequences. We propose a compact and efficient 3DGS-based avatar representation that generates expression-dependent color and leverages wrinkle- and perceptually-based losses to synthesize facial details, including wrinkles that occur with different expressions. To enable sequence modeling of 3D Gaussian splats with audio, we devise an audio-conditioned transformer model capable of extracting lip and expression features directly from audio input. Due to the absence of high-quality datasets of talking humans in correspondence with audio, we captured a new large-scale multi-view dataset of audio-visual sequences of talking humans with native English accents and diverse facial geometry. GaussianSpeech consistently achieves state-of-the-art performance with visually natural motion at real-time rendering rates, while encompassing diverse facial expressions and styles.

Submitted 27 November, 2024; originally announced November 2024.

Comments: Paper Video: https://youtu.be/2VqYoFlYcwQ Project Page: https://shivangi-aneja.github.io/projects/gaussianspeech

arXiv:2408.11821 [pdf, other]

doi 10.1016/j.iot.2024.101075

MIMA 2.0 -- Compact and Portable Multifunctional IoT integrated Menstrual Aid

Authors: Kumar J. Jyothish, Shreya Shivangi, Amish Bibhu, Subhankar Mishra, Sulagna Saha

Abstract: The shedding of the intrauterine lining, or endometrium, is known as menstruation. It occurs every month and causes several issues such as menstrual cramps and aches in the abdominal region, stains, menstrual malodor, rashes in intimate areas, and many more. In our research, almost none of the products available in the market cater to all of these problems single-handedly. There are few remedies available for the cramps, among which heat therapy is the most commonly used. Our methodology involved surveys regarding these problems and the solutions to them that are deemed optimal. This inclusive approach helped us infer the gaps in available menstrual aids, which became our guide towards developing MIMA (Multifunctional IoT Integrated Menstrual Aid). In this paper, we feature an IoT-incorporated multifunctional smart intimate wear that aims to provide for the multiple necessities of women during menstruation, such as a leakproof, antibacterial, anti-odor, rash-free experience, along with an integrated Bluetooth-controlled intimate heat-pad for relieving abdominal cramps. The entire process of product development has been done in phases according to feedback from target users in each stage. This paper is an extension to our paper [1], which serves as the proof of concept for our approach. The development has led us towards MIMA 2.0, featuring a completely concealed and integrated design that includes a safe Bluetooth-controlled heating system for the intimate area. The product has received incredibly positive feedback from survey participants.

Submitted 3 August, 2024; originally announced August 2024.

Journal ref: In Internet of Things (Vol. 25, p. 101075). Elsevier BV (2024)

arXiv:2404.17105 [pdf, other]

Synthesizing Iris Images using Generative Adversarial Networks: Survey and Comparative Analysis

Authors: Shivangi Yadav, Arun Ross

Abstract: Biometric systems based on iris recognition are currently being used in border control applications and mobile devices. However, research in iris recognition is stymied by various factors such as limited datasets of bona fide irides and presentation attack instruments; restricted intra-class variations; and privacy concerns. Some of these issues can be mitigated by the use of synthetic iris data. In this paper, we present a comprehensive review of state-of-the-art GAN-based synthetic iris image generation techniques, evaluating their strengths and limitations in producing realistic and useful iris images that can be used for both training and testing iris recognition systems and presentation attack detectors. In this regard, we first survey the various methods that have been used for synthetic iris generation and specifically consider generators based on StyleGAN, RaSGAN, CIT-GAN, iWarpGAN, StarGAN, etc. We then analyze the images generated by these models for realism, uniqueness, and biometric utility. This comprehensive analysis highlights the pros and cons of various GANs in the context of developing robust iris matchers and presentation attack detectors.

Submitted 11 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

arXiv:2403.12618 [pdf, other]

NewsCaption: Named-Entity aware Captioning for Out-of-Context Media

Authors: Anurag Singh, Shivangi Aneja

Abstract: With the increasing influence of social media, online misinformation has grown to become a societal issue. The motivation for our work comes from the threat caused by cheapfakes, where an unaltered image is described using a news caption in a new but false context. The main challenge in detecting such out-of-context multimedia is the unavailability of large-scale datasets. Several detection methods employ randomly selected captions to generate out-of-context training inputs. However, these randomly matched captions are not truly representative of out-of-context scenarios due to inconsistencies between the image description and the matched caption. We aim to address these limitations by introducing a novel task of out-of-context caption generation. In this work, we propose a new method that generates a realistic out-of-context caption given visual and textual context. We also demonstrate that the semantics of the generated captions can be controlled using the textual context. We also evaluate our method against several baselines, and our method improves over the image captioning baseline by 6.2% BLEU-4, 2.96% CIDEr, 11.5% ROUGE, and 7.3% METEOR.

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2312.08459 [pdf, other]

FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models

Authors: Shivangi Aneja, Justus Thies, Angela Dai, Matthias Nießner

Abstract: We introduce FaceTalk, a novel generative approach designed for synthesizing high-fidelity 3D motion sequences of talking human heads from an input audio signal. To capture the expressive, detailed nature of human heads, including hair, ears, and finer-scale eye movements, we propose to couple speech signal with the latent space of neural parametric head models to create high-fidelity, temporally coherent motion sequences. We propose a new latent diffusion model for this task, operating in the expression space of neural parametric head models, to synthesize audio-driven realistic head sequences. In the absence of a dataset with corresponding NPHM expressions to audio, we optimize for these correspondences to produce a dataset of temporally-optimized NPHM expressions fit to audio-video recordings of people talking. To the best of our knowledge, this is the first work to propose a generative approach for realistic and high-quality motion synthesis of volumetric human heads, representing a significant advancement in the field of audio-driven 3D animation. Notably, our approach stands out in its ability to generate plausible motion sequences that can produce high-fidelity head animation coupled with the NPHM shape space. Our experimental results substantiate the effectiveness of FaceTalk, consistently achieving superior and visually natural motion, encompassing diverse facial expressions and styles, outperforming existing methods by 75% in perceptual user study evaluation.

Submitted 17 March, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

Comments: Paper Video: https://youtu.be/7Jf0kawrA3Q Project Page: https://shivangi-aneja.github.io/projects/facetalk/

Journal ref: CVPR 2024

arXiv:2306.06375 [pdf, ps, other]

Optimized Gradient Tracking for Decentralized Online Learning

Authors: Shivangi Dubey Sharma, Ketan Rajawat

Abstract: This work considers the problem of decentralized online learning, where the goal is to track the optimum of the sum of time-varying functions, distributed across several nodes in a network. The local availability of the functions and their gradients necessitates coordination and consensus among the nodes. We put forth the Generalized Gradient Tracking (GGT) framework that unifies a number of existing approaches, including the state-of-the-art ones. The performance of the proposed GGT algorithm is theoretically analyzed using a novel semidefinite programming-based analysis that yields the desired regret bounds under very general conditions and without requiring the gradient boundedness assumption. The results are applicable to the special cases of GGT, which include various state-of-the-art algorithms as well as new dynamic versions of various classical decentralized algorithms. To further minimize the regret, we consider a condensed version of GGT with only four free parameters. A procedure for offline tuning of these parameters using only the problem parameters is also detailed. The resulting optimized GGT (oGGT) algorithm not only achieves improved dynamic regret bounds, but also outperforms all state-of-the-art algorithms on both synthetic and real-world datasets.

Submitted 13 February, 2024; v1 submitted 10 June, 2023; originally announced June 2023.

Comments: 30 pages, 6 Figures

arXiv:2305.12596 [pdf, other]

iWarpGAN: Disentangling Identity and Style to Generate Synthetic Iris Images

Authors: Shivangi Yadav, Arun Ross

Abstract: Generative Adversarial Networks (GANs) have shown success in approximating complex distributions for synthetic image generation. However, current GAN-based methods for generating biometric images, such as iris, have certain limitations: (a) the synthetic images often closely resemble images in the training dataset; (b) the generated images lack diversity in terms of the number of unique identities represented in them; and (c) it is difficult to generate multiple images pertaining to the same identity. To overcome these issues, we propose iWarpGAN that disentangles identity and style in the context of the iris modality by using two transformation pathways: Identity Transformation Pathway to generate unique identities from the training set, and Style Transformation Pathway to extract the style code from a reference image and output an iris image using this style. By concatenating the transformed identity code and reference style code, iWarpGAN generates iris images with both inter- and intra-class variations. The efficacy of the proposed method in generating such iris DeepFakes is evaluated both qualitatively and quantitatively using ISO/IEC 29794-6 Standard Quality Metrics and the VeriEye iris matcher. Further, the utility of the synthetically generated images is demonstrated by improving the performance of deep learning based iris matchers that augment synthetic data with real data during the training process.

Submitted 29 August, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

arXiv:2302.04577 [pdf, other]

Incorporating Total Variation Regularization in the design of an intelligent Query by Humming system

Authors: Shivangi Ranjan, Vishal Srivastava

Abstract: A Query-By-Humming (QBH) system constitutes a particular case of music information retrieval where the input is a user-hummed melody and the output is the original song which contains that melody. A typical QBH system consists of melody extraction and candidate melody retrieval. For melody extraction, accurate note transcription is the key enabling technology. However, current transcription methods are unable to definitively capture the melody and address inaccuracies in user-hummed queries. In this paper, we incorporate Total Variation Regularization (TVR) to denoise queries. This approach accounts for user error in humming without loss of meaningful data and reliably captures the underlying melody. For candidate melody retrieval, we employ a deep learning approach to time series classification using a Fully Convolutional Neural Network. The trained network classifies the incoming query as belonging to one of the target songs. For our experiments, we use Roger Jang's MIR-QBSH dataset, which is the standard MIREX dataset. We demonstrate that inclusion of TVR-denoised queries in the training set enhances the overall accuracy of the system to 93%, which is higher than other state-of-the-art QBH systems.

Submitted 9 February, 2023; originally announced February 2023.

arXiv:2212.01406 [pdf, other]

doi 10.1145/3588432.3591566

ClipFace: Text-guided Editing of Textured 3D Morphable Models

Authors: Shivangi Aneja, Justus Thies, Angela Dai, Matthias Nießner

Abstract: We propose ClipFace, a novel self-supervised approach for text-guided editing of textured 3D morphable models of faces. Specifically, we employ user-friendly language prompts to enable control of the expressions as well as appearance of 3D faces. We leverage the geometric expressiveness of 3D morphable models, which inherently possess limited controllability and texture expressivity, and develop a self-supervised generative model to jointly synthesize expressive, textured, and articulated faces in 3D. We enable high-quality texture generation for 3D faces by adversarial self-supervised training, guided by differentiable rendering against collections of real RGB images. Controllable editing and manipulation are given by language prompts to adapt texture and expression of the 3D morphable model. To this end, we propose a neural network that predicts both texture and expression latent codes of the morphable model. Our model is trained in a self-supervised fashion by exploiting differentiable rendering and losses based on a pre-trained CLIP model. Once trained, our model jointly predicts face textures in UV-space, along with expression parameters to capture both geometry and texture changes in facial expressions in a single forward pass. We further show the applicability of our method to generate temporally changing textures for a given animation sequence.

Submitted 24 April, 2023; v1 submitted 2 December, 2022; originally announced December 2022.

Comments: Paper Video: https://youtu.be/toGOQqFuNmA Project website: https://shivangi-aneja.github.io/projects/clipface/

Journal ref: SIGGRAPH 2023

arXiv:2207.14534 [pdf, other]

ACM Multimedia Grand Challenge on Detecting Cheapfakes

Authors: Shivangi Aneja, Cise Midoglu, Duc-Tien Dang-Nguyen, Sohail Ahmed Khan, Michael Riegler, Pål Halvorsen, Chris Bregler, Balu Adsumilli

Abstract: Cheapfake is a recently coined term that encompasses non-AI ("cheap") manipulations of multimedia content. Cheapfakes are known to be more prevalent than deepfakes. Cheapfake media can be created using editing software for image/video manipulations, or even without using any software, by simply altering the context of an image/video by sharing the media alongside misleading claims. This alteration of context is referred to as out-of-context (OOC) misuse of media. OOC media is much harder to detect than fake media, since the images and videos are not tampered with. In this challenge, we focus on detecting OOC images, and more specifically the misuse of real photographs with conflicting image captions in news items. The aim of this challenge is to develop and benchmark models that can be used to detect whether given samples (news image and associated captions) are OOC, based on the recently compiled COSMOS dataset.

Submitted 29 July, 2022; originally announced July 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2107.05297

arXiv:2205.04841 [pdf, other]

Object Detection in Indian Food Platters using Transfer Learning with YOLOv4

Authors: Deepanshu Pandey, Purva Parmar, Gauri Toshniwal, Mansi Goel, Vishesh Agrawal, Shivangi Dhiman, Lavanya Gupta, Ganesh Bagler

Abstract: Object detection is a well-known problem in computer vision. Despite this, its usage and pervasiveness for traditional Indian food dishes has been limited. In particular, recognizing Indian food dishes present in a single photo is challenging for three reasons: (1) lack of annotated Indian food datasets, (2) non-distinct boundaries between the dishes, and (3) high intra-class variation. We address these issues by providing a comprehensively labelled Indian food dataset, IndianFood10, which contains 10 food classes that appear frequently in a staple Indian meal, and by using transfer learning with the YOLOv4 object detector model. Our model is able to achieve an overall mAP score of 91.8% and an F1-score of 0.90 for our 10-class dataset. We also provide an extension of our 10-class dataset, IndianFood20, which contains 10 more traditional Indian food classes.

Submitted 10 May, 2022; originally announced May 2022.

Comments: 6 pages, 7 figures, 38th IEEE International Conference on Data Engineering, 2022, DECOR Workshop

arXiv:2201.08020 [pdf, other]

A Deep Learning Approach To Estimation Using Measurements Received Over a Network

Authors: Shivangi Agarwal, Sanjit K. Kaul, Saket Anand, P. B. Sujit

Abstract: We propose a novel deep neural network (DNN) based approximation architecture to learn estimates of measurements. We detail an algorithm that enables training of the DNN. The DNN estimator only uses measurements, if and when they are received over a communication network. The measurements are communicated over a network as packets, at a rate unknown to the estimator. Packets may suffer drops and need retransmission. They may suffer waiting delays as they traverse a network path. Works on estimation often assume knowledge of the dynamic model of the measured system, which may not be available in practice. The DNN estimator doesn't assume knowledge of the dynamic system model or the communication network. It doesn't require a history of measurements, often used by other works. The DNN estimator results in significantly smaller average estimation error than the commonly used Time-varying Kalman Filter and the Unscented Kalman Filter, in simulations of linear and nonlinear dynamic systems. The DNN need not be trained separately for different communications network settings. It is robust to errors in estimation of network delays that occur due to imperfect time synchronization between the measurement source and the estimator. Last but not least, our simulations shed light on the rate of updates that result in low estimation error.

Submitted 12 September, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

arXiv:2112.09151 [pdf, other]

TAFIM: Targeted Adversarial Attacks against Facial Image Manipulations

Authors: Shivangi Aneja, Lev Markhasin, Matthias Niessner

Abstract: Face manipulation methods can be misused to affect an individual's privacy or to spread disinformation. To this end, we introduce a novel data-driven approach that produces image-specific perturbations which are embedded in the original images. The key idea is that these protected images prevent face manipulation by causing the manipulation model to produce a predefined manipulation target (uniformly colored output image in our case) instead of the actual manipulation. In addition, we propose to leverage differentiable compression approximation, hence making generated perturbations robust to common image compression. In order to protect against multiple manipulation methods simultaneously, we further propose a novel attention-based fusion of manipulation-specific perturbations. Compared to traditional adversarial attacks that optimize noise patterns for each image individually, our generalized model only needs a single forward pass, thus running orders of magnitude faster and allowing for easy integration in image processing stacks, even on resource-constrained devices like smartphones.

Submitted 25 July, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

Comments: (ECCV 2022 Paper) Video: https://youtu.be/11VMOJI7tKg Project Page: https://shivangi-aneja.github.io/projects/tafim/

arXiv:2107.08973 [pdf, other]

Unsupervised Identification of Relevant Prior Cases

Authors: Shivangi Bithel, Sumitra S Malagi

Abstract: Document retrieval has taken its role in almost all domains of knowledge understanding, including the legal domain. Precedent refers to a court decision that is considered as authority for deciding subsequent cases involving identical or similar facts or similar legal issues. In this work, we propose different unsupervised approaches to solve the task of identifying relevant precedents to a given query case. Our proposed approaches are using word embeddings like word2vec, doc2vec, and sent2vec, finding cosine similarity using TF-IDF, retrieving relevant documents using BM25 scores, using the pre-trained model and SBERT to find the most similar document, and using the product of BM25 and TF-IDF scores to find the most relevant document for a given query. We compared all the methods based on precision@10, recall@10, and MRR. Based on the comparative analysis, we found that the TF-IDF score multiplied by the BM25 score gives the best result. In this paper, we have also presented the analysis that we did to improve the BM25 score.

Submitted 19 July, 2021; originally announced July 2021.

Comments: Code: https://github.com/shivangibithel/Information-Retrieval-CS6370
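The abstract above reports that multiplying the TF-IDF score by the BM25 score gave the best retrieval result. The following is only a minimal sketch of that idea, not the authors' implementation: the whitespace tokenizer, the smoothed IDF variant, the BM25 parameters `k1=1.5` and `b=0.75`, and all function names are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    # Okapi BM25 with commonly used default parameters (an assumption here).
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def tfidf_cosine_scores(query, docs):
    # Cosine similarity between TF-IDF vectors of the query and each document.
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for d in tokenized for term in set(d))

    def vectorize(tokens):
        tf = Counter(tokens)
        # Smoothed IDF so that unseen query terms do not divide by zero.
        return {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        nu = math.sqrt(sum(w * w for w in u.values()))
        nv = math.sqrt(sum(w * w for w in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    q = vectorize(tokenize(query))
    return [cosine(q, vectorize(d)) for d in tokenized]


def combined_ranking(query, docs):
    # Rank documents by the product of the two scores, best first.
    bm25 = bm25_scores(query, docs)
    tfidf = tfidf_cosine_scores(query, docs)
    combined = [x * y for x, y in zip(bm25, tfidf)]
    order = sorted(range(len(docs)), key=lambda i: combined[i], reverse=True)
    return order, combined


docs = [
    "the court held the contract was void",
    "weather report sunny day",
    "contract dispute decided by the court",
]
order, combined = combined_ranking("court contract", docs)
```

Because the two scores are multiplied, a document must score above zero under both measures to be ranked; a document sharing no terms with the query receives a combined score of zero.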

arXiv:2107.05297 [pdf, other]

MMSys'21 Grand Challenge on Detecting Cheapfakes

Authors: Shivangi Aneja, Cise Midoglu, Duc-Tien Dang-Nguyen, Michael Alexander Riegler, Paal Halvorsen, Matthias Niessner, Balu Adsumilli, Chris Bregler

Abstract: Cheapfake is a recently coined term that encompasses non-AI ("cheap") manipulations of multimedia content. Cheapfakes are known to be more prevalent than deepfakes. Cheapfake media can be created using editing software for image/video manipulations, or even without using any software, by simply altering the context of an image/video by sharing the media alongside misleading claims. This alteration of context is referred to as out-of-context (OOC) misuse of media. OOC media is much harder to detect than fake media, since the images and videos are not tampered with. In this challenge, we focus on detecting OOC images, and more specifically the misuse of real photographs with conflicting image captions in news items. The aim of this challenge is to develop and benchmark models that can be used to detect whether given samples (news image and associated captions) are OOC, based on the recently compiled COSMOS dataset.

Submitted 12 July, 2021; originally announced July 2021.

arXiv:2105.01688 [pdf, other]

Height Estimation of Children under Five Years using Depth Images

Authors: Anusua Trivedi, Mohit Jain, Nikhil Kumar Gupta, Markus Hinsche, Prashant Singh, Markus Matiaschek, Tristan Behrens, Mirco Militeri, Cameron Birge, Shivangi Kaushik, Archisman Mohapatra, Rita Chatterjee, Rahul Dodhia, Juan Lavista Ferres

Abstract: Malnutrition is a global health crisis and is the leading cause of death among children under five. Detecting malnutrition requires anthropometric measurements of weight, height, and middle-upper arm circumference. However, measuring them accurately is a challenge, especially in the global south, due to limited resources. In this work, we propose a CNN-based approach to estimate the height of standing children under five years from depth images collected using a smart-phone. According to the SMART Methodology Manual [5], the acceptable accuracy for height is less than 1.4 cm. On training our deep learning model on 87131 depth images, our model achieved an average mean absolute error of 1.64% on 57064 test images. For 70.3% test images, we estimated height accurately within the acceptable 1.4 cm range. Thus, our proposed solution can accurately detect stunting (low height-for-age) in standing children below five years of age.

Submitted 30 July, 2021; v1 submitted 4 May, 2021; originally announced May 2021.

arXiv:2104.02830 [pdf, other]

IndoFashion : Apparel Classification for Indian Ethnic Clothes

Authors: Pranjal Singh Rajput, Shivangi Aneja

Abstract: Cloth categorization is an important research problem that is used by e-commerce websites for displaying correct products to the end-users. Indian clothes have a large number of clothing categories both for men and women. The traditional Indian clothes like "Saree" and "Dhoti" are worn very differently from western clothes like t-shirts and jeans. Moreover, the style and patterns of ethnic clothes have a very different distribution from western outfits. Thus the models trained on standard cloth datasets fail miserably on ethnic outfits. To address these challenges, we introduce the first large-scale ethnic dataset of over 106k images with 15 different categories for fine-grained classification of Indian ethnic clothes. We gathered a diverse dataset from a large number of Indian e-commerce websites. We then evaluate several baselines for the cloth classification task on our dataset. In the end, we obtain 88.43% classification accuracy. We hope that our dataset would foster research in the development of several algorithms such as cloth classification, landmark detection, especially for ethnic clothes.

Submitted 6 April, 2021; originally announced April 2021.

arXiv:2103.15446 [pdf]

doi 10.1145/3393822.3432314

Deep Image Compositing

Authors: Shivangi Aneja, Soham Mazumder

Abstract: In image editing, the most common task is pasting objects from one image to the other and then eventually adjusting the manifestation of the foreground object with the background object. This task is called image compositing. But image compositing is a challenging problem that requires professional editing skills and a considerable amount of time. Not only are these professionals expensive to hire, but the tools (like Adobe Photoshop) used for doing such tasks are also expensive to purchase, making the overall task of image compositing difficult for people without this skillset. In this work, we aim to cater to this problem by making composite images look realistic. To achieve this, we are using Generative Adversarial Networks (GANs). By training the network with a diverse range of filters applied to the images and special loss functions, the model is able to decode the color histogram of the foreground and background part of the image and also learns to blend the foreground object with the background. The hue and saturation values of the image play an important role as discussed in this paper. To the best of our knowledge, this is the first work that uses GANs for the task of image compositing. Currently, there is no benchmark dataset available for image compositing. So we created the dataset and will also make the dataset publicly available for benchmarking. Experimental results on this dataset show that our method outperforms all current state-of-the-art methods.

Submitted 29 March, 2021; originally announced March 2021.

Comments: ESSE 2020: Proceedings of the 2020 European Symposium on Software Engineering

Journal ref: In Proceedings of the 2020 European Symposium on Software Engineering (pp. 101-104) 2020

arXiv:2102.11276 [pdf, other]

Factorization of Fact-Checks for Low Resource Indian Languages

Authors: Shivangi Singhal, Rajiv Ratn Shah, Ponnurangam Kumaraguru

Abstract: The advancement in technology and accessibility of internet to each individual is revolutionizing the real time information. The liberty to express your thoughts without passing through any credibility check is leading to dissemination of fake content in the ecosystem. It can have disastrous effects on both individuals and society as a whole. The amplification of fake news is becoming rampant in India too. Debunked information often gets republished with a replacement description, claiming it to depict some different incidence. To curb such fabricated stories, it is necessary to investigate such deduplicates and false claims made in public. The majority of studies on automatic fact-checking and fake news detection is restricted to English only. But for a country like India where only 10% of the literate population speak English, role of regional languages in spreading falsity cannot be undermined. In this paper, we introduce FactDRIL: the first large scale multilingual Fact-checking Dataset for Regional Indian Languages. We collect an exhaustive dataset across 7 months covering 11 low-resource languages. Our proposed dataset consists of 9,058 samples belonging to English, 5,155 samples to Hindi and remaining 8,222 samples are distributed across various regional languages, i.e. Bangla, Marathi, Malayalam, Telugu, Tamil, Oriya, Assamese, Punjabi, Urdu, Sinhala and Burmese.
We also present the detailed characterization of three M's (multi-lingual, multi-media, multi-domain) in the FactDRIL accompanied with the complete list of other varied attributes making it a unique dataset to study. Lastly, we present some potential use cases of the dataset. We expect this dataset will be a valuable resource and serve as a starting point to fight proliferation of fake news in low resource languages.

Submitted 23 February, 2021; originally announced February 2021.

Comments: 15 pages, 6 figures

arXiv:2101.11155 [pdf, other]

doi 10.1007/s42979-021-00455-5

Exploring multi-task multi-lingual learning of transformer models for hate speech and offensive speech identification in social media

Authors: Sudhanshu Mishra, Shivangi Prasad, Shubhanshu Mishra

Abstract: Hate Speech has become a major content moderation issue for online social media platforms. Given the volume and velocity of online content production, it is impossible to manually moderate hate speech related content on any platform. In this paper we utilize a multi-task and multi-lingual approach based on recently proposed Transformer Neural Networks to solve three sub-tasks for hate speech. These sub-tasks were part of the 2019 shared task on hate speech and offensive content (HASOC) identification in Indo-European languages. We expand on our submission to that competition by utilizing multi-task models which are trained using three approaches, a) multi-task learning with separate task heads, b) back-translation, and c) multi-lingual training. Finally, we investigate the performance of various models and identify instances where the Transformer based models perform differently and better. We show that it is possible to utilize different combined approaches to obtain models that can generalize easily on different languages and tasks, while trading off slight accuracy (in some cases) for a much reduced inference time compute cost. We open source an updated version of our HASOC 2019 code with the new improvements at https://github.com/socialmediaie/MTML_HateSpeech.

Submitted 26 January, 2021; originally announced January 2021.

Comments: "To be published in SN Computer Science at https://doi.org/10.1007/s42979-021-00455-5" "30 pages, 6 figures" "Code available at https://github.com/socialmediaie/MTML_HateSpeech"

MSC Class: 68T50 (Primary); 68T07 (Secondary) ACM Class: I.2.7

arXiv:2101.06278 [pdf, other]

COSMOS: Catching Out-of-Context Misinformation with Self-Supervised Learning

Authors: Shivangi Aneja, Chris Bregler, Matthias Nießner

Abstract: Despite the recent attention to DeepFakes, one of the most prevalent ways to mislead audiences on social media is the use of unaltered images in a new but false context. To address these challenges and support fact-checkers, we propose a new method that automatically detects out-of-context image and text pairs. Our key insight is to leverage the grounding of image with text to distinguish out-of-context scenarios that cannot be disambiguated with language alone. We propose a self-supervised training strategy where we only need a set of captioned images. At train time, our method learns to selectively align individual objects in an image with textual claims, without explicit supervision. At test time, we check if both captions correspond to the same object(s) in the image but are semantically different, which allows us to make fairly accurate out-of-context predictions. Our method achieves 85% out-of-context detection accuracy. To facilitate benchmarking of this task, we create a large-scale dataset of 200K images with 450K textual captions from a variety of news websites, blogs, and social media posts. The dataset and source code are publicly available at https://shivangi-aneja.github.io/projects/cosmos/.

Submitted 21 April, 2021; v1 submitted 15 January, 2021; originally announced January 2021.

Comments: Video : https://youtu.be/riI3Cl2xy10

arXiv:2012.02374 [pdf, other]

CIT-GAN: Cyclic Image Translation Generative Adversarial Network With Application in Iris Presentation Attack Detection

Authors: Shivangi Yadav, Arun Ross

Abstract: In this work, we propose a novel Cyclic Image Translation Generative Adversarial Network (CIT-GAN) for multi-domain style transfer. To facilitate this, we introduce a Styling Network that has the capability to learn style characteristics of each domain represented in the training dataset. The Styling Network helps the generator to drive the translation of images from a source domain to a reference domain and generate synthetic images with style characteristics of the reference domain. The learned style characteristics for each domain depend on both the style loss and domain classification loss. This induces variability in style characteristics within each domain. The proposed CIT-GAN is used in the context of iris presentation attack detection (PAD) to generate synthetic presentation attack (PA) samples for classes that are under-represented in the training set. Evaluation using current state-of-the-art iris PAD methods demonstrates the efficacy of using such synthetically generated PA samples for training PAD methods. Further, the quality of the synthetically generated samples is evaluated using Frechet Inception Distance (FID) score. Results show that the quality of synthetic images generated by the proposed method is superior to that of other competing methods, including StarGAN.

Submitted 3 December, 2020; originally announced December 2020.

Comments: 10 pages (8 pages + 2 reference pages) and 10 figures

Journal ref: WACV 2020

arXiv:2009.12727 [pdf, other]

Multi-timescale Representation Learning in LSTM Language Models

Authors: Shivangi Mahto, Vy A. Vo, Javier S. Turek, Alexander G. Huth

Abstract: Language models must capture statistical dependencies between words at timescales ranging from very short to very long. Earlier work has demonstrated that dependencies in natural language tend to decay with distance between words according to a power law. However, it is unclear how this knowledge can be used for analyzing or designing neural network language models. In this work, we derived a theory for how the memory gating mechanism in long short-term memory (LSTM) language models can capture power law decay. We found that unit timescales within an LSTM, which are determined by the forget gate bias, should follow an Inverse Gamma distribution. Experiments then showed that LSTM language models trained on natural English text learn to approximate this theoretical distribution. Further, we found that explicitly imposing the theoretical distribution upon the model during training yielded better language model perplexity overall, with particular improvements for predicting low-frequency (rare) words. Moreover, the explicit multi-timescale model selectively routes information about different types of words through units with different timescales, potentially improving model interpretability. These results demonstrate the importance of careful, theoretically-motivated analysis of memory and timescale in language models.

Submitted 17 March, 2021; v1 submitted 26 September, 2020; originally announced September 2020.

MSC Class: 91F20 ACM Class: I.2.7; I.2.6

Journal ref: International Conference on Learning Representations 2021
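The abstract above states that unit timescales are determined by the forget gate bias and should follow an Inverse Gamma distribution. The sketch below illustrates one way to read that relationship, under a simplifying assumption that is not taken from the listing: the forget gate is treated as a constant f = sigmoid(b), so the memory decays as f^t = exp(-t/T) with T = -1/ln(f). The function names and the shape and scale values (alpha, beta) are hypothetical.

```python
import math
import random


def timescale_to_forget_bias(timescale):
    # Solve sigmoid(b) = exp(-1/T) for b, which gives b = -ln(exp(1/T) - 1).
    return -math.log(math.exp(1.0 / timescale) - 1.0)


def forget_bias_to_timescale(bias):
    # Constant forget gate value f = sigmoid(b); decay f**t = exp(-t/T).
    f = 1.0 / (1.0 + math.exp(-bias))
    return -1.0 / math.log(f)


def sample_inverse_gamma(alpha, beta, rng):
    # If X ~ Gamma(shape=alpha, scale=1/beta), then 1/X ~ InverseGamma(alpha, beta).
    return 1.0 / rng.gammavariate(alpha, 1.0 / beta)


rng = random.Random(0)
# alpha and beta are illustrative values only, not taken from the paper.
timescales = [sample_inverse_gamma(alpha=2.0, beta=4.0, rng=rng) for _ in range(8)]
biases = [timescale_to_forget_bias(t) for t in timescales]
recovered = [forget_bias_to_timescale(b) for b in biases]
```

Under this reading, fixing the forget gate biases to the sampled values is one way to impose a chosen timescale distribution on the units; larger biases correspond to longer timescales.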

arXiv:2007.06277 [pdf, other]

doi 10.1109/MGRS.2020.2994107

OpenStreetMap: Challenges and Opportunities in Machine Learning and Remote Sensing

Authors: John Vargas, Shivangi Srivastava, Devis Tuia, Alexandre Falcao

Abstract: OpenStreetMap (OSM) is a community-based, freely available, editable map service that was created as an alternative to authoritative ones. Given that it is edited mainly by volunteers with different mapping skills, the completeness and quality of its annotations are heterogeneous across different geographical locations. Despite that, OSM has been widely used in several applications in Geosciences, Earth Observation and environmental sciences. In this work, we present a review of recent methods based on machine learning to improve and use OSM data. Such methods aim either 1) at improving the coverage and quality of OSM layers, typically using GIS and remote sensing technologies, or 2) at using the existing OSM layers to train models based on image data to serve applications like navigation or land use classification. We believe that OSM (as well as other sources of open land maps) can change the way we interpret remote sensing data and that the synergy with machine learning can scale participatory map making and its quality to the level needed to serve global and up-to-date land mapping.

Submitted 13 July, 2020; originally announced July 2020.

arXiv:2006.15717 [pdf]

Calculating Great Britain's half-hourly electrical demand from publicly available data

Authors: IA Grant Wilson, Shivangi Sharma, Joseph Day, Noah Godfrey

Abstract: Here we present a method to combine half-hourly publicly available electrical generation and interconnector operational data for Great Britain to create a timeseries that approximates its electrical demand. We term the calculated electrical demand ESPENI, an acronym for Elexon Sum Plus Embedded Net Imports. The method adds value to the original data by combining both transmission and distribution generation data into a single dataset and adding ISO 8601 compatible datetimes to increase interoperability with other timeseries data. Data cleansing is undertaken by visually flagging data errors and then using simple linear interpolation to impute values to replace the flagged data. Publishing the method allows it to be further enhanced or adapted and to be considered and critiqued by a wider community. In addition, the published raw and cleaned data is a valuable resource that saves researchers considerable time in repeating the steps presented in the method to prepare the data for further analysis. The data is a public record of the decarbonisation of Great Britain's electrical system since late 2008, widely seen as an example of rapid decarbonisation of an electrical system away from fossil fuel generation to lower carbon sources.

Submitted 15 September, 2021; v1 submitted 28 June, 2020; originally announced June 2020.

Comments: 33 pages, 3 Figures, 6 tables
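The abstract above describes replacing visually flagged data errors with values imputed by simple linear interpolation. The following is a minimal sketch of that step, not the authors' code: it assumes evenly spaced (for example half-hourly) samples, a boolean flag per sample, and that flagged runs touching either end of the series are left unchanged. The function name and sample values are hypothetical.

```python
def interpolate_flagged(values, flagged):
    # Replace each run of flagged samples with a straight line drawn between
    # the nearest unflagged neighbours on either side.
    result = list(values)
    n = len(result)
    i = 0
    while i < n:
        if not flagged[i]:
            i += 1
            continue
        start = i
        while i < n and flagged[i]:
            i += 1
        left, right = start - 1, i  # indices of the unflagged neighbours
        if left < 0 or right >= n:
            continue  # assumption: runs at the series edges are left as-is
        step = (result[right] - result[left]) / (right - left)
        for j in range(start, right):
            result[j] = result[left] + step * (j - left)
    return result


# Hypothetical demand readings with two flagged outliers in the middle.
demand = [10.0, 999.0, 999.0, 16.0]
flags = [False, True, True, False]
cleaned = interpolate_flagged(demand, flags)
```

Keeping the flags separate from the values means the raw series stays intact, which matches the abstract's mention of publishing both raw and cleaned data.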

arXiv:2006.11863 [pdf, other]

Generalized Zero and Few-Shot Transfer for Facial Forgery Detection

Authors: Shivangi Aneja, Matthias Nießner

Abstract: We propose Deep Distribution Transfer (DDT), a new transfer learning approach to address the problem of zero and few-shot transfer in the context of facial forgery detection. We examine how well a model (pre-)trained with one forgery creation method generalizes towards a previously unseen manipulation technique or different dataset. To facilitate this transfer, we introduce a new mixture model-based loss formulation that learns a multi-modal distribution, with modes corresponding to class categories of the underlying data of the source forgery method. Our core idea is to first pre-train an encoder neural network, which maps each mode of this distribution to the respective class labels, i.e., real or fake images in the source domain by minimizing the Wasserstein distance between them. In order to transfer this model to a new domain, we associate a few target samples with one of the previously trained modes. In addition, we propose a spatial mixup augmentation strategy that further helps generalization across domains. We find this learning strategy to be surprisingly effective at domain transfer compared to a traditional classification or even state-of-the-art domain adaptation/few-shot learning methods. For instance, compared to the best baseline, our method improves the classification accuracy by 4.88% for zero-shot and by 8.38% for the few-shot case transferred from the FaceForensics++ to Dessa dataset.

Submitted 21 June, 2020; originally announced June 2020.

Comments: Project page: https://shivangi-aneja.github.io/ddt/

arXiv:1912.03000 [pdf]

3D CNN with Localized Residual Connections for Hyperspectral Image Classification

Authors: Shivangi Dwivedi, Murari Mandal, Shekhar Yadav, Santosh Kumar Vipparthi

Abstract: In this paper we propose a novel 3D CNN network with localized residual connections for hyperspectral image classification. Our work presents a comparative study with the existing methods employed for abstracting deeper features and proposes a model which incorporates residual features from multiple stages in the network. The proposed architecture processes individual spatiospectral feature rich cubes from hyperspectral images through 3D convolutional layers. The residual connections result in improved performance due to assimilation of both low-level and high-level features. We conduct experiments over the Pavia University and Pavia Center datasets for performance analysis. We compare our method with two recent state-of-the-art methods for hyperspectral image classification. The proposed network outperforms the existing approaches by a good margin.

Submitted 6 December, 2019; originally announced December 2019.

Comments: 4th International Conference on Computer Vision and Image Processing (CVIP-2019)

arXiv:1910.13584 [pdf, other]

A Tunably Compliant Origami Mechanism for Dynamically Dexterous Robots

Authors: Wei-Hsi Chen, Shivangi Misra, Yuchong Gao, Young-Joo Lee, Daniel E. Koditschek, Shu Yang, Cynthia R. Sung

Abstract: We present an approach to overcoming challenges in dynamical dexterity for robots through tunable origami structures. Our work leverages a one-parameter family of flat sheet crease patterns that folds into origami bellows, whose axial compliance can be tuned to select desired stiffness. Concentrically arranged cylinder pairs reliably manifest additive stiffness, extending the tunable range by nearly an order of magnitude and achieving bulk axial stiffness spanning 200-1500 N/m using 8 mil thick polyester-coated paper. Accordingly, we design origami energy-storing springs with a stiffness of 1035 N/m each and incorporate them into a three degree-of-freedom (DOF) tendon-driven spatial pointing mechanism that exhibits trajectory tracking accuracy less than 15% rms error within a ~2 cm^3 volume. The origami springs can sustain high power throughput, enabling the robot to achieve asymptotically stable juggling for both highly elastic (1 kg resilient shot put ball) and highly damped ("medicine ball") collisions in the vertical direction with apex heights approaching 10 cm. The results demonstrate that "soft" robotic mechanisms are able to perform a controlled, dynamically actuated task.

Submitted 29 October, 2019; originally announced October 2019.

Comments: This paper is submitted to the IEEE Robotics and Automation Letters, in review

arXiv:1907.09695 [pdf, other]

Adaptive Compression-based Lifelong Learning

Authors: Shivangi Srivastava, Maxim Berman, Matthew B. Blaschko, Devis Tuia

Abstract: The problem of a deep learning model losing performance on a previously learned task when fine-tuned to a new one is a phenomenon known as catastrophic forgetting. There are two major ways to mitigate this problem: either preserving activations of the initial network during training with a new task; or restricting the new network activations to remain close to the initial ones. The latter approach falls under the denomination of lifelong learning, where the model is updated in a way that it performs well on both old and new tasks, without having access to the old task's training samples anymore. Recently, approaches like pruning networks for freeing network capacity during sequential learning of tasks have been gaining in popularity. Such approaches allow learning small networks while making redundant parameters available for the next tasks. The common problem encountered with these approaches is that the pruning percentage is hard-coded, irrespective of the number of samples, of the complexity of the learning task and of the number of classes in the dataset. We propose a method based on Bayesian optimization to perform adaptive compression/pruning of the network and show its effectiveness in lifelong learning. Our method learns to perform heavy pruning for small and/or simple datasets while using milder compression rates for large and/or complex data.
Experiments on classification and semantic segmentation demonstrate the applicability of learning network compression, where we are able to effectively preserve performances along sequences of tasks of varying complexity.

Submitted 23 July, 2019; originally announced July 2019.

Comments: Accepted at BMVC 2019

arXiv:1905.04717 [pdf, other]

Some Research Problems in Biometrics: The Future Beckons

Authors: Arun Ross, Sudipta Banerjee, Cunjian Chen, Anurag Chowdhury, Vahid Mirjalili, Renu Sharma, Thomas Swearingen, Shivangi Yadav

Abstract: The need for reliably determining the identity of a person is critical in a number of different domains ranging from personal smartphones to border security; from autonomous vehicles to e-voting; from tracking child vaccinations to preventing human trafficking; from crime scene investigation to personalization of customer service. Biometrics, which entails the use of biological attributes such as face, fingerprints and voice for recognizing a person, is being increasingly used in several such applications. While biometric technology has made rapid strides over the past decade, there are several fundamental issues that are yet to be satisfactorily resolved. In this article, we will discuss some of these issues and enumerate some of the exciting challenges in this field.

Submitted 12 May, 2019; originally announced May 2019.

Comments: 8 pages, 12 figures, ICB-2019

arXiv:1905.01752 [pdf, other]

doi 10.1016/j.rse.2019.04.014

Understanding urban landuse from the above and ground perspectives: a deep learning, multimodal solution

Authors: Shivangi Srivastava, John E. Vargas-Muñoz, Devis Tuia

Abstract: Landuse characterization is important for urban planning. It is traditionally performed with field surveys or manual photo interpretation, two practices that are time-consuming and labor-intensive. Therefore, we aim to automate landuse mapping at the urban-object level with a deep learning approach based on data from multiple sources (or modalities). We consider two image modalities: overhead imagery from Google Maps and ensembles of ground-based pictures (side-views) per urban-object from Google Street View (GSV). These modalities bring complementary visual information pertaining to the urban-objects. We propose an end-to-end trainable model, which uses OpenStreetMap annotations as labels. The model can accommodate a variable number of GSV pictures for the ground-based branch and can also function in the absence of ground pictures at prediction time. We test the effectiveness of our model over the area of Île-de-France, France, and test its generalization abilities on a set of urban-objects from the city of Nantes, France. Our proposed multimodal Convolutional Neural Network achieves considerably higher accuracies than methods that use a single image modality, making it suitable for automatic landuse map updates. Additionally, our approach could be easily scaled to multiple cities, because it is based on data sources available for many cities worldwide.

Submitted 5 May, 2019; originally announced May 2019.

Journal ref: Remote Sensing of Environment, 228, pages 129 - 143, 2019

arXiv:1904.00549 [pdf, other]

MESH: A Flexible Distributed Hypergraph Processing System

Authors: Benjamin Heintz, Rankyung Hong, Shivangi Singh, Gaurav Khandelwal, Corey Tesdahl, Abhishek Chandra

Abstract: With the rapid growth of large online social networks, the ability to analyze large-scale social structure and behavior has become critically important, and this has led to the development of several scalable graph processing systems. In reality, however, social interaction takes place not only between pairs of individuals as in the graph model, but rather in the context of multi-user groups. Research has shown that such group dynamics can be better modeled through a more general hypergraph model, resulting in the need to build scalable hypergraph processing systems. In this paper, we present MESH, a flexible distributed framework for scalable hypergraph processing. MESH provides an easy-to-use and expressive application programming interface that naturally extends the think like a vertex model common to many popular graph processing systems. Our framework provides a flexible implementation based on an underlying graph processing system, and enables different design choices for the key implementation issues of partitioning a hypergraph representation. We implement MESH on top of the popular GraphX graph processing framework in Apache Spark. Using a variety of real datasets and experiments conducted on a local 8-node cluster as well as a 65-node Amazon AWS testbed, we demonstrate that MESH provides flexibility based on data and application characteristics, as well as scalability with cluster size. We further show that it is competitive in performance to HyperX, another hypergraph processing system based on Spark, while providing a much simpler implementation (requiring about 5X fewer lines of code), thus showing that simplicity and flexibility need not come at the cost of performance.

Submitted 10 May, 2019; v1 submitted 31 March, 2019; originally announced April 2019.

Comments: 14 pages, 15 figures, 2019 IEEE International Conference on Cloud Engineering (IC2E)

Showing 1–36 of 36 results for author: Shivangi