Showing 1–50 of 50 results for author: Pirsiavash, H

Searching in archive cs.

  1. arXiv:2505.15263  [pdf, ps, other]

    cs.CV cs.LG

    gen2seg: Generative Models Enable Generalizable Instance Segmentation

    Authors: Om Khangaonkar, Hamed Pirsiavash

    Abstract: By pretraining to synthesize coherent images from perturbed inputs, generative models inherently learn to understand object boundaries and scene compositions. How can we repurpose these generative representations for general-purpose perceptual organization? We finetune Stable Diffusion and MAE (encoder+decoder) for category-agnostic instance segmentation using our instance coloring loss exclusivel…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: Website: https://reachomk.github.io/gen2seg/

  2. arXiv:2412.04668  [pdf, other]

    cs.CV cs.AI

    Diffusion-Augmented Coreset Expansion for Scalable Dataset Distillation

    Authors: Ali Abbasi, Shima Imani, Chenyang An, Gayathri Mahalingam, Harsh Shrivastava, Maurice Diesendruck, Hamed Pirsiavash, Pramod Sharma, Soheil Kolouri

    Abstract: With the rapid scaling of neural networks, data storage and communication demands have intensified. Dataset distillation has emerged as a promising solution, condensing information from extensive datasets into a compact set of synthetic samples by solving a bilevel optimization problem. However, current methods face challenges in computational efficiency, particularly with high-resolution data and…

    Submitted 5 December, 2024; originally announced December 2024.

  3. arXiv:2410.09687  [pdf, other]

    cs.LG cs.AI cs.CL

    MoIN: Mixture of Introvert Experts to Upcycle an LLM

    Authors: Ajinkya Tejankar, KL Navaneet, Ujjawal Panchal, Kossar Pourahmadi, Hamed Pirsiavash

    Abstract: The goal of this paper is to improve (upcycle) an existing large language model without the prohibitive requirements of continued pre-training of the full model. The idea is to split the pre-training data into semantically relevant groups and train an expert on each subset. An expert takes the form of a lightweight adapter added on top of a frozen base model. During inference, an incoming quer…

    Submitted 12 October, 2024; originally announced October 2024.

  4. arXiv:2406.19301  [pdf, other]

    cs.LG

    MCNC: Manifold-Constrained Reparameterization for Neural Compression

    Authors: Chayne Thrash, Ali Abbasi, Reed Andreas, Parsa Nooralinejad, Soroush Abbasi Koohpayegani, Hamed Pirsiavash, Soheil Kolouri

    Abstract: The outstanding performance of large foundational models across diverse tasks, from computer vision to speech and natural language processing, has significantly increased their demand. However, storing and transmitting these models poses significant challenges due to their massive size (e.g., 750GB for Llama 3.1 405B). Recent literature has focused on compressing the original weights or reducing t…

    Submitted 24 April, 2025; v1 submitted 27 June, 2024; originally announced June 2024.

  5. arXiv:2403.07142  [pdf, other]

    cs.CV cs.CL cs.LG

    One Category One Prompt: Dataset Distillation using Diffusion Models

    Authors: Ali Abbasi, Ashkan Shahbazi, Hamed Pirsiavash, Soheil Kolouri

    Abstract: The extensive amounts of data required for training deep neural networks pose significant challenges on storage and transmission fronts. Dataset distillation has emerged as a promising technique to condense the information of massive datasets into a much smaller yet representative set of synthetic samples. However, traditional dataset distillation approaches often struggle to scale effectively wit…

    Submitted 11 March, 2024; originally announced March 2024.

  6. arXiv:2312.02548  [pdf, other]

    cs.CV

    GeNIe: Generative Hard Negative Images Through Diffusion

    Authors: Soroush Abbasi Koohpayegani, Anuj Singh, K L Navaneet, Hamed Pirsiavash, Hadi Jamali-Rad

    Abstract: Data augmentation is crucial in training deep models, preventing them from overfitting to limited data. Recent advances in generative AI, e.g., diffusion models, have enabled more sophisticated augmentation techniques that produce data resembling natural images. We introduce GeNIe, a novel augmentation method that leverages a latent diffusion model conditioned on a text prompt to combine two contr…

    Submitted 6 November, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: Our code is available at https://github.com/UCDvision/GeNIe

  7. arXiv:2311.18159  [pdf, other]

    cs.CV

    CompGS: Smaller and Faster Gaussian Splatting with Vector Quantization

    Authors: KL Navaneet, Kossar Pourahmadi Meibodi, Soroush Abbasi Koohpayegani, Hamed Pirsiavash

    Abstract: 3D Gaussian Splatting (3DGS) is a new method for modeling and rendering 3D radiance fields that achieves much faster learning and rendering time compared to SOTA NeRF methods. However, it comes with a drawback in the much larger storage demand compared to NeRF methods since it needs to store the parameters for several 3D Gaussians. We notice that many Gaussians may share similar parameters, so we…

    Submitted 26 September, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: Code is available at https://github.com/UCDvision/compact3d

  8. arXiv:2311.11995  [pdf, other]

    cs.LG cs.AI cs.CR

    BrainWash: A Poisoning Attack to Forget in Continual Learning

    Authors: Ali Abbasi, Parsa Nooralinejad, Hamed Pirsiavash, Soheil Kolouri

    Abstract: Continual learning has gained substantial attention within the deep learning community, offering promising solutions to the challenging problem of sequential learning. Yet, a largely unexplored facet of this paradigm is its susceptibility to adversarial attacks, especially with the aim of inducing forgetting. In this paper, we introduce "BrainWash," a novel data poisoning method tailored to impose…

    Submitted 23 November, 2023; v1 submitted 20 November, 2023; originally announced November 2023.

  9. arXiv:2310.02556  [pdf, other]

    cs.CL cs.CV

    NOLA: Compressing LoRA using Linear Combination of Random Basis

    Authors: Soroush Abbasi Koohpayegani, KL Navaneet, Parsa Nooralinejad, Soheil Kolouri, Hamed Pirsiavash

    Abstract: Fine-tuning Large Language Models (LLMs) and storing them for each downstream task or domain is impractical because of the massive model size (e.g., 350GB in GPT-3). Current literature, such as LoRA, showcases the potential of low-rank modifications to the original weights of an LLM, enabling efficient adaptation and storage for task-specific models. These methods can reduce the number of paramete…

    Submitted 29 April, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: ICLR 2024. Our code is available here: https://github.com/UCDvision/NOLA

  10. arXiv:2310.02544  [pdf, other]

    cs.CV

    SlowFormer: Universal Adversarial Patch for Attack on Compute and Energy Efficiency of Inference Efficient Vision Transformers

    Authors: KL Navaneet, Soroush Abbasi Koohpayegani, Essam Sleiman, Hamed Pirsiavash

    Abstract: Recently, there has been a lot of progress in reducing the computation of deep models at inference time. These methods can reduce both the computational needs and power usage of deep models. Some of these approaches adaptively scale the compute based on the input instance. We show that such models can be vulnerable to a universal adversarial patch attack, where the attacker optimizes for a patch t…

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: Code is available at https://github.com/UCDvision/SlowFormer

  11. arXiv:2304.12210  [pdf, other]

    cs.LG cs.CV

    A Cookbook of Self-Supervised Learning

    Authors: Randall Balestriero, Mark Ibrahim, Vlad Sobal, Ari Morcos, Shashank Shekhar, Tom Goldstein, Florian Bordes, Adrien Bardes, Gregoire Mialon, Yuandong Tian, Avi Schwarzschild, Andrew Gordon Wilson, Jonas Geiping, Quentin Garrido, Pierre Fernandez, Amir Bar, Hamed Pirsiavash, Yann LeCun, Micah Goldblum

    Abstract: Self-supervised learning, dubbed the dark matter of intelligence, is a promising path to advance machine learning. Yet, much like cooking, training SSL methods is a delicate art with a high barrier to entry. While many components are familiar, successfully training an SSL method involves a dizzying set of choices from the pretext tasks to training hyper-parameters. Our goal is to lower the barrier…

    Submitted 28 June, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

  12. arXiv:2304.01482  [pdf, other]

    cs.CV

    Defending Against Patch-based Backdoor Attacks on Self-Supervised Learning

    Authors: Ajinkya Tejankar, Maziar Sanjabi, Qifan Wang, Sinong Wang, Hamed Firooz, Hamed Pirsiavash, Liang Tan

    Abstract: Recently, self-supervised learning (SSL) was shown to be vulnerable to patch-based data poisoning backdoor attacks. It was shown that an adversary can poison a small part of the unlabeled data so that when a victim trains an SSL model on it, the final model will have a backdoor that the adversary can exploit. This work aims to defend self-supervised learning against such attacks. We use a three-st…

    Submitted 3 April, 2023; originally announced April 2023.

    Comments: Accepted to CVPR 2023

  13. arXiv:2210.14797  [pdf, other]

    cs.LG cs.CV

    Is Multi-Task Learning an Upper Bound for Continual Learning?

    Authors: Zihao Wu, Huy Tran, Hamed Pirsiavash, Soheil Kolouri

    Abstract: Continual and multi-task learning are common machine learning approaches to learning from multiple tasks. The existing works in the literature often assume multi-task learning as a sensible performance upper bound for various continual learning algorithms. While this assumption is empirically verified for different continual learning benchmarks, it is not rigorously justified. Moreover, it is imag…

    Submitted 26 October, 2022; originally announced October 2022.

  14. arXiv:2206.08898  [pdf, other]

    cs.CV

    SimA: Simple Softmax-free Attention for Vision Transformers

    Authors: Soroush Abbasi Koohpayegani, Hamed Pirsiavash

    Abstract: Recently, vision transformers have become very popular. However, deploying them in many applications is computationally expensive partly due to the Softmax layer in the attention block. We introduce a simple but effective Softmax-free attention block, SimA, which normalizes query and key matrices with a simple $\ell_1$-norm instead of using a Softmax layer. Then, the attention block in SimA is a simp…

    Submitted 23 March, 2024; v1 submitted 17 June, 2022; originally announced June 2022.

    Comments: Code is available here: https://github.com/UCDvision/sima

  15. arXiv:2206.08477  [pdf, other]

    cs.CV cs.CR cs.LG

    Backdoor Attacks on Vision Transformers

    Authors: Akshayvarun Subramanya, Aniruddha Saha, Soroush Abbasi Koohpayegani, Ajinkya Tejankar, Hamed Pirsiavash

    Abstract: Vision Transformers (ViT) have recently demonstrated exemplary performance on a variety of vision tasks and are being used as an alternative to CNNs. Their design is based on a self-attention mechanism that processes images as a sequence of patches, which is quite different compared to CNNs. Hence it is interesting to study if ViTs are vulnerable to backdoor attacks. Backdoor attacks happen when a…

    Submitted 16 June, 2022; originally announced June 2022.

  16. arXiv:2206.08464  [pdf, other]

    cs.LG

    PRANC: Pseudo RAndom Networks for Compacting deep models

    Authors: Parsa Nooralinejad, Ali Abbasi, Soroush Abbasi Koohpayegani, Kossar Pourahmadi Meibodi, Rana Muhammad Shahroz Khan, Soheil Kolouri, Hamed Pirsiavash

    Abstract: We demonstrate that a deep model can be reparametrized as a linear combination of several randomly initialized and frozen deep models in the weight space. During training, we seek local minima that reside within the subspace spanned by these random models (i.e., 'basis' networks). Our framework, PRANC, enables significant compaction of a deep model. The model can be reconstructed using a single sc…

    Submitted 28 August, 2023; v1 submitted 16 June, 2022; originally announced June 2022.

  17. arXiv:2204.05432  [pdf, other]

    cs.CV cs.AI cs.LG

    A Simple Approach to Adversarial Robustness in Few-shot Image Classification

    Authors: Akshayvarun Subramanya, Hamed Pirsiavash

    Abstract: Few-shot image classification, where the goal is to generalize to tasks with limited labeled data, has seen great progress over the years. However, the classifiers are vulnerable to adversarial examples, posing a question regarding their generalization capabilities. Recent works have tried to combine meta-learning approaches with adversarial training to improve the robustness of few-shot classifie…

    Submitted 11 April, 2022; originally announced April 2022.

  18. arXiv:2203.06514  [pdf, other]

    cs.LG cs.AI cs.CV

    Sparsity and Heterogeneous Dropout for Continual Learning in the Null Space of Neural Activations

    Authors: Ali Abbasi, Parsa Nooralinejad, Vladimir Braverman, Hamed Pirsiavash, Soheil Kolouri

    Abstract: Continual/lifelong learning from a non-stationary input data stream is a cornerstone of intelligence. Despite their phenomenal performance in a wide variety of applications, deep neural networks are prone to forgetting their previously learned information upon learning new information. This phenomenon is called "catastrophic forgetting" and is deeply rooted in the stability-plasticity dilemma. Overcoming…

    Submitted 8 July, 2022; v1 submitted 12 March, 2022; originally announced March 2022.

  19. arXiv:2202.09284  [pdf, other]

    cs.LG

    Amenable Sparse Network Investigator

    Authors: Saeed Damadi, Erfan Nouri, Hamed Pirsiavash

    Abstract: We present the "Amenable Sparse Network Investigator" (ASNI) algorithm, which utilizes a novel pruning strategy based on a sigmoid function that induces sparsity level globally over the course of one single round of training. The ASNI algorithm fulfills both tasks, whereas current state-of-the-art strategies can perform only one of them. The ASNI algorithm has two subalgorithms: 1) ASNI-I, 2) ASNI-II. ASNI-I lea…

    Submitted 12 January, 2023; v1 submitted 18 February, 2022; originally announced February 2022.

  20. arXiv:2201.05131  [pdf, other]

    cs.CV

    SimReg: Regression as a Simple Yet Effective Tool for Self-supervised Knowledge Distillation

    Authors: K L Navaneet, Soroush Abbasi Koohpayegani, Ajinkya Tejankar, Hamed Pirsiavash

    Abstract: Feature regression is a simple way to distill large neural network models to smaller ones. We show that with simple changes to the network architecture, regression can outperform more complex state-of-the-art approaches for knowledge distillation from self-supervised models. Surprisingly, the addition of a multi-layer perceptron head to the CNN backbone is beneficial even if used only during disti…

    Submitted 13 January, 2022; originally announced January 2022.

    Comments: In BMVC 2021. Code available at: https://github.com/UCDvision/simreg

  21. arXiv:2112.13884  [pdf, other]

    cs.CV

    A Fistful of Words: Learning Transferable Visual Models from Bag-of-Words Supervision

    Authors: Ajinkya Tejankar, Maziar Sanjabi, Bichen Wu, Saining Xie, Madian Khabsa, Hamed Pirsiavash, Hamed Firooz

    Abstract: Using natural language as supervision for training visual recognition models holds great promise. Recent works have shown that if such supervision is used in the form of alignment between images and captions in large training datasets, then the resulting aligned models perform well on zero-shot classification as downstream tasks. In this paper, we focus on teasing out what parts of the language…

    Submitted 5 January, 2022; v1 submitted 27 December, 2021; originally announced December 2021.

  22. arXiv:2112.04607  [pdf, other]

    cs.CV

    Constrained Mean Shift Using Distant Yet Related Neighbors for Representation Learning

    Authors: KL Navaneet, Soroush Abbasi Koohpayegani, Ajinkya Tejankar, Kossar Pourahmadi, Akshayvarun Subramanya, Hamed Pirsiavash

    Abstract: We are interested in representation learning in self-supervised, supervised, and semi-supervised settings. Some recent self-supervised learning methods like mean-shift (MSF) cluster images by pulling the embedding of a query image to be closer to its nearest neighbors (NNs). Since most NNs are close to the query by design, the averaging may not affect the embedding of the query much. On the other…

    Submitted 14 October, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

    Comments: Code is available at https://github.com/UCDvision/CMSF. arXiv admin note: text overlap with arXiv:2110.10309

  23. arXiv:2112.04585  [pdf, other]

    cs.CV cs.LG

    MASTAF: A Model-Agnostic Spatio-Temporal Attention Fusion Network for Few-shot Video Classification

    Authors: Rex Liu, Huanle Zhang, Hamed Pirsiavash, Xin Liu

    Abstract: We propose MASTAF, a Model-Agnostic Spatio-Temporal Attention Fusion network for few-shot video classification. MASTAF takes input from a general video spatial and temporal representation, e.g., using 2D CNN, 3D CNN, and Video Transformer. Then, to make the most of such representations, we use self- and cross-attention models to highlight the critical spatio-temporal region to increase the inter-cl…

    Submitted 16 October, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

    Comments: WACV 2023

  24. arXiv:2111.15667  [pdf, other]

    cs.CV

    Adaptive Token Sampling For Efficient Vision Transformers

    Authors: Mohsen Fayyaz, Soroush Abbasi Koohpayegani, Farnoush Rezaei Jafari, Sunando Sengupta, Hamid Reza Vaezi Joze, Eric Sommerlade, Hamed Pirsiavash, Juergen Gall

    Abstract: While state-of-the-art vision transformer models achieve promising results in image classification, they are computationally expensive and require many GFLOPs. Although the GFLOPs of a vision transformer can be decreased by reducing the number of tokens in the network, there is no setting that is optimal for all input images. In this work, we therefore introduce a differentiable parameter-free Ada…

    Submitted 26 July, 2022; v1 submitted 30 November, 2021; originally announced November 2021.

    Comments: ECCV 2022

  25. arXiv:2110.12033  [pdf, other]

    cs.CV cs.LG

    A Simple Baseline for Low-Budget Active Learning

    Authors: Kossar Pourahmadi, Parsa Nooralinejad, Hamed Pirsiavash

    Abstract: Active learning focuses on choosing a subset of unlabeled data to be labeled. However, most such methods assume that a large subset of the data can be annotated. We are interested in low-budget active learning where only a small subset (e.g., 0.2% of ImageNet) can be annotated. Instead of proposing a new query strategy to iteratively sample batches of unlabeled data given an initial pool, we learn…

    Submitted 1 April, 2022; v1 submitted 22 October, 2021; originally announced October 2021.

    Comments: 20 pages, 16 tables; additional experiments

  26. arXiv:2110.10309  [pdf, other]

    cs.CV

    Constrained Mean Shift for Representation Learning

    Authors: Ajinkya Tejankar, Soroush Abbasi Koohpayegani, Hamed Pirsiavash

    Abstract: We are interested in representation learning from labeled or unlabeled data. Inspired by the recent success of self-supervised learning (SSL), we develop a non-contrastive representation learning method that can exploit additional knowledge. This additional knowledge may come from annotated labels in the supervised setting or an SSL model from another modality in the SSL setting. Our main idea is to g…

    Submitted 19 October, 2021; originally announced October 2021.

  27. arXiv:2110.00527  [pdf, other]

    cs.CV

    Consistent Explanations by Contrastive Learning

    Authors: Vipin Pillai, Soroush Abbasi Koohpayegani, Ashley Ouligian, Dennis Fong, Hamed Pirsiavash

    Abstract: Post-hoc explanation methods, e.g., Grad-CAM, enable humans to inspect the spatial regions responsible for a particular network decision. However, it is shown that such explanations are not always consistent with human priors, such as consistency across image transformations. Given an interpretation algorithm, e.g., Grad-CAM, we introduce a novel training method to train the model to produce more…

    Submitted 8 April, 2022; v1 submitted 1 October, 2021; originally announced October 2021.

    Comments: To be published in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022

  28. arXiv:2105.10123  [pdf, other]

    cs.CV

    Backdoor Attacks on Self-Supervised Learning

    Authors: Aniruddha Saha, Ajinkya Tejankar, Soroush Abbasi Koohpayegani, Hamed Pirsiavash

    Abstract: Large-scale unlabeled data has spurred recent progress in self-supervised learning methods that learn rich visual representations. State-of-the-art self-supervised methods for learning representations from images (e.g., MoCo, BYOL, MSF) use an inductive bias that random augmentations (e.g., random crops) of an image should produce similar embeddings. We show that such methods are vulnerable to bac…

    Submitted 8 June, 2022; v1 submitted 21 May, 2021; originally announced May 2021.

    Comments: CVPR 2022 (Oral)

  29. arXiv:2105.07269  [pdf, other]

    cs.CV

    Mean Shift for Self-Supervised Learning

    Authors: Soroush Abbasi Koohpayegani, Ajinkya Tejankar, Hamed Pirsiavash

    Abstract: Most recent self-supervised learning (SSL) algorithms learn features by contrasting between instances of images or by clustering the images and then contrasting between the image clusters. We introduce a simple mean-shift algorithm that learns representations by grouping images together without contrasting between them or adopting much of a prior on the structure of the clusters. We simply "shift" t…

    Submitted 10 September, 2021; v1 submitted 15 May, 2021; originally announced May 2021.

  30. arXiv:2012.09259  [pdf, other]

    cs.CV

    ISD: Self-Supervised Learning by Iterative Similarity Distillation

    Authors: Ajinkya Tejankar, Soroush Abbasi Koohpayegani, Vipin Pillai, Paolo Favaro, Hamed Pirsiavash

    Abstract: Recently, contrastive learning has achieved great results in self-supervised learning, where the main idea is to push two augmentations of an image (positive pairs) closer compared to other random images (negative pairs). We argue that not all random images are equal. Hence, we introduce a self-supervised learning algorithm where we use a soft similarity for the negative images rather than a binar…

    Submitted 10 September, 2021; v1 submitted 16 December, 2020; originally announced December 2020.

  31. arXiv:2011.00597  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning

    Authors: Simon Ging, Mohammadreza Zolfaghari, Hamed Pirsiavash, Thomas Brox

    Abstract: Many real-world video-text tasks involve different levels of granularity, such as frames and words, clips and sentences, or videos and paragraphs, each with distinct semantics. In this paper, we propose a Cooperative hierarchical Transformer (COOT) to leverage this hierarchy information and model the interactions between different levels of granularity and different modalities. The method consists o…

    Submitted 1 November, 2020; originally announced November 2020.

    Comments: 27 pages, 5 figures, 19 tables. To be published in the 34th conference on Neural Information Processing Systems (NeurIPS 2020). The first two authors contributed equally to this work

    ACM Class: I.2.7; I.2.10

  32. arXiv:2010.14713  [pdf, other]

    cs.CV cs.LG

    CompRess: Self-Supervised Learning by Compressing Representations

    Authors: Soroush Abbasi Koohpayegani, Ajinkya Tejankar, Hamed Pirsiavash

    Abstract: Self-supervised learning aims to learn good representations with unlabeled data. Recent works have shown that larger models benefit more from self-supervised learning than smaller models. As a result, the gap between supervised and self-supervised learning has been greatly reduced for larger models. In this work, instead of designing a new pseudo task for self-supervised learning, we develop a mod…

    Submitted 27 October, 2020; originally announced October 2020.

  33. arXiv:1912.11903  [pdf, other]

    cs.CV

    A simple baseline for domain adaptation using rotation prediction

    Authors: Ajinkya Tejankar, Hamed Pirsiavash

    Abstract: Recently, domain adaptation has become a hot research area with lots of applications. The goal is to adapt a model trained in one domain to another domain with scarce annotated data. We propose a simple yet effective method based on self-supervised learning that outperforms or is on par with most state-of-the-art algorithms, e.g. adversarial domain adaptation. Our method involves two phases: predi…

    Submitted 26 December, 2019; originally announced December 2019.

  34. arXiv:1910.00068  [pdf, other]

    cs.CV

    Role of Spatial Context in Adversarial Robustness for Object Detection

    Authors: Aniruddha Saha, Akshayvarun Subramanya, Koninika Patil, Hamed Pirsiavash

    Abstract: The benefits of utilizing spatial context in fast object detection algorithms have been studied extensively. Detectors increase inference speed by doing a single forward pass per image, which means they implicitly use contextual reasoning for their predictions. However, one can show that an adversary can design adversarial patches which do not overlap with any objects of interest in the scene and e…

    Submitted 17 April, 2020; v1 submitted 30 September, 2019; originally announced October 2019.

    Comments: CVPR 2020 Workshop on Adversarial Machine Learning in Computer Vision

  35. arXiv:1910.00033  [pdf, other]

    cs.CV

    Hidden Trigger Backdoor Attacks

    Authors: Aniruddha Saha, Akshayvarun Subramanya, Hamed Pirsiavash

    Abstract: With the success of deep learning algorithms in various domains, studying adversarial attacks to secure deep models in real-world applications has become an important research topic. Backdoor attacks are a form of adversarial attacks on deep networks where the attacker provides poisoned data to the victim to train the model with, and then activates the attack by showing a specific small trigger pa…

    Submitted 20 December, 2019; v1 submitted 30 September, 2019; originally announced October 2019.

    Comments: AAAI 2020 - Main Technical Track (Oral)

  36. arXiv:1906.10842  [pdf, other]

    cs.CV

    Universal Litmus Patterns: Revealing Backdoor Attacks in CNNs

    Authors: Soheil Kolouri, Aniruddha Saha, Hamed Pirsiavash, Heiko Hoffmann

    Abstract: The unprecedented success of deep neural networks in many applications has made these networks a prime target for adversarial exploitation. In this paper, we introduce a benchmark technique for detecting backdoor attacks (aka Trojan attacks) on deep convolutional neural networks (CNNs). We introduce the concept of Universal Litmus Patterns (ULPs), which enable one to reveal backdoor attacks by fee…

    Submitted 14 May, 2020; v1 submitted 26 June, 2019; originally announced June 2019.

    Comments: CVPR 2020 Oral

  37. arXiv:1812.02843  [pdf, other]

    cs.CV cs.LG

    Fooling Network Interpretation in Image Classification

    Authors: Akshayvarun Subramanya, Vipin Pillai, Hamed Pirsiavash

    Abstract: Deep neural networks have been shown to be fooled rather easily using adversarial attack algorithms. Practical methods such as adversarial patches have been shown to be extremely effective in causing misclassification. However, these patches are highlighted using standard network interpretation algorithms, thus revealing the identity of the adversary. We show that it is possible to create adversar…

    Submitted 24 September, 2019; v1 submitted 6 December, 2018; originally announced December 2018.

    Comments: Accepted at ICCV 2019

  38. arXiv:1805.00385  [pdf, other]

    cs.CV

    Boosting Self-Supervised Learning via Knowledge Transfer

    Authors: Mehdi Noroozi, Ananth Vinjimoor, Paolo Favaro, Hamed Pirsiavash

    Abstract: In self-supervised learning, one trains a model to solve a so-called pretext task on a dataset without the need for human annotation. The main objective, however, is to transfer this model to a target domain and task. Currently, the most effective transfer strategy is fine-tuning, which restricts one to use the same model or parts thereof for both pretext and target tasks. In this paper, we presen…

    Submitted 1 May, 2018; originally announced May 2018.

  39. arXiv:1708.06734  [pdf, other]

    cs.CV

    Representation Learning by Learning to Count

    Authors: Mehdi Noroozi, Hamed Pirsiavash, Paolo Favaro

    Abstract: We introduce a novel method for representation learning that uses an artificial supervision signal based on counting visual primitives. This supervision signal is obtained from an equivariance relation, which does not require any manual annotation. We relate transformations of images to transformations of the representations. More specifically, we look for the representation that satisfies such re…

    Submitted 22 August, 2017; originally announced August 2017.

    Comments: ICCV 2017 (oral)

  40. arXiv:1611.08258  [pdf, other]

    cs.CV

    Weakly Supervised Cascaded Convolutional Networks

    Authors: Ali Diba, Vivek Sharma, Ali Pazandeh, Hamed Pirsiavash, Luc Van Gool

    Abstract: Object detection is a challenging task in the visual understanding domain, and even more so if the supervision is to be weak. Recently, a few efforts to handle the task without expensive human annotations have been established using promising deep neural networks. A new architecture of cascaded networks is proposed to learn a convolutional neural network (CNN) under such conditions. We introduce two such architect…

    Submitted 24 November, 2016; originally announced November 2016.

  41. arXiv:1610.09003  [pdf, other]

    cs.CV cs.LG cs.MM

    Cross-Modal Scene Networks

    Authors: Yusuf Aytar, Lluis Castrejon, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba

    Abstract: People can recognize scenes across many different modalities beyond natural images. In this paper, we investigate how to learn cross-modal scene representations that transfer across modalities. To study this problem, we introduce a new cross-modal scene dataset. While convolutional neural networks can categorize scenes well, they also learn an intermediate representation not aligned across modalit…

    Submitted 27 October, 2016; originally announced October 2016.

    Comments: See more at http://cmplaces.csail.mit.edu/. arXiv admin note: text overlap with arXiv:1607.07295

  42. arXiv:1609.02612  [pdf, other]

    cs.CV cs.GR cs.LG

    Generating Videos with Scene Dynamics

    Authors: Carl Vondrick, Hamed Pirsiavash, Antonio Torralba

    Abstract: We capitalize on large amounts of unlabeled video in order to learn a model of scene dynamics for both video recognition tasks (e.g. action classification) and video generation tasks (e.g. future prediction). We propose a generative adversarial network for video with a spatio-temporal convolutional architecture that untangles the scene's foreground from the background. Experiments suggest this mod…

    Submitted 26 October, 2016; v1 submitted 8 September, 2016; originally announced September 2016.

    Comments: NIPS 2016. See more at http://web.mit.edu/vondrick/tinyvideo/

  43. arXiv:1608.03217  [pdf, other]

    cs.CV

    DeepCAMP: Deep Convolutional Action & Attribute Mid-Level Patterns

    Authors: Ali Diba, Ali Mohammad Pazandeh, Hamed Pirsiavash, Luc Van Gool

    Abstract: The recognition of human actions and the determination of human attributes are two tasks that call for fine-grained classification. Indeed, often rather small and inconspicuous objects and features have to be detected to tell their classes apart. In order to deal with this challenge, we propose a novel convolutional neural network that mines mid-level image patches that are sufficiently dedicated…

    Submitted 10 August, 2016; originally announced August 2016.

    Comments: in CVPR 2016

  44. arXiv:1607.07295  [pdf, other]

    cs.CV

    Learning Aligned Cross-Modal Representations from Weakly Aligned Data

    Authors: Lluis Castrejon, Yusuf Aytar, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba

    Abstract: People can recognize scenes across many different modalities beyond natural images. In this paper, we investigate how to learn cross-modal scene representations that transfer across modalities. To study this problem, we introduce a new cross-modal scene dataset. While convolutional neural networks can categorize cross-modal scenes well, they also learn an intermediate representation not aligned ac…

    Submitted 25 July, 2016; originally announced July 2016.

    Comments: Conference paper at CVPR 2016

  45. arXiv:1604.07480  [pdf, other]

    cs.CV

    Joint Semantic Segmentation and Depth Estimation with Deep Convolutional Networks

    Authors: Arsalan Mousavian, Hamed Pirsiavash, Jana Kosecka

    Abstract: Multi-scale deep CNNs have been used successfully for problems mapping each pixel to a label, such as depth estimation and semantic segmentation. It has also been shown that such architectures are reusable and can be used for multiple tasks. These networks are typically trained independently for each task by varying the output layer(s) and training objective. In this work we present a new model fo…

    Submitted 19 September, 2016; v1 submitted 25 April, 2016; originally announced April 2016.

  46. arXiv:1504.08023  [pdf, other]

    cs.CV

    Anticipating Visual Representations from Unlabeled Video

    Authors: Carl Vondrick, Hamed Pirsiavash, Antonio Torralba

    Abstract: Anticipating actions and objects before they start or appear is a difficult problem in computer vision with several real-world applications. This task is challenging partly because it requires leveraging extensive knowledge of the world that is difficult to write down. We believe that a promising resource for efficiently learning this knowledge is through readily available unlabeled video. We pres…

    Submitted 29 November, 2016; v1 submitted 29 April, 2015; originally announced April 2015.

    Comments: CVPR 2016

  47. arXiv:1502.05461  [pdf, other]

    cs.CV

    Visualizing Object Detection Features

    Authors: Carl Vondrick, Aditya Khosla, Hamed Pirsiavash, Tomasz Malisiewicz, Antonio Torralba

    Abstract: We introduce algorithms to visualize feature spaces used by object detectors. Our method works by inverting a visual feature back to multiple natural images. We found that these visualizations allow us to analyze object detection systems in new ways and gain new insight into the detector's failures. For example, when we visualize the features for high scoring false alarms, we discovered that, alth…

    Submitted 18 February, 2015; originally announced February 2015.

    Comments: In submission to IJCV

  48. arXiv:1410.4627  [pdf, other]

    cs.CV

    Learning visual biases from human imagination

    Authors: Carl Vondrick, Hamed Pirsiavash, Aude Oliva, Antonio Torralba

    Abstract: Although the human visual system can recognize many concepts under challenging conditions, it still has some biases. In this paper, we investigate whether we can extract these biases and transfer them into a machine recognition system. We introduce a novel method that, inspired by well-known tools in human psychophysics, estimates the biases that the human visual system might use for recognition,…

    Submitted 16 November, 2015; v1 submitted 16 October, 2014; originally announced October 2014.

    Comments: To appear at NIPS 2015

  49. arXiv:1406.5472  [pdf, other]

    cs.CV

    Predicting Motivations of Actions by Leveraging Text

    Authors: Carl Vondrick, Deniz Oktay, Hamed Pirsiavash, Antonio Torralba

    Abstract: Understanding human actions is a key problem in computer vision. However, recognizing actions is only the first step of understanding what a person is doing. In this paper, we introduce the problem of predicting why a person has performed an action in images. This problem has many applications in human activity understanding, such as anticipating or explaining an action. To study this problem, we…

    Submitted 29 November, 2016; v1 submitted 20 June, 2014; originally announced June 2014.

    Comments: CVPR 2016

  50. arXiv:1311.6510  [pdf, other]

    cs.CV cs.LG stat.ML

    Are all training examples equally valuable?

    Authors: Agata Lapedriza, Hamed Pirsiavash, Zoya Bylinskii, Antonio Torralba

    Abstract: When learning a new concept, not all training examples may prove equally useful for training: some may have higher or lower training value than others. The goal of this paper is to bring to the attention of the vision community the following considerations: (1) some examples are better than others for training detectors or classifiers, and (2) in the presence of better examples, some examples may…

    Submitted 25 November, 2013; originally announced November 2013.