
Showing 1–50 of 59 results for author: Shakhnarovich, G

  1. arXiv:2506.03594

    cs.GR cs.CV cs.LG cs.MM cs.RO

    SplArt: Articulation Estimation and Part-Level Reconstruction with 3D Gaussian Splatting

    Authors: Shengjie Lin, Jiading Fang, Muhammad Zubair Irshad, Vitor Campagnolo Guizilini, Rares Andrei Ambrus, Greg Shakhnarovich, Matthew R. Walter

    Abstract: Reconstructing articulated objects prevalent in daily environments is crucial for applications in augmented/virtual reality and robotics. However, existing methods face scalability limitations (requiring 3D supervision or costly annotations), robustness issues (being susceptible to local optima), and rendering shortcomings (lacking speed or photorealism). We introduce SplArt, a self-supervised, ca…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: https://github.com/ripl/splart

  2. arXiv:2505.04612

    cs.CV

    FastMap: Revisiting Dense and Scalable Structure from Motion

    Authors: Jiahao Li, Haochen Wang, Muhammad Zubair Irshad, Igor Vasiljevic, Matthew R. Walter, Vitor Campagnolo Guizilini, Greg Shakhnarovich

    Abstract: We propose FastMap, a new global structure from motion method focused on speed and simplicity. Previous methods like COLMAP and GLOMAP are able to estimate high-precision camera poses, but suffer from poor scalability when the number of matched keypoint pairs becomes large. We identify two key factors leading to this problem: poor parallelization and computationally expensive optimization steps. T…

    Submitted 19 May, 2025; v1 submitted 7 May, 2025; originally announced May 2025.

    Comments: Project webpage: https://jiahao.ai/fastmap

  3. arXiv:2501.18804

    cs.CV cs.LG

    Zero-Shot Novel View and Depth Synthesis with Multi-View Geometric Diffusion

    Authors: Vitor Guizilini, Muhammad Zubair Irshad, Dian Chen, Greg Shakhnarovich, Rares Ambrus

    Abstract: Current methods for 3D scene reconstruction from sparse posed images employ intermediate 3D representations such as neural fields, voxel grids, or 3D Gaussians, to achieve multi-view consistent scene appearance and geometry. In this paper we introduce MVGD, a diffusion-based architecture capable of direct pixel-level generation of images and depth maps from novel viewpoints, given an arbitrary num…

    Submitted 30 January, 2025; originally announced January 2025.

    Comments: Project page: https://mvgd.github.io

  4. arXiv:2411.16765

    cs.CL cs.CV

    SHuBERT: Self-Supervised Sign Language Representation Learning via Multi-Stream Cluster Prediction

    Authors: Shester Gueuwou, Xiaodan Du, Greg Shakhnarovich, Karen Livescu, Alexander H. Liu

    Abstract: Sign language processing has traditionally relied on task-specific models, limiting the potential for transfer learning across tasks. Pre-training methods for sign language have typically focused on either supervised pre-training, which cannot take advantage of unlabeled data, or context-independent (frame or video segment) representations, which ignore the effects of relationships across time in…

    Submitted 2 June, 2025; v1 submitted 24 November, 2024; originally announced November 2024.

    Comments: Accepted to ACL 2025

  5. arXiv:2406.06907

    cs.CL cs.AI cs.CV cs.LG

    SignMusketeers: An Efficient Multi-Stream Approach for Sign Language Translation at Scale

    Authors: Shester Gueuwou, Xiaodan Du, Greg Shakhnarovich, Karen Livescu

    Abstract: A persistent challenge in sign language video processing, including the task of sign to written language translation, is how we learn representations of sign language in an effective and efficient way that preserves the important attributes of these languages, while remaining invariant to irrelevant visual differences. Informed by the nature and linguistics of signed languages, our proposed method…

    Submitted 2 June, 2025; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL (Findings) 2025

  6. arXiv:2404.19221

    cs.CV cs.CL

    Transcrib3D: 3D Referring Expression Resolution through Large Language Models

    Authors: Jiading Fang, Xiangshan Tan, Shengjie Lin, Igor Vasiljevic, Vitor Guizilini, Hongyuan Mei, Rares Ambrus, Gregory Shakhnarovich, Matthew R Walter

    Abstract: If robots are to work effectively alongside people, they must be able to interpret natural language references to objects in their 3D environment. Understanding 3D referring expressions is challenging -- it requires the ability to both parse the 3D structure of the scene and correctly ground free-form language in the presence of distraction and clutter. We introduce Transcrib3D, an approach that b…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: CORLW 2023

  7. arXiv:2404.02155

    cs.CV

    Alpha Invariance: On Inverse Scaling Between Distance and Volume Density in Neural Radiance Fields

    Authors: Joshua Ahn, Haochen Wang, Raymond A. Yeh, Greg Shakhnarovich

    Abstract: Scale-ambiguity in 3D scene dimensions leads to magnitude-ambiguity of volumetric densities in neural radiance fields, i.e., the densities double when scene size is halved, and vice versa. We call this property alpha invariance. For NeRFs to better maintain alpha invariance, we recommend 1) parameterizing both distance and volume densities in log space, and 2) a discretization-agnostic initializat…

    Submitted 16 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: CVPR 2024. project page https://pals.ttic.edu/p/alpha-invariance

  8. arXiv:2311.17137

    cs.CV cs.AI cs.GR cs.LG

    Generative Models: What Do They Know? Do They Know Things? Let's Find Out!

    Authors: Xiaodan Du, Nicholas Kolkin, Greg Shakhnarovich, Anand Bhattad

    Abstract: Generative models excel at mimicking real scenes, suggesting they might inherently encode important intrinsic scene properties. In this paper, we aim to explore the following key questions: (1) What intrinsic knowledge do generative models like GANs, Autoregressive models, and Diffusion models encode? (2) Can we establish a general framework to recover intrinsic representations from these models,…

    Submitted 16 October, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: https://intrinsic-lora.github.io/

  9. arXiv:2311.06214

    cs.CV

    Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model

    Authors: Jiahao Li, Hao Tan, Kai Zhang, Zexiang Xu, Fujun Luan, Yinghao Xu, Yicong Hong, Kalyan Sunkavalli, Greg Shakhnarovich, Sai Bi

    Abstract: Text-to-3D with diffusion models has achieved remarkable progress in recent years. However, existing methods either rely on score distillation-based optimization which suffer from slow inference, low diversity and Janus problems, or are feed-forward methods that generate low-quality results due to the scarcity of 3D training data. In this paper, we propose Instant3D, a novel method that generates…

    Submitted 23 November, 2023; v1 submitted 10 November, 2023; originally announced November 2023.

    Comments: Project webpage: https://jiahao.ai/instant3d/

  10. arXiv:2310.17649

    cs.RO cs.CV

    6-DoF Stability Field via Diffusion Models

    Authors: Takuma Yoneda, Tianchong Jiang, Gregory Shakhnarovich, Matthew R. Walter

    Abstract: A core capability for robot manipulation is reasoning over where and how to stably place objects in cluttered environments. Traditionally, robots have relied on object-specific, hand-crafted heuristics in order to perform such reasoning, with limited generalizability beyond a small number of object instances and object interaction patterns. Recent approaches instead learn notions of physical inter…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: In submission

  11. arXiv:2310.17075

    cs.CV

    HyperFields: Towards Zero-Shot Generation of NeRFs from Text

    Authors: Sudarshan Babu, Richard Liu, Avery Zhou, Michael Maire, Greg Shakhnarovich, Rana Hanocka

    Abstract: We introduce HyperFields, a method for generating text-conditioned Neural Radiance Fields (NeRFs) with a single forward pass and (optionally) some fine-tuning. Key to our approach are: (i) a dynamic hypernetwork, which learns a smooth mapping from text token embeddings to the space of NeRFs; (ii) NeRF distillation training, which distills scenes encoded in individual NeRFs into one dynamic hyperne…

    Submitted 13 June, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

    Comments: Accepted to ICML 2024, Project page: https://threedle.github.io/hyperfields/

  12. arXiv:2309.02450

    cs.CV

    Self-Supervised Video Transformers for Isolated Sign Language Recognition

    Authors: Marcelo Sandoval-Castaneda, Yanhong Li, Diane Brentari, Karen Livescu, Gregory Shakhnarovich

    Abstract: This paper presents an in-depth analysis of various self-supervision methods for isolated sign language recognition (ISLR). We consider four recently introduced transformer-based approaches to self-supervised learning from videos, and four pre-training data regimes, and study all the combinations on the WLASL2000 dataset. Our findings reveal that MaskFeat achieves performance superior to pose-base…

    Submitted 1 September, 2023; originally announced September 2023.

    Comments: 14 pages. Submitted to WACV 2024

  13. arXiv:2305.13307

    cs.CV

    NeRFuser: Large-Scale Scene Representation by NeRF Fusion

    Authors: Jiading Fang, Shengjie Lin, Igor Vasiljevic, Vitor Guizilini, Rares Ambrus, Adrien Gaidon, Gregory Shakhnarovich, Matthew R. Walter

    Abstract: A practical benefit of implicit visual representations like Neural Radiance Fields (NeRFs) is their memory efficiency: large scenes can be efficiently stored and shared as small neural nets instead of collections of images. However, operating on these implicit visual data structures requires extending classical image-based vision techniques (e.g., registration, blending) from image sets to neural…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: Code available at https://github.com/ripl/nerfuser

  14. arXiv:2212.04981

    cs.GR cs.CV

    LoopDraw: a Loop-Based Autoregressive Model for Shape Synthesis and Editing

    Authors: Nam Anh Dinh, Haochen Wang, Greg Shakhnarovich, Rana Hanocka

    Abstract: There is no settled universal 3D representation for geometry with many alternatives such as point clouds, meshes, implicit functions, and voxels to name a few. In this work, we present a new, compelling alternative for representing shapes using a sequence of cross-sectional closed loops. The loops across all planes form an organizational hierarchy which we leverage for autoregressive shape synthes…

    Submitted 29 May, 2024; v1 submitted 9 December, 2022; originally announced December 2022.

    Comments: accepted to AI4CC 2024 workshop at CVPR 2024. See project page at https://threedle.github.io/LoopDraw

  15. arXiv:2212.00774

    cs.CV cs.LG

    Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation

    Authors: Haochen Wang, Xiaodan Du, Jiahao Li, Raymond A. Yeh, Greg Shakhnarovich

    Abstract: A diffusion model learns to predict a vector field of gradients. We propose to apply chain rule on the learned gradients, and back-propagate the score of a diffusion model through the Jacobian of a differentiable renderer, which we instantiate to be a voxel radiance field. This setup aggregates 2D scores at multiple camera viewpoints into a 3D score, and repurposes a pretrained 2D model for 3D dat…

    Submitted 1 December, 2022; originally announced December 2022.

    Comments: project page https://pals.ttic.edu/p/score-jacobian-chaining

  16. arXiv:2209.03953

    cs.CV cs.LG

    Text-Free Learning of a Natural Language Interface for Pretrained Face Generators

    Authors: Xiaodan Du, Raymond A. Yeh, Nicholas Kolkin, Eli Shechtman, Greg Shakhnarovich

    Abstract: We propose Fast text2StyleGAN, a natural language interface that adapts pre-trained GANs for text-guided human face synthesis. Leveraging the recent advances in Contrastive Language-Image Pre-training (CLIP), no text data is required during training. Fast text2StyleGAN is formulated as a conditional variational autoencoder (CVAE) that provides extra control and diversity to the generated images at…

    Submitted 8 September, 2022; originally announced September 2022.

  17. arXiv:2207.14287

    cs.CV cs.LG

    Depth Field Networks for Generalizable Multi-view Scene Representation

    Authors: Vitor Guizilini, Igor Vasiljevic, Jiading Fang, Rares Ambrus, Greg Shakhnarovich, Matthew Walter, Adrien Gaidon

    Abstract: Modern 3D computer vision leverages learning to boost geometric reasoning, mapping image data to classical structures such as cost volumes or epipolar constraints to improve matching. These architectures are specialized according to the particular problem, and thus require significant task-specific tuning, often leading to poor domain generalization performance. Recently, generalist Transformer ar…

    Submitted 28 July, 2022; originally announced July 2022.

    Comments: Accepted to ECCV 2022. Project page: https://sites.google.com/view/tri-define

  18. arXiv:2205.12870

    cs.CV cs.CL

    Open-Domain Sign Language Translation Learned from Online Video

    Authors: Bowen Shi, Diane Brentari, Greg Shakhnarovich, Karen Livescu

    Abstract: Existing work on sign language translation - that is, translation from sign language videos into sentences in a written language - has focused mainly on (1) data collected in a controlled environment or (2) data in a specific domain, which limits the applicability to real-world settings. In this paper, we introduce OpenASL, a large-scale American Sign Language (ASL) - English dataset collected fro…

    Submitted 19 November, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: EMNLP 2022

  19. arXiv:2204.03647

    cs.CV cs.AI

    Adapting CLIP For Phrase Localization Without Further Training

    Authors: Jiahao Li, Greg Shakhnarovich, Raymond A. Yeh

    Abstract: Supervised or weakly supervised methods for phrase localization (textual grounding) either rely on human annotations or some other supervised models, e.g., object detectors. Obtaining these annotations is labor-intensive and may be difficult to scale in practice. We propose to leverage recent advances in contrastive language-vision models, CLIP, pre-trained on image and caption pairs collected fro…

    Submitted 7 April, 2022; originally announced April 2022.

  20. arXiv:2203.13291

    cs.CV cs.CL

    Searching for fingerspelled content in American Sign Language

    Authors: Bowen Shi, Diane Brentari, Greg Shakhnarovich, Karen Livescu

    Abstract: Natural language processing for sign language video - including tasks like recognition, translation, and search - is crucial for making artificial intelligence technologies accessible to deaf individuals, and is gaining research interest in recent years. In this paper, we address the problem of searching for fingerspelled key-words or key phrases in raw sign language videos. This is an important t…

    Submitted 24 March, 2022; originally announced March 2022.

    Comments: ACL 2022

  21. arXiv:2203.13215

    cs.CV cs.GR

    Neural Neighbor Style Transfer

    Authors: Nicholas Kolkin, Michal Kucera, Sylvain Paris, Daniel Sykora, Eli Shechtman, Greg Shakhnarovich

    Abstract: We propose Neural Neighbor Style Transfer (NNST), a pipeline that offers state-of-the-art quality, generalization, and competitive efficiency for artistic style transfer. Our approach is based on explicitly replacing neural features extracted from the content input (to be stylized) with those from a style exemplar, then synthesizing the final output based on these rearranged features. While the sp…

    Submitted 24 March, 2022; originally announced March 2022.

    Comments: Code for NNST-Opt available at https://github.com/nkolkin13/NeuralNeighborStyleTransfer

  22. arXiv:2202.05920

    cs.LG stat.ML

    Boosting Barely Robust Learners: A New Perspective on Adversarial Robustness

    Authors: Avrim Blum, Omar Montasser, Greg Shakhnarovich, Hongyang Zhang

    Abstract: We present an oracle-efficient algorithm for boosting the adversarial robustness of barely robust learners. Barely robust learning algorithms learn predictors that are adversarially robust only on a small fraction $β\ll 1$ of the data distribution. Our proposed notion of barely robust learning requires robustness with respect to a "larger" perturbation set; which we show is necessary for strongly…

    Submitted 11 February, 2022; originally announced February 2022.

  23. arXiv:2112.03325

    cs.CV cs.RO

    Self-Supervised Camera Self-Calibration from Video

    Authors: Jiading Fang, Igor Vasiljevic, Vitor Guizilini, Rares Ambrus, Greg Shakhnarovich, Adrien Gaidon, Matthew R. Walter

    Abstract: Camera calibration is integral to robotics and computer vision algorithms that seek to infer geometric properties of the scene from visual input streams. In practice, calibration is a laborious procedure requiring specialized data collection and careful tuning. This process must be repeated whenever the parameters of the camera change, which can be a frequent occurrence for mobile robots and auton…

    Submitted 1 March, 2022; v1 submitted 6 December, 2021; originally announced December 2021.

    Comments: The project page: https://sites.google.com/ttic.edu/self-sup-self-calib

  24. arXiv:2104.01291

    cs.CV cs.CL

    Fingerspelling Detection in American Sign Language

    Authors: Bowen Shi, Diane Brentari, Greg Shakhnarovich, Karen Livescu

    Abstract: Fingerspelling, in which words are signed letter by letter, is an important component of American Sign Language. Most previous work on automatic fingerspelling recognition has assumed that the boundaries of fingerspelling regions in signing videos are known beforehand. In this paper, we consider the task of fingerspelling detection in raw, untrimmed sign language videos. This is an important step…

    Submitted 2 April, 2021; originally announced April 2021.

    Comments: CVPR 2021

  25. arXiv:2104.00152

    cs.CV

    Full Surround Monodepth from Multiple Cameras

    Authors: Vitor Guizilini, Igor Vasiljevic, Rares Ambrus, Greg Shakhnarovich, Adrien Gaidon

    Abstract: Self-supervised monocular depth and ego-motion estimation is a promising approach to replace or supplement expensive depth sensors such as LiDAR for robotics applications like autonomous driving. However, most research in this area focuses on a single monocular camera or stereo pairs that cover only a fraction of the scene around the vehicle. In this work, we extend monocular self-supervised depth…

    Submitted 31 March, 2021; originally announced April 2021.

  26. arXiv:2012.07287

    cs.CV

    Information-Theoretic Segmentation by Inpainting Error Maximization

    Authors: Pedro Savarese, Sunnie S. Y. Kim, Michael Maire, Greg Shakhnarovich, David McAllester

    Abstract: We study image segmentation from an information-theoretic perspective, proposing a novel adversarial method that performs unsupervised segmentation by partitioning images into maximally independent sets. More specifically, we group image pixels into foreground and background, with the goal of minimizing predictability of one set from the other. An easily computed loss drives a greedy search proces…

    Submitted 29 June, 2021; v1 submitted 14 December, 2020; originally announced December 2020.

    Comments: Published as a conference paper at CVPR 2021

  27. arXiv:2008.06630

    cs.CV cs.LG cs.RO

    Neural Ray Surfaces for Self-Supervised Learning of Depth and Ego-motion

    Authors: Igor Vasiljevic, Vitor Guizilini, Rares Ambrus, Sudeep Pillai, Wolfram Burgard, Greg Shakhnarovich, Adrien Gaidon

    Abstract: Self-supervised learning has emerged as a powerful tool for depth and ego-motion estimation, leading to state-of-the-art results on benchmark datasets. However, one significant limitation shared by current methods is the assumption of a known parametric camera model -- usually the standard pinhole geometry -- leading to failure when applied to imaging systems that deviate significantly from this a…

    Submitted 14 August, 2020; originally announced August 2020.

  28. arXiv:2006.16705

    cs.CV cs.LG

    Classification Confidence Estimation with Test-Time Data-Augmentation

    Authors: Yuval Bahat, Gregory Shakhnarovich

    Abstract: Machine learning plays an increasingly significant role in many aspects of our lives (including medicine, transportation, security, justice and other domains), making the potential consequences of false predictions increasingly devastating. These consequences may be mitigated if we can automatically flag such false predictions and potentially assign them to alternative, more reliable mechanisms, t…

    Submitted 30 June, 2020; originally announced June 2020.

  29. arXiv:2005.14386

    cs.CV cs.CL

    Controlling Length in Image Captioning

    Authors: Ruotian Luo, Greg Shakhnarovich

    Abstract: We develop and evaluate captioning models that allow control of caption length. Our models can leverage this control to generate captions of different style and descriptiveness.

    Submitted 29 May, 2020; originally announced May 2020.

  30. arXiv:2004.01849

    cs.CV

    Pixel Consensus Voting for Panoptic Segmentation

    Authors: Haochen Wang, Ruotian Luo, Michael Maire, Greg Shakhnarovich

    Abstract: The core of our approach, Pixel Consensus Voting, is a framework for instance segmentation based on the Generalized Hough transform. Pixels cast discretized, probabilistic votes for the likely regions that contain instance centroids. At the detected peaks that emerge in the voting heatmap, backprojection is applied to collect pixels and produce instance masks. Unlike a sliding window detector that…

    Submitted 4 April, 2020; originally announced April 2020.

    Comments: CVPR 2020

  31. arXiv:2003.13170

    cs.CV

    Space-Time-Aware Multi-Resolution Video Enhancement

    Authors: Muhammad Haris, Greg Shakhnarovich, Norimichi Ukita

    Abstract: We consider the problem of space-time super-resolution (ST-SR): increasing spatial resolution of video frames and simultaneously interpolating frames to increase the frame rate. Modern approaches handle these axes one at a time. In contrast, our proposed model called STARnet super-resolves jointly in space and time. This allows us to leverage mutually informative relationships between time and spa…

    Submitted 29 March, 2020; originally announced March 2020.

    Comments: To appear in CVPR2020

  32. arXiv:2003.12633

    cs.CV

    Detection and Description of Change in Visual Streams

    Authors: Davis Gilton, Ruotian Luo, Rebecca Willett, Greg Shakhnarovich

    Abstract: This paper presents a framework for the analysis of changes in visual streams: ordered sequences of images, possibly separated by significant time gaps. We propose a new approach to incorporating unlabeled data into training to generate natural language descriptions of change. We also develop a framework for estimating the time of change in visual stream. We use learned representations for change…

    Submitted 9 April, 2020; v1 submitted 27 March, 2020; originally announced March 2020.

  33. arXiv:2003.11038

    cs.CV cs.GR cs.LG

    Deformable Style Transfer

    Authors: Sunnie S. Y. Kim, Nicholas Kolkin, Jason Salavon, Gregory Shakhnarovich

    Abstract: Both geometry and texture are fundamental aspects of visual style. Existing style transfer methods, however, primarily focus on texture, almost entirely ignoring geometry. We propose deformable style transfer (DST), an optimization-based approach that jointly stylizes the texture and geometry of a content image to better match a style image. Unlike previous geometry-aware stylization methods, our…

    Submitted 19 July, 2020; v1 submitted 24 March, 2020; originally announced March 2020.

    Comments: ECCV 2020 (21 pages, 11 figures including the supplementary material)

  34. arXiv:2002.11848

    cs.CL cs.CV

    Analysis of diversity-accuracy tradeoff in image captioning

    Authors: Ruotian Luo, Gregory Shakhnarovich

    Abstract: We investigate the effect of different model architectures, training objectives, hyperparameter settings and decoding procedures on the diversity of automatically generated image captions. Our results show that 1) simple decoding by naive sampling, coupled with low temperature is a competitive and fast method to produce diverse and accurate caption sets; 2) training with CIDEr-based reward using R…

    Submitted 26 February, 2020; originally announced February 2020.

  35. arXiv:1908.10546

    cs.CV cs.CL

    Fingerspelling recognition in the wild with iterative visual attention

    Authors: Bowen Shi, Aurora Martinez Del Rio, Jonathan Keane, Diane Brentari, Greg Shakhnarovich, Karen Livescu

    Abstract: Sign language recognition is a challenging gesture sequence recognition problem, characterized by quick and highly coarticulated motion. In this paper we focus on recognition of fingerspelling sequences in American Sign Language (ASL) videos collected in the wild, mainly from YouTube and Deaf social media. Most previous work on sign language recognition has focused on controlled settings where the…

    Submitted 28 August, 2019; originally announced August 2019.

    Comments: ICCV 2019

  36. arXiv:1908.00463

    cs.CV

    DIODE: A Dense Indoor and Outdoor DEpth Dataset

    Authors: Igor Vasiljevic, Nick Kolkin, Shanyi Zhang, Ruotian Luo, Haochen Wang, Falcon Z. Dai, Andrea F. Daniele, Mohammadreza Mostajabi, Steven Basart, Matthew R. Walter, Gregory Shakhnarovich

    Abstract: We introduce DIODE, a dataset that contains thousands of diverse high resolution color images with accurate, dense, long-range depth measurements. DIODE (Dense Indoor/Outdoor DEpth) is the first public dataset to include RGBD images of indoor and outdoor scenes obtained with one sensor suite. This is in contrast to existing datasets that focus on just one domain/scene type and employ different sen…

    Submitted 29 August, 2019; v1 submitted 1 August, 2019; originally announced August 2019.

  37. arXiv:1904.12785

    cs.CV

    Style Transfer by Relaxed Optimal Transport and Self-Similarity

    Authors: Nicholas Kolkin, Jason Salavon, Greg Shakhnarovich

    Abstract: Style transfer algorithms strive to render the content of one image using the style of another. We propose Style Transfer by Relaxed Optimal Transport and Self-Similarity (STROTSS), a new optimization-based style transfer algorithm. We extend our method to allow user-specified point-to-point or region-to-region control over visual similarity between the style image and the output. Such guidance ca…

    Submitted 9 October, 2019; v1 submitted 29 April, 2019; originally announced April 2019.

    Comments: To Appear CVPR 2019, Webdemo Available at http://style.ttic.edu

  38. arXiv:1904.05677

    cs.CV

    Deep Back-Projection Networks for Single Image Super-resolution

    Authors: Muhammad Haris, Greg Shakhnarovich, Norimichi Ukita

    Abstract: Previous feed-forward architectures of recently proposed deep super-resolution networks learn the features of low-resolution inputs and the non-linear mapping from those to a high-resolution output. However, this approach does not fully address the mutual dependencies of low- and high-resolution images. We propose Deep Back-Projection Networks (DBPN), the winner of two image super-resolution chall…

    Submitted 12 June, 2020; v1 submitted 4 April, 2019; originally announced April 2019.

    Comments: To appear in TPAMI 2020. The code is available at https://github.com/alterzero/DBPN-Pytorch (arXiv admin note: substantial text overlap with arXiv:1803.02735)

  39. arXiv:1903.10128

    cs.CV

    Recurrent Back-Projection Network for Video Super-Resolution

    Authors: Muhammad Haris, Greg Shakhnarovich, Norimichi Ukita

    Abstract: We proposed a novel architecture for the problem of video super-resolution. We integrate spatial and temporal contexts from continuous video frames using a recurrent encoder-decoder module, that fuses multi-frame information with the more traditional, single frame super-resolution path for the target frame. In contrast to most prior work where frames are pooled together by stacking or warping, our…

    Submitted 25 March, 2019; originally announced March 2019.

    Comments: To appear in CVPR2019

  40. arXiv:1902.00236

    cs.LG cs.CV stat.ML

    Natural and Adversarial Error Detection using Invariance to Image Transformations

    Authors: Yuval Bahat, Michal Irani, Gregory Shakhnarovich

    Abstract: We propose an approach to distinguish between correct and incorrect image classifications. Our approach can detect misclassifications which either occur $\it{unintentionally}$ ("natural errors"), or due to $\it{intentional~adversarial~attacks}$ ("adversarial errors"), both in a single $\it{unified~framework}$. Our approach is based on the observation that correctly classified images tend to exhibi…

    Submitted 1 February, 2019; originally announced February 2019.

  41. arXiv:1810.11438

    cs.CV cs.CL

    American Sign Language fingerspelling recognition in the wild

    Authors: Bowen Shi, Aurora Martinez Del Rio, Jonathan Keane, Jonathan Michaux, Diane Brentari, Greg Shakhnarovich, Karen Livescu

    Abstract: We address the problem of American Sign Language fingerspelling recognition in the wild, using videos collected from websites. We introduce the largest data set available so far for the problem of fingerspelling recognition, and the first using naturally occurring video data. Using this data set, we present the first attempt to recognize fingerspelling sequences in this challenging setting. Unlike…

    Submitted 17 February, 2019; v1 submitted 26 October, 2018; originally announced October 2018.

    Comments: accepted in SLT 2018

  42. arXiv:1804.02009

    cs.CV

    Regularizing Deep Networks by Modeling and Predicting Label Structure

    Authors: Mohammadreza Mostajabi, Michael Maire, Gregory Shakhnarovich

    Abstract: We construct custom regularization functions for use in supervised training of deep neural networks. Our technique is applicable when the ground-truth labels themselves exhibit internal structure; we derive a regularizer by learning an autoencoder over the set of annotations. Training thereby becomes a two-phase procedure. The first phase models labels with an autoencoder. The second phase trains…

    Submitted 5 April, 2018; originally announced April 2018.

    Comments: to appear at CVPR 2018

  43. arXiv:1804.00657

    cs.CV

    Confidence from Invariance to Image Transformations

    Authors: Yuval Bahat, Gregory Shakhnarovich

    Abstract: We develop a technique for automatically detecting the classification errors of a pre-trained visual classifier. Our method is agnostic to the form of the classifier, requiring access only to classifier responses to a set of inputs. We train a parametric binary classifier (error/correct) on a representation derived from a set of classifier responses generated from multiple copies of the same input…

    Submitted 2 April, 2018; originally announced April 2018.

  44. arXiv:1803.11316  [pdf, other]

    cs.CV

    Task-Driven Super Resolution: Object Detection in Low-resolution Images

    Authors: Muhammad Haris, Greg Shakhnarovich, Norimichi Ukita

    Abstract: We consider how image super resolution (SR) can contribute to an object detection task in low-resolution images. Intuitively, SR gives a positive impact on the object detection task. While several previous works demonstrated that this intuition is correct, SR and detector are optimized independently in these works. This paper proposes a novel framework to train a deep neural network where the SR s…

    Submitted 29 March, 2018; originally announced March 2018.

  45. arXiv:1803.04376  [pdf, other]

    cs.CV

    Discriminability objective for training descriptive captions

    Authors: Ruotian Luo, Brian Price, Scott Cohen, Gregory Shakhnarovich

    Abstract: One property that remains lacking in image captions generated by contemporary methods is discriminability: being able to tell two images apart given the caption for one of them. We propose a way to improve this aspect of caption generation. By incorporating into the captioning training objective a loss component directly related to ability (by a machine) to disambiguate image/caption matches, we o…

    Submitted 8 June, 2018; v1 submitted 12 March, 2018; originally announced March 2018.

    Comments: CVPR2018

  46. arXiv:1803.02735  [pdf, other]

    cs.CV

    Deep Back-Projection Networks For Super-Resolution

    Authors: Muhammad Haris, Greg Shakhnarovich, Norimichi Ukita

    Abstract: The feed-forward architectures of recently proposed deep super-resolution networks learn representations of low-resolution inputs, and the non-linear mapping from those to high-resolution output. However, this approach does not fully address the mutual dependencies of low- and high-resolution images. We propose Deep Back-Projection Networks (DBPN), that exploit iterative up- and down-sampling laye…

    Submitted 7 March, 2018; originally announced March 2018.

    Comments: To appear in CVPR2018

  47. arXiv:1712.04850  [pdf, other]

    cs.CV

    Self-Supervised Relative Depth Learning for Urban Scene Understanding

    Authors: Huaizu Jiang, Erik Learned-Miller, Gustav Larsson, Michael Maire, Greg Shakhnarovich

    Abstract: As an agent moves through the world, the apparent motion of scene elements is (usually) inversely proportional to their depth. It is natural for a learning agent to associate image patterns with the magnitude of their displacement over time: as the agent moves, faraway mountains don't move much; nearby trees move a lot. This natural relationship between the appearance of objects and their motion i…

    Submitted 2 April, 2018; v1 submitted 13 December, 2017; originally announced December 2017.

  48. arXiv:1710.01949  [pdf, other]

    cs.CL cs.CV eess.AS

    Semantic speech retrieval with a visually grounded model of untranscribed speech

    Authors: Herman Kamper, Gregory Shakhnarovich, Karen Livescu

    Abstract: There is growing interest in models that can learn from unlabelled speech paired with visual context. This setting is relevant for low-resource speech processing, robotics, and human language acquisition research. Here we study how a visually grounded speech model, trained on images of scenes paired with spoken captions, captures aspects of semantics. We use an external image tagger to generate so…

    Submitted 31 October, 2018; v1 submitted 5 October, 2017; originally announced October 2017.

    Comments: 10 pages, 3 figures, 5 tables; accepted to the IEEE/ACM Transactions on Audio, Speech and Language Processing

    Journal ref: IEEE/ACM Transactions on Audio, Speech and Language Processing 27 (2019) 89-98

  49. arXiv:1708.02212  [pdf, other]

    cs.CV

    Training Deep Networks to be Spatially Sensitive

    Authors: Nicholas Kolkin, Gregory Shakhnarovich, Eli Shechtman

    Abstract: In many computer vision tasks, for example saliency prediction or semantic segmentation, the desired output is a foreground map that predicts pixels where some criterion is satisfied. Despite the inherently spatial nature of this task, commonly used learning objectives do not incorporate the spatial relationships between misclassified pixels and the underlying ground truth. The Weighted F-measure, a…

    Submitted 7 August, 2017; originally announced August 2017.

    Comments: ICCV 2017

  50. arXiv:1703.08136  [pdf, other]

    cs.CL cs.CV

    Visually grounded learning of keyword prediction from untranscribed speech

    Authors: Herman Kamper, Shane Settle, Gregory Shakhnarovich, Karen Livescu

    Abstract: During language acquisition, infants have the benefit of visual cues to ground spoken language. Robots similarly have access to audio and visual sensors. Recent work has shown that images and spoken captions can be mapped into a meaningful common space, allowing images to be retrieved using speech and vice versa. In this setting of images paired with untranscribed spoken captions, we consider whet…

    Submitted 25 May, 2017; v1 submitted 23 March, 2017; originally announced March 2017.

    Comments: 5 pages, 3 figures, 5 tables; small updates, added link to code; accepted to Interspeech 2017