Showing 1–24 of 24 results for author: Little, J J

Searching in archive cs.
  1. arXiv:2506.22395  [pdf, ps, other]

    cs.CV

    Test-Time Consistency in Vision Language Models

    Authors: Shih-Han Chou, Shivam Chandhok, James J. Little, Leonid Sigal

    Abstract: Vision-Language Models (VLMs) have achieved impressive performance across a wide range of multimodal tasks, yet they often exhibit inconsistent behavior when faced with semantically equivalent inputs, undermining their reliability and robustness. Recent benchmarks, such as MM-R3, highlight that even state-of-the-art VLMs can produce divergent predictions across semantically equivalent inputs, desp…

    Submitted 27 June, 2025; originally announced June 2025.

  2. arXiv:2503.10779  [pdf, other]

    cs.CV

    The Power of One: A Single Example is All it Takes for Segmentation in VLMs

    Authors: Mir Rayat Imtiaz Hossain, Mennatullah Siam, Leonid Sigal, James J. Little

    Abstract: Large-scale vision-language models (VLMs), trained on extensive datasets of image-text pairs, exhibit strong multimodal understanding capabilities by implicitly learning associations between textual descriptions and image regions. This emergent ability enables zero-shot object detection and segmentation, using techniques that rely on text-image attention maps, without necessarily training on abund…

    Submitted 13 March, 2025; originally announced March 2025.

  3. arXiv:2410.04778  [pdf, ps, other]

    cs.CV

    MM-R$^3$: On (In-)Consistency of Vision-Language Models (VLMs)

    Authors: Shih-Han Chou, Shivam Chandhok, James J. Little, Leonid Sigal

    Abstract: With the advent of LLMs and variants, a flurry of research has emerged, analyzing the performance of such models across an array of tasks. While most studies focus on evaluating the capabilities of state-of-the-art (SoTA) Vision Language Models (VLMs) through task accuracy (e.g., visual question answering, grounding), our work explores the related but complementary aspect of consistency - the abil…

    Submitted 27 June, 2025; v1 submitted 7 October, 2024; originally announced October 2024.

  4. arXiv:2404.11732  [pdf, other]

    cs.CV

    Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach

    Authors: Mir Rayat Imtiaz Hossain, Mennatullah Siam, Leonid Sigal, James J. Little

    Abstract: The emergence of attention-based transformer models has led to their extensive use in various tasks, due to their superior generalization and transfer properties. Recent research has demonstrated that such models, when prompted appropriately, are excellent for few-shot inference. However, such techniques are under-explored for dense prediction tasks like semantic segmentation. In this work, we exa…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted at CVPR 2024

  5. arXiv:2303.07545  [pdf, other]

    cs.CV

    Implicit and Explicit Commonsense for Multi-sentence Video Captioning

    Authors: Shih-Han Chou, James J. Little, Leonid Sigal

    Abstract: Existing dense or paragraph video captioning approaches rely on holistic representations of videos, possibly coupled with learned object/action representations, to condition hierarchical language decoders. However, they fundamentally lack the commonsense knowledge of the world required to reason about progression of events, causality, and even the function of certain objects within a scene. To add…

    Submitted 8 January, 2024; v1 submitted 13 March, 2023; originally announced March 2023.

    Comments: The paper is under consideration at Computer Vision and Image Understanding Journal

  6. Framework-agnostic Semantically-aware Global Reasoning for Segmentation

    Authors: Mir Rayat Imtiaz Hossain, Leonid Sigal, James J. Little

    Abstract: Recent advances in pixel-level tasks (e.g. segmentation) illustrate the benefit of long-range interactions between aggregated region-based representations that can enhance local features. However, such aggregated representations, often in the form of attention, fail to model the underlying semantics of the scene (e.g. individual objects and, by extension, their interactions). In this work, we a…

    Submitted 17 April, 2024; v1 submitted 6 December, 2022; originally announced December 2022.

    Comments: Published in WACV 2024

    Journal ref: 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 2024, pp. 988-998

  7. arXiv:2210.15121  [pdf, other]

    cs.CV

    Bootstrapping Human Optical Flow and Pose

    Authors: Aritro Roy Arko, James J. Little, Kwang Moo Yi

    Abstract: We propose a bootstrapping framework to enhance human optical flow and pose. We show that, for videos involving humans in scenes, we can improve both the optical flow and the pose estimation quality of humans by considering the two tasks at the same time. We enhance optical flow estimates by fine-tuning them to fit the human pose estimates and vice versa. In more detail, we optimize the pose and o…

    Submitted 28 October, 2022; v1 submitted 26 October, 2022; originally announced October 2022.

    Comments: Accepted at BMVC 2022. Supplementary qualitative results - https://aritro30.github.io/results/. Code at https://github.com/ubc-vision/bootstrapping-human-optical-flow-and-pose

  8. arXiv:2206.11952  [pdf, other]

    cs.CV cs.GR

    UNeRF: Time and Memory Conscious U-Shaped Network for Training Neural Radiance Fields

    Authors: Abiramy Kuganesan, Shih-yang Su, James J. Little, Helge Rhodin

    Abstract: Neural Radiance Fields (NeRFs) increase reconstruction detail for novel view synthesis and scene reconstruction, with applications ranging from large static scenes to dynamic human motion. However, the increased resolution and model-free nature of such neural fields come at the cost of high training times and excessive memory requirements. Recent advances improve the inference time by using comple…

    Submitted 23 June, 2022; originally announced June 2022.

  9. arXiv:2112.07088  [pdf, other]

    cs.CV

    ElePose: Unsupervised 3D Human Pose Estimation by Predicting Camera Elevation and Learning Normalizing Flows on 2D Poses

    Authors: Bastian Wandt, James J. Little, Helge Rhodin

    Abstract: Human pose estimation from single images is a challenging problem that is typically solved by supervised learning. Unfortunately, labeled training data does not yet exist for many human activities since 3D annotation requires dedicated motion capture systems. Therefore, we propose an unsupervised approach that learns to predict a 3D human pose from a single image while only being trained with 2D p…

    Submitted 13 December, 2021; originally announced December 2021.

  10. arXiv:1912.00076  [pdf, other]

    cs.CV

    OptiBox: Breaking the Limits of Proposals for Visual Grounding

    Authors: Zicong Fan, Si Yi Meng, Leonid Sigal, James J. Little

    Abstract: The problem of language grounding has attracted much attention in recent years due to its pivotal role in more general image-lingual high level reasoning tasks (e.g., image captioning, VQA). Despite the tremendous progress in visual grounding, the performance of most approaches has been hindered by the quality of bounding box proposals obtained in the early stages of all recent pipelines. To addre…

    Submitted 29 November, 2019; originally announced December 2019.

  11. arXiv:1907.08816  [pdf, other]

    cs.CV

    Pan-tilt-zoom SLAM for Sports Videos

    Authors: Jikai Lu, Jianhui Chen, James J. Little

    Abstract: We present an online SLAM system specifically designed to track pan-tilt-zoom (PTZ) cameras in highly dynamic sports such as basketball and soccer games. In these games, PTZ cameras rotate very fast and players cover large image areas. To overcome these challenges, we propose to use a novel camera model for tracking and to use rays as landmarks in mapping. Rays overcome the missing depth in pure-r…

    Submitted 20 July, 2019; originally announced July 2019.

    Comments: 10+3 pages, BMVC 2019 accepted

  12. arXiv:1810.10658  [pdf, other]

    cs.CV

    Sports Camera Calibration via Synthetic Data

    Authors: Jianhui Chen, James J. Little

    Abstract: Calibrating sports cameras is important for autonomous broadcasting and sports analysis. Here we propose a highly automatic method for calibrating sports cameras from a single image using synthetic data. First, we develop a novel camera pose engine. The camera pose engine has only three significant free parameters so that it can effectively generate a lot of camera poses and corresponding edge (i.…

    Submitted 24 October, 2018; originally announced October 2018.

    Comments: 6 + 1 pages

  13. arXiv:1809.04729  [pdf, other]

    cs.LG cs.CV stat.ML

    A Less Biased Evaluation of Out-of-distribution Sample Detectors

    Authors: Alireza Shafaei, Mark Schmidt, James J. Little

    Abstract: In the real world, a learning system could receive an input that is unlike anything it has seen during training. Unfortunately, out-of-distribution samples can lead to unpredictable behaviour. We need to know whether any given input belongs to the population distribution of the training/evaluation data to prevent unpredictable behaviour in deployed systems. A recent surge of interest in this probl…

    Submitted 20 August, 2019; v1 submitted 12 September, 2018; originally announced September 2018.

    Comments: to appear in BMVC 2019; v2 is more compact, with more results

  14. arXiv:1809.02854  [pdf, other]

    cs.CV

    Learning Sports Camera Selection from Internet Videos

    Authors: Jianhui Chen, Keyu Lu, Sijia Tian, James J. Little

    Abstract: This work addresses camera selection, the task of predicting which camera should be "on air" from multiple candidate cameras for soccer broadcast. The task is challenging because of the scarcity of learning data with all candidate views. Meanwhile, broadcast videos are freely available on the Internet (e.g. YouTube). However, these videos only record the selected camera views, omitting the other c…

    Submitted 8 September, 2018; originally announced September 2018.

    Comments: 8 + 2 pages, WACV2019 accepted

  15. arXiv:1801.09005  [pdf, other]

    cs.CV

    A Two-point Method for PTZ Camera Calibration in Sports

    Authors: Jianhui Chen, Fangrui Zhu, James J. Little

    Abstract: Calibrating narrow field of view soccer cameras is challenging because there are very few field markings in the image. Unlike previous solutions, we propose a two-point method, which requires only two point correspondences given the prior knowledge of base location and orientation of a pan-tilt-zoom (PTZ) camera. We deploy this new calibration method to annotate pan-tilt-zoom data from soccer vide…

    Submitted 26 January, 2018; originally announced January 2018.

    Comments: WACV 2018 accepted

  16. Exploiting temporal information for 3D pose estimation

    Authors: Mir Rayat Imtiaz Hossain, James J. Little

    Abstract: In this work, we address the problem of 3D human pose estimation from a sequence of 2D human poses. Although the recent success of deep networks has led many state-of-the-art methods for 3D pose estimation to train deep networks end-to-end to predict from images directly, the top-performing approaches have shown the effectiveness of dividing the task of 3D pose estimation into two steps: using a s…

    Submitted 12 September, 2018; v1 submitted 23 November, 2017; originally announced November 2017.

  17. arXiv:1710.10519  [pdf, other]

    cs.CV cs.AI

    Exploiting Points and Lines in Regression Forests for RGB-D Camera Relocalization

    Authors: Lili Meng, Frederick Tung, James J. Little, Julien Valentin, Clarence de Silva

    Abstract: Camera relocalization plays a vital role in many robotics and computer vision tasks, such as global localization, recovery from tracking failure and loop closure detection. Recent random forests based methods exploit randomly sampled pixel comparison features to predict 3D world locations for 2D image locations to guide the camera pose optimization. However, these image features are only sampled r…

    Submitted 28 July, 2018; v1 submitted 28 October, 2017; originally announced October 2017.

    Comments: published as a conference paper at 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

  18. arXiv:1710.07965  [pdf, other]

    cs.CV

    Backtracking Regression Forests for Accurate Camera Relocalization

    Authors: Lili Meng, Jianhui Chen, Frederick Tung, James J. Little, Julien Valentin, Clarence W. de Silva

    Abstract: Camera relocalization plays a vital role in many robotics and computer vision tasks, such as global localization, recovery from tracking failure, and loop closure detection. Recent random forests based methods directly predict 3D world locations for 2D image locations to guide the camera pose optimization. During training, each tree greedily splits the samples to minimize the spatial variance. How…

    Submitted 22 October, 2017; originally announced October 2017.

    Comments: 8 pages. Appears in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2017

  19. arXiv:1709.10230  [pdf, other]

    cs.CV

    Light Cascaded Convolutional Neural Networks for Accurate Player Detection

    Authors: Keyu Lu, Jianhui Chen, James J. Little, Hangen He

    Abstract: Vision based player detection is important in sports applications. Accuracy, efficiency, and low memory consumption are desirable for real-time tasks such as intelligent broadcasting and automatic event classification. In this paper, we present a cascaded convolutional neural network (CNN) that satisfies all three of these requirements. Our method first trains a binary (player/non-player) classifi…

    Submitted 28 September, 2017; originally announced September 2017.

    Comments: Published in proceedings of BMVC 2017

  20. arXiv:1705.03098  [pdf, other]

    cs.CV

    A simple yet effective baseline for 3d human pose estimation

    Authors: Julieta Martinez, Rayat Hossain, Javier Romero, James J. Little

    Abstract: Following the success of deep convolutional networks, state-of-the-art methods for 3d human pose estimation have focused on deep end-to-end systems that predict 3d joint locations given raw image pixels. Despite their excellent performance, it is often not easy to understand whether their remaining error stems from a limited 2d pose (visual) understanding, or from a failure to map 2d poses into 3-…

    Submitted 4 August, 2017; v1 submitted 8 May, 2017; originally announced May 2017.

    Comments: Accepted to ICCV 17

  21. arXiv:1608.01745  [pdf, other]

    cs.CV

    Play and Learn: Using Video Games to Train Computer Vision Models

    Authors: Alireza Shafaei, James J. Little, Mark Schmidt

    Abstract: Video games are a compelling source of annotated data as they can readily provide fine-grained groundtruth for diverse tasks. However, it is not clear whether the synthetically generated data has enough resemblance to the real-world images to improve the performance of computer vision models in practice. We present experiments assessing the effectiveness on real-world data of systems trained on sy…

    Submitted 15 August, 2016; v1 submitted 4 August, 2016; originally announced August 2016.

    Comments: To appear in the British Machine Vision Conference (BMVC), September 2016. -v2: fixed a typo in the references

  22. arXiv:1605.08068  [pdf, other]

    cs.CV

    Real-Time Human Motion Capture with Multiple Depth Cameras

    Authors: Alireza Shafaei, James J. Little

    Abstract: Commonly used human motion capture systems require intrusive attachment of markers that are visually tracked with multiple cameras. In this work we present an efficient and inexpensive solution to markerless motion capture using only a few Kinect sensors. Unlike the previous work on 3d pose estimation using a single depth camera, we relax constraints on the camera location and do not assume a co-o…

    Submitted 25 May, 2016; originally announced May 2016.

    Comments: Accepted to computer robot vision 2016

  23. arXiv:1411.2173  [pdf, other]

    cs.CV

    Stacked Quantizers for Compositional Vector Compression

    Authors: Julieta Martinez, Holger H. Hoos, James J. Little

    Abstract: Recently, Babenko and Lempitsky introduced Additive Quantization (AQ), a generalization of Product Quantization (PQ) where a non-independent set of codebooks is used to compress vectors into small binary codes. Unfortunately, under this scheme encoding cannot be done independently in each codebook, and optimal encoding is an NP-hard problem. In this paper, we observe that PQ and AQ are both compos…

    Submitted 8 November, 2014; originally announced November 2014.

  24. arXiv:1307.7198  [pdf, other]

    cs.CV cs.AI

    Self-Learning for Player Localization in Sports Video

    Authors: Kenji Okuma, David G. Lowe, James J. Little

    Abstract: This paper introduces a novel self-learning framework that automates the label acquisition process for improving models for detecting players in broadcast footage of sports games. Unlike most previous self-learning approaches for improving appearance-based object detectors from videos, we allow an unknown, unconstrained number of target objects in a more generalized video sequence with non-static…

    Submitted 26 July, 2013; originally announced July 2013.