Showing 1–9 of 9 results for author: Hickson, S

  1. arXiv:2408.07790  [pdf, other]

    cs.CV

    Cropper: Vision-Language Model for Image Cropping through In-Context Learning

    Authors: Seung Hyun Lee, Jijun Jiang, Yiran Xu, Zhuofang Li, Junjie Ke, Yinxiao Li, Junfeng He, Steven Hickson, Katie Datsenko, Sangpil Kim, Ming-Hsuan Yang, Irfan Essa, Feng Yang

    Abstract: The goal of image cropping is to identify visually appealing crops in an image. Conventional methods are trained on specific datasets and fail to adapt to new requirements. Recent breakthroughs in large vision-language models (VLMs) enable visual in-context learning without explicit training. However, downstream tasks with VLMs remain underexplored. In this paper, we propose an effective approach…

    Submitted 31 March, 2025; v1 submitted 14 August, 2024; originally announced August 2024.

  2. arXiv:1906.06792  [pdf, other]

    cs.CV cs.LG

    Floors are Flat: Leveraging Semantics for Real-Time Surface Normal Prediction

    Authors: Steven Hickson, Karthik Raveendran, Alireza Fathi, Kevin Murphy, Irfan Essa

    Abstract: We propose 4 insights that help to significantly improve the performance of deep learning models that predict surface normals and semantic labels from a single RGB image. These insights are: (1) denoise the "ground truth" surface normals in the training set to ensure consistency with the semantic labels; (2) concurrently train on a mix of real and synthetic data, instead of pretraining on syntheti…

    Submitted 16 June, 2019; originally announced June 2019.

  3. arXiv:1801.08985  [pdf, other]

    cs.CV cs.LG

    Object category learning and retrieval with weak supervision

    Authors: Steven Hickson, Anelia Angelova, Irfan Essa, Rahul Sukthankar

    Abstract: We consider the problem of retrieving objects from image data and learning to classify them into meaningful semantic categories with minimal supervision. To that end, we propose a fully differentiable unsupervised deep clustering approach to learn semantic classes in an end-to-end fashion without individual class labeling using only unlabeled object proposals. The key contributions of our work are…

    Submitted 23 July, 2018; v1 submitted 26 January, 2018; originally announced January 2018.

    Comments: Camera-ready version for NIPS 2017 workshop Learning with Limited Labeled Data

  4. Efficient Hierarchical Graph-Based Segmentation of RGBD Videos

    Authors: Steven Hickson, Stan Birchfield, Irfan Essa, Henrik Christensen

    Abstract: We present an efficient and scalable algorithm for segmenting 3D RGBD point clouds by combining depth, color, and temporal information using a multistage, hierarchical graph-based approach. Our algorithm processes a moving window over several point clouds to group similar regions over a graph, resulting in an initial over-segmentation. These regions are then merged to yield a dendrogram using aggl…

    Submitted 26 January, 2018; originally announced January 2018.

    Comments: CVPR 2014

  5. arXiv:1801.07388  [pdf, other]

    cs.CV

    Let's Dance: Learning From Online Dance Videos

    Authors: Daniel Castro, Steven Hickson, Patsorn Sangkloy, Bhavishya Mittal, Sean Dai, James Hays, Irfan Essa

    Abstract: In recent years, deep neural network approaches have naturally extended to the video domain, in their simplest case by aggregating per-frame classifications as a baseline for action recognition. A majority of the work in this area extends from the imaging domain, leading to visual-feature heavy approaches on temporal data. To address this issue we introduce "Let's Dance", a 1000 video dataset (and…

    Submitted 22 January, 2018; originally announced January 2018.

    Comments: first submitted November 2016

    ACM Class: I.4; I.5; I.5.1

  6. Semantic Instance Labeling Leveraging Hierarchical Segmentation

    Authors: Steven Hickson, Irfan Essa, Henrik Christensen

    Abstract: Most of the approaches for indoor RGBD semantic labeling focus on using pixels or superpixels to train a classifier. In this paper, we implement a higher level segmentation using a hierarchy of superpixels to obtain a better segmentation for training our classifier. By focusing on meaningful segments that conform more directly to objects, regardless of size, we train a random forest of decis…

    Submitted 2 August, 2017; originally announced August 2017.

  7. An Energy Minimization Approach to 3D Non-Rigid Deformable Surface Estimation Using RGBD Data

    Authors: Bryan Willimon, Steven Hickson, Ian Walker, Stan Birchfield

    Abstract: We propose an algorithm that uses energy minimization to estimate the current configuration of a non-rigid object. Our approach utilizes an RGBD image to calculate corresponding SURF features, depth, and boundary information. We do not use predetermined features, thus enabling our system to operate on unmodified objects. Our approach relies on a 3D nonlinear energy minimization framework to so…

    Submitted 2 August, 2017; originally announced August 2017.

  8. arXiv:1707.07204  [pdf, other]

    cs.CV

    Eyemotion: Classifying facial expressions in VR using eye-tracking cameras

    Authors: Steven Hickson, Nick Dufour, Avneesh Sud, Vivek Kwatra, Irfan Essa

    Abstract: One of the main challenges of social interaction in virtual reality settings is that head-mounted displays occlude a large portion of the face, blocking facial expressions and thereby restricting social engagement cues among users. Hence, auxiliary means of sensing and conveying these expressions are needed. We present an algorithm to automatically infer expressions by analyzing only a partially o…

    Submitted 28 July, 2017; v1 submitted 22 July, 2017; originally announced July 2017.

    Comments: Uploaded Supplementary PDF. Fixed author affiliation. Corrected typo in personalization accuracy

  9. Predicting Daily Activities From Egocentric Images Using Deep Learning

    Authors: Daniel Castro, Steven Hickson, Vinay Bettadapura, Edison Thomaz, Gregory Abowd, Henrik Christensen, Irfan Essa

    Abstract: We present a method to analyze images taken from a passive egocentric wearable camera along with the contextual information, such as time and day of week, to learn and predict everyday activities of an individual. We collected a dataset of 40,103 egocentric images over a 6 month period with 19 activity classes and demonstrate the benefit of state-of-the-art deep learning techniques for learning an…

    Submitted 6 October, 2015; originally announced October 2015.

    Comments: 8 pages

    ACM Class: I.5; J.4; J.3

    Journal ref: ISWC '15 Proceedings of the 2015 ACM International Symposium on Wearable Computers - Pages 75-82