Showing 1–4 of 4 results for author: Sokhandan, N

  1. arXiv:2407.16145  [pdf]

    cs.LG cs.CV

    Improved Few-Shot Image Classification Through Multiple-Choice Questions

    Authors: Dipika Khullar, Emmett Goodman, Negin Sokhandan

    Abstract: Through a simple multiple choice language prompt a VQA model can operate as a zero-shot image classifier, producing a classification label. Compared to typical image encoders, VQA models offer an advantage: VQA-produced image embeddings can be infused with the most relevant visual information through tailored language prompts. Nevertheless, for most tasks, zero-shot VQA performance is lacking, eit…

    Submitted 22 July, 2024; originally announced July 2024.

  2. arXiv:2405.01130  [pdf, other]

    cs.CV

    Automated Virtual Product Placement and Assessment in Images using Diffusion Models

    Authors: Mohammad Mahmudul Alam, Negin Sokhandan, Emmett Goodman

    Abstract: In Virtual Product Placement (VPP) applications, the discrete integration of specific brand products into images or videos has emerged as a challenging yet important task. This paper introduces a novel three-stage fully automated VPP system. In the first stage, a language-guided image segmentation model identifies optimal regions within images for product inpainting. In the second stage, Stable Di…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted at the 6th AI for Content Creation (AI4CC) workshop at CVPR 2024

  3. arXiv:2202.01414  [pdf, other]

    cs.CV

    DocBed: A Multi-Stage OCR Solution for Documents with Complex Layouts

    Authors: Wenzhen Zhu, Negin Sokhandan, Guang Yang, Sujitha Martin, Suchitra Sathyanarayana

    Abstract: Digitization of newspapers is of interest for many reasons including preservation of history, accessibility and search ability, etc. While digitization of documents such as scientific articles and magazines is prevalent in literature, one of the main challenges for digitization of newspaper lies in its complex layout (e.g. articles spanning multiple columns, text interrupted by images) analysis, w…

    Submitted 3 February, 2022; originally announced February 2022.

    Comments: 7 pages, 6 figures, The Thirty-Fourth Annual Conference on Innovative Applications of Artificial Intelligence (IAAI-22), Collocated with AAAI-22

  4. arXiv:2007.01899  [pdf, other]

    cs.CV cs.LG

    A Few-Shot Sequential Approach for Object Counting

    Authors: Negin Sokhandan, Pegah Kamousi, Alejandro Posada, Eniola Alese, Negar Rostamzadeh

    Abstract: In this work, we address the problem of few-shot multi-class object counting with point-level annotations. The proposed technique leverages a class agnostic attention mechanism that sequentially attends to objects in the image and extracts their relevant features. This process is employed on an adapted prototypical-based few-shot approach that uses the extracted features to classify each one eithe…

    Submitted 7 July, 2020; v1 submitted 3 July, 2020; originally announced July 2020.