Showing 1–6 of 6 results for author: Amini-Naieni, N

  1. arXiv:2506.15368  [pdf, ps, other]

    cs.CV cs.AI

    Open-World Object Counting in Videos

    Authors: Niki Amini-Naieni, Andrew Zisserman

    Abstract: We introduce a new task of open-world object counting in videos: given a text description, or an image example, that specifies the target object, the objective is to enumerate all the unique instances of the target objects in the video. This task is especially challenging in crowded scenes with occlusions and similar objects, where avoiding double counting and identifying reappearances is crucial.…

    Submitted 18 June, 2025; originally announced June 2025.

  2. arXiv:2501.08083  [pdf, other]

    cs.CV

    Benchmarking Vision Foundation Models for Input Monitoring in Autonomous Driving

    Authors: Mert Keser, Halil Ibrahim Orhan, Niki Amini-Naieni, Gesina Schwalbe, Alois Knoll, Matthias Rottmann

    Abstract: Deep neural networks (DNNs) remain challenged by distribution shifts in complex open-world domains like automated driving (AD): Robustness against yet unknown novel objects (semantic shift) or styles like lighting conditions (covariate shift) cannot be guaranteed. Hence, reliable operation-time monitors for identification of out-of-training-data-distribution (OOD) scenarios are imperative. Current…

    Submitted 4 April, 2025; v1 submitted 14 January, 2025; originally announced January 2025.

  3. arXiv:2409.17109  [pdf, other]

    cs.CV cs.AI

    Unveiling Ontological Commitment in Multi-Modal Foundation Models

    Authors: Mert Keser, Gesina Schwalbe, Niki Amini-Naieni, Matthias Rottmann, Alois Knoll

    Abstract: Ontological commitment, i.e., used concepts, relations, and assumptions, is a cornerstone of qualitative reasoning (QR) models. The state of the art for processing raw inputs, though, is deep neural networks (DNNs), nowadays often based on multimodal foundation models. These automatically learn rich representations of concepts and respective reasoning. Unfortunately, the learned qualitati…

    Submitted 25 September, 2024; originally announced September 2024.

    Comments: Qualitative Reasoning Workshop 2024 (QR2024) colocated with ECAI2024, camera-ready submission; first two authors contributed equally; 10 pages, 4 figures, 3 tables

  4. arXiv:2407.04619  [pdf, other]

    cs.CV

    CountGD: Multi-Modal Open-World Counting

    Authors: Niki Amini-Naieni, Tengda Han, Andrew Zisserman

    Abstract: The goal of this paper is to improve the generality and accuracy of open-vocabulary object counting in images. To improve the generality, we repurpose an open-vocabulary detection foundation model (GroundingDINO) for the counting task, and also extend its capabilities by introducing modules to enable specifying the target object to count by visual exemplars. In turn, these new capabilities - being…

    Submitted 10 March, 2025; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: NeurIPS 2024

  5. arXiv:2312.02350  [pdf, other]

    cs.CV

    Instant Uncertainty Calibration of NeRFs Using a Meta-Calibrator

    Authors: Niki Amini-Naieni, Tomas Jakab, Andrea Vedaldi, Ronald Clark

    Abstract: Although Neural Radiance Fields (NeRFs) have markedly improved novel view synthesis, accurate uncertainty quantification in their image predictions remains an open problem. The prevailing methods for estimating uncertainty, including the state-of-the-art Density-aware NeRF Ensembles (DANE) [29], quantify uncertainty without calibration. This frequently leads to over- or under-confidence in image p…

    Submitted 20 September, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: ECCV 2024

  6. arXiv:2306.01851  [pdf, other]

    cs.CV

    Open-world Text-specified Object Counting

    Authors: Niki Amini-Naieni, Kiana Amini-Naieni, Tengda Han, Andrew Zisserman

    Abstract: Our objective is open-world object counting in images, where the target object class is specified by a text description. To this end, we propose CounTX, a class-agnostic, single-stage model using a transformer decoder counting head on top of pre-trained joint text-image representations. CounTX is able to count the number of instances of any class given only an image and a text description of the t…

    Submitted 15 September, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: BMVC 2023