Showing 1–10 of 10 results for author: Chhikara, P

  1. arXiv:2504.19413

    cs.CL cs.AI

    Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

    Authors: Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, Deshraj Yadav

    Abstract: Large Language Models (LLMs) have demonstrated remarkable prowess in generating contextually coherent responses, yet their fixed context windows pose fundamental challenges for maintaining consistency over prolonged multi-session dialogues. We introduce Mem0, a scalable memory-centric architecture that addresses this issue by dynamically extracting, consolidating, and retrieving salient informatio…

    Submitted 27 April, 2025; originally announced April 2025.

  2. arXiv:2502.17422

    cs.CV cs.AI cs.CL

    MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs

    Authors: Jiarui Zhang, Mahyar Khayatkhoei, Prateek Chhikara, Filip Ilievski

    Abstract: Multimodal Large Language Models (MLLMs) have experienced rapid progress in visual recognition tasks in recent years. Given their potential integration into many critical applications, it is important to understand the limitations of their visual perception. In this work, we study whether MLLMs can perceive small visual details as effectively as large ones when answering questions about images. We…

    Submitted 24 February, 2025; originally announced February 2025.

    Comments: Published as a conference paper at ICLR 2025. Code at: https://github.com/saccharomycetes/mllms_know

  3. arXiv:2502.11028

    cs.CL cs.AI

    Mind the Confidence Gap: Overconfidence, Calibration, and Distractor Effects in Large Language Models

    Authors: Prateek Chhikara

    Abstract: Large Language Models (LLMs) show remarkable proficiency in natural language tasks, yet their frequent overconfidence (misalignment between predicted confidence and true correctness) poses significant risks in critical decision-making applications. We present a comprehensive analysis on calibration in LLMs across nine LLMs and three factual Question-Answering (QA) datasets, systematically comparing…

    Submitted 5 June, 2025; v1 submitted 16 February, 2025; originally announced February 2025.

  4. arXiv:2310.16033

    cs.CV cs.CL

    Towards Perceiving Small Visual Details in Zero-shot Visual Question Answering with Multimodal LLMs

    Authors: Jiarui Zhang, Mahyar Khayatkhoei, Prateek Chhikara, Filip Ilievski

    Abstract: Multimodal Large Language Models (MLLMs) have recently achieved promising zero-shot accuracy on visual question answering (VQA) -- a fundamental task affecting various downstream applications and domains. Given the great potential for the broad use of these models, it is important to investigate their limitations in dealing with different image and question properties. In this work, we investigate…

    Submitted 12 February, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: 20 pages, 12 figures, 7 tables

  5. arXiv:2308.14391

    cs.CV cs.CL

    FIRE: Food Image to REcipe generation

    Authors: Prateek Chhikara, Dhiraj Chaurasia, Yifan Jiang, Omkar Masur, Filip Ilievski

    Abstract: Food computing has emerged as a prominent multidisciplinary field of research in recent years. An ambitious goal of food computing is to develop end-to-end intelligent systems capable of autonomously producing recipe information for a food image. Current image-to-recipe methods are retrieval-based and their success depends heavily on the dataset size and diversity, as well as the quality of learne…

    Submitted 12 May, 2024; v1 submitted 28 August, 2023; originally announced August 2023.

    Comments: Published at IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) -- 2024

  6. arXiv:2306.05652

    cs.CL cs.AI cs.HC

    Privacy Aware Question-Answering System for Online Mental Health Risk Assessment

    Authors: Prateek Chhikara, Ujjwal Pasupulety, John Marshall, Dhiraj Chaurasia, Shweta Kumari

    Abstract: Social media platforms have enabled individuals suffering from mental illnesses to share their lived experiences and find the online support necessary to cope. However, many users fail to receive genuine clinical support, thus exacerbating their symptoms. Screening users based on what they post online can aid providers in administering targeted healthcare and minimize false positives. Pre-trained…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: 5 pages, 2 figures, 3 tables

  7. arXiv:2306.00228

    cs.CV cs.AI cs.CL

    Using Visual Cropping to Enhance Fine-Detail Question Answering of BLIP-Family Models

    Authors: Jiarui Zhang, Mahyar Khayatkhoei, Prateek Chhikara, Filip Ilievski

    Abstract: Visual Question Answering is a challenging task, as it requires seamless interaction between perceptual, linguistic, and background knowledge systems. While the recent progress of visual and natural language models like BLIP has led to improved performance on this task, we lack understanding of the ability of such models to perform on different kinds of questions and reasoning types. As our initia…

    Submitted 31 May, 2023; originally announced June 2023.

    Comments: 16 pages, 5 figures, 7 tables

  8. arXiv:2305.05091

    cs.CL cs.AI cs.HC

    Knowledge-enhanced Agents for Interactive Text Games

    Authors: Prateek Chhikara, Jiarui Zhang, Filip Ilievski, Jonathan Francis, Kaixin Ma

    Abstract: Communication via natural language is a key aspect of machine intelligence, and it requires computational models to learn and reason about world concepts, with varying levels of supervision. Significant progress has been made on fully-supervised non-interactive tasks, such as question-answering and procedural text understanding. Yet, various sequential interactive tasks, as in text-based games, ha…

    Submitted 16 December, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

    Comments: Published at K-CAP '23

  9. arXiv:2301.06680

    cs.CV cs.GR cs.LG

    DIGITOUR: Automatic Digital Tours for Real-Estate Properties

    Authors: Prateek Chhikara, Harshul Kuhar, Anil Goyal, Chirag Sharma

    Abstract: A virtual or digital tour is a form of virtual reality technology which allows a user to experience a specific location remotely. Currently, these virtual tours are created by following a 2-step strategy. First, a photographer clicks a 360 degree equirectangular image; then, a team of annotators manually links these images for the "walkthrough" user experience. The major challenge in the mass adop…

    Submitted 16 January, 2023; originally announced January 2023.

    Comments: Published at CODS-COMAD '23

  10. RE-Tagger: A light-weight Real-Estate Image Classifier

    Authors: Prateek Chhikara, Anil Goyal, Chirag Sharma

    Abstract: Real-estate image tagging is one of the essential use-cases to save efforts involved in manual annotation and enhance the user experience. This paper proposes an end-to-end pipeline (referred to as RE-Tagger) for the real-estate image classification problem. We present a two-stage transfer learning approach using custom InceptionV3 architecture to classify images into different categories (i.e., b…

    Submitted 12 July, 2022; originally announced July 2022.

    Comments: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (DEMO TRACK)