
Showing 1–3 of 3 results for author: Kollipara, S

Searching in archive cs.
  1. arXiv:2509.21922

    cs.CV

    Spatial Reasoning in Foundation Models: Benchmarking Object-Centric Spatial Understanding

    Authors: Vahid Mirjalili, Ramin Giahi, Sriram Kollipara, Akshay Kekuda, Kehui Yao, Kai Zhao, Jianpeng Xu, Kaushiki Nag, Sinduja Subramaniam, Topojoy Biswas, Evren Korpeoglu, Kannan Achan

    Abstract: Spatial understanding is a critical capability for vision foundation models. While recent advances in large vision models or vision-language models (VLMs) have expanded recognition capabilities, most benchmarks emphasize localization accuracy rather than whether models capture how objects are arranged and related within a scene. This gap is consequential; effective scene understanding requires not…

    Submitted 26 September, 2025; originally announced September 2025.

    Comments: 4 pages, NeurIPS Workshop SpaVLE

  2. arXiv:2507.17080

    cs.IR cs.AI cs.CV

    VL-CLIP: Enhancing Multimodal Recommendations via Visual Grounding and LLM-Augmented CLIP Embeddings

    Authors: Ramin Giahi, Kehui Yao, Sriram Kollipara, Kai Zhao, Vahid Mirjalili, Jianpeng Xu, Topojoy Biswas, Evren Korpeoglu, Kannan Achan

    Abstract: Multimodal learning plays a critical role in e-commerce recommendation platforms today, enabling accurate recommendations and product understanding. However, existing vision-language models, such as CLIP, face key challenges in e-commerce recommendation systems: 1) Weak object-level alignment, where global image embeddings fail to capture fine-grained product attributes, leading to suboptimal retr…

    Submitted 22 July, 2025; originally announced July 2025.

    Comments: Accepted at RecSys 2025; DOI: https://doi.org/10.1145/3705328.3748064

  3. arXiv:2506.21934

    cs.IR cs.CV

    CAL-RAG: Retrieval-Augmented Multi-Agent Generation for Content-Aware Layout Design

    Authors: Najmeh Forouzandehmehr, Reza Yousefi Maragheh, Sriram Kollipara, Kai Zhao, Topojoy Biswas, Evren Korpeoglu, Kannan Achan

    Abstract: Automated content-aware layout generation -- the task of arranging visual elements such as text, logos, and underlays on a background canvas -- remains a fundamental yet under-explored problem in intelligent design systems. While recent advances in deep generative models and large language models (LLMs) have shown promise in structured content generation, most existing approaches lack grounding in…

    Submitted 27 June, 2025; originally announced June 2025.

    ACM Class: I.3.3; I.2.11; H.5.2