Search | arXiv e-print repository

Distill CLIP (DCLIP): Enhancing Image-Text Retrieval via Cross-Modal Transformer Distillation

Authors: Daniel Csizmadia, Andrei Codreanu, Victor Sim, Vighnesh Prabhu, Michael Lu, Kevin Zhu, Sean O'Brien, Vasu Sharma

Abstract: We present Distill CLIP (DCLIP), a fine-tuned variant of the CLIP model that enhances multimodal image-text retrieval while preserving the original model's strong zero-shot classification capabilities. CLIP models are typically constrained by fixed image resolutions and limited context, which can hinder their effectiveness in retrieval tasks that require fine-grained cross-modal understanding. DCLIP addresses these challenges through a meta teacher-student distillation framework, where a cross-modal transformer teacher is fine-tuned to produce enriched embeddings via bidirectional cross-attention between YOLO-extracted image regions and corresponding textual spans. These semantically and spatially aligned global representations guide the training of a lightweight student model using a hybrid loss that combines contrastive learning and cosine similarity objectives. Despite being trained on only ~67,500 samples curated from MSCOCO, Flickr30k, and Conceptual Captions (just a fraction of CLIP's original dataset), DCLIP significantly improves image-text retrieval metrics (Recall@K, MAP) while retaining approximately 94% of CLIP's zero-shot classification performance. These results demonstrate that DCLIP effectively mitigates the trade-off between task specialization and generalization, offering a resource-efficient, domain-adaptive, and detail-sensitive solution for advanced vision-language tasks. Code available at https://anonymous.4open.science/r/DCLIP-B772/README.md.

Submitted 15 June, 2025; v1 submitted 25 May, 2025; originally announced May 2025.

arXiv:2209.01825

Detecting Unjustified Assumptions in Subclasses via Elegant Objects Representation

Authors: Vitaliy Korbashov, Nikolai Kudasov, Mikhail Olokin, Violetta Sim

Abstract: Elegant Objects (EO) is a programming language based on ideas of pure objects and the Decorator pattern. Bugayenko has suggested it as an intermediate representation for object-oriented programs. This paper presents a version of dynamic dispatch modelled in EO and formulates a problem of unjustified assumptions in decorator objects, which parallels a similar problem in subclasses. Then, we introduce an approach to detect such problems in EO programs via method inlining and limited property inference. Finally, we discuss a prototype implementation of this approach in the Scala programming language.

Submitted 13 October, 2022; v1 submitted 5 September, 2022; originally announced September 2022.

arXiv:2204.07454

Formalizing $\varphi$-calculus: a purely object-oriented calculus of decorated objects

Authors: Nikolai Kudasov, Violetta Sim

Abstract: Many calculi exist for modelling various features of object-oriented languages. Many of them are based on $λ$-calculus and focus either on statically typed class-based languages or dynamic prototype-based languages. We formalize an untyped calculus of decorated objects, informally presented by Bugayenko, which is defined in terms of objects and relies on decoration as a primary mechanism of object extension. It is not based on $λ$-calculus, yet with only four basic syntactic constructions is just as complete. We prove the calculus is confluent (i.e. possesses the Church-Rosser property), and introduce an abstract machine for call-by-name evaluation. Finally, we provide a sound translation to $λ$-calculus with records.

Submitted 2 December, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

Showing 1–3 of 3 results for author: Sim, V