-
Distill CLIP (DCLIP): Enhancing Image-Text Retrieval via Cross-Modal Transformer Distillation
Authors:
Daniel Csizmadia,
Andrei Codreanu,
Victor Sim,
Vighnesh Prabhu,
Michael Lu,
Kevin Zhu,
Sean O'Brien,
Vasu Sharma
Abstract:
We present Distill CLIP (DCLIP), a fine-tuned variant of the CLIP model that enhances multimodal image-text retrieval while preserving the original model's strong zero-shot classification capabilities. CLIP models are typically constrained by fixed image resolutions and limited context, which can hinder their effectiveness in retrieval tasks that require fine-grained cross-modal understanding. DCLIP addresses these challenges through a meta teacher-student distillation framework, where a cross-modal transformer teacher is fine-tuned to produce enriched embeddings via bidirectional cross-attention between YOLO-extracted image regions and corresponding textual spans. These semantically and spatially aligned global representations guide the training of a lightweight student model using a hybrid loss that combines contrastive learning and cosine similarity objectives. Despite being trained on only ~67,500 samples curated from MSCOCO, Flickr30k, and Conceptual Captions (just a fraction of CLIP's original dataset), DCLIP significantly improves image-text retrieval metrics (Recall@K, MAP) while retaining approximately 94% of CLIP's zero-shot classification performance. These results demonstrate that DCLIP effectively mitigates the trade-off between task specialization and generalization, offering a resource-efficient, domain-adaptive, and detail-sensitive solution for advanced vision-language tasks. Code is available at https://anonymous.4open.science/r/DCLIP-B772/README.md.
Submitted 15 June, 2025; v1 submitted 25 May, 2025;
originally announced May 2025.
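The abstract names a hybrid objective that combines a contrastive term with a cosine-similarity term but does not give its exact form. The NumPy sketch below is one plausible reading under stated assumptions: a symmetric InfoNCE loss between the student's image and text embeddings, a (1 - cosine similarity) term pulling each student embedding toward the matching teacher embedding, and a hypothetical mixing weight `alpha` and `temperature`. All function names are illustrative, and the cross-modal teacher itself (YOLO regions, bidirectional cross-attention) is not modelled; its output embeddings are simply taken as given arrays.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length (eps guards against division by zero).
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis.
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)


def contrastive_loss(img, txt, temperature=0.07):
    # Symmetric InfoNCE (assumed form): row i of `img` and row i of `txt`
    # are a matched pair; every other row in the batch acts as a negative.
    img, txt = l2_normalize(img), l2_normalize(txt)
    logits = img @ txt.T / temperature
    diag = np.diagonal(logits)
    loss_i2t = logsumexp(logits, axis=1) - diag  # image -> text
    loss_t2i = logsumexp(logits, axis=0) - diag  # text -> image
    return 0.5 * (loss_i2t.mean() + loss_t2i.mean())


def cosine_distill_loss(student, teacher):
    # Mean of (1 - cosine similarity) between paired student/teacher rows.
    s, t = l2_normalize(student), l2_normalize(teacher)
    return float(np.mean(1.0 - np.sum(s * t, axis=-1)))


def hybrid_loss(s_img, s_txt, t_img, t_txt, alpha=0.5, temperature=0.07):
    # Hypothetical weighting: alpha * contrastive + (1 - alpha) * cosine distillation.
    l_con = contrastive_loss(s_img, s_txt, temperature)
    l_cos = 0.5 * (cosine_distill_loss(s_img, t_img) + cosine_distill_loss(s_txt, t_txt))
    return alpha * l_con + (1.0 - alpha) * l_cos
```

In this sketch, with `alpha=0` and student embeddings identical to the teacher's, the loss is numerically zero; increasing `alpha` shifts weight from agreement with the teacher toward in-batch image-text discrimination.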
-
Detecting Unjustified Assumptions in Subclasses via Elegant Objects Representation
Authors:
Vitaliy Korbashov,
Nikolai Kudasov,
Mikhail Olokin,
Violetta Sim
Abstract:
Elegant Objects (EO) is a programming language based on the ideas of pure objects and the Decorator pattern. Bugayenko has suggested it as an intermediate representation for object-oriented programs. This paper presents a version of dynamic dispatch modelled in EO and formulates the problem of unjustified assumptions in decorator objects, which parallels a similar problem in subclasses. We then introduce an approach to detect such problems in EO programs via method inlining and limited property inference. Finally, we discuss a prototype implementation of this approach in the Scala programming language.
Submitted 13 October, 2022; v1 submitted 5 September, 2022;
originally announced September 2022.
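The abstract describes the problem only in outline. The Python sketch below illustrates it under loud assumptions: it is not EO syntax, not the authors' encoding of dynamic dispatch, and not their detection algorithm. A hypothetical `Obj` holds its own attributes plus an optional decoratee `phi`; lookup falls through to `phi`, and the hypothetical `call` passes the outermost object as `self`, so a method defined in the decoratee dispatches to overrides added by a decorator. The decorator's `f` then assumes its argument is non-negative, an assumption that nothing in `base` justifies.

```python
import math


class Obj:
    """Hypothetical EO-style object: own attributes plus an optional decoratee (phi)."""

    def __init__(self, attrs, phi=None):
        self.attrs = attrs
        self.phi = phi

    def lookup(self, name):
        # Search this object first, then walk down the chain of decoratees.
        obj = self
        while obj is not None:
            if name in obj.attrs:
                return obj.attrs[name]
            obj = obj.phi
        raise AttributeError(name)


def call(obj, name, *args):
    # Dynamic dispatch: whichever method is found receives the outermost
    # object as `self`, so methods of the decoratee see the decorator's overrides.
    return obj.lookup(name)(obj, *args)


base = Obj({
    "f": lambda self, x: x,
    "h": lambda self, x: call(self, "f", x - 10),
})


def derived_f(self, x):
    # Unjustified assumption: x >= 0. Nothing in `base` guarantees it,
    # because base's `h` calls `f` with x - 10.
    return math.sqrt(x)


derived = Obj({"f": derived_f}, phi=base)
```

Here `call(derived, "h", 26)` returns `4.0`, while `call(derived, "h", 5)` raises `ValueError` because `base`'s `h` passes `x - 10 = -5` to the overriding `f`. Inlining `h` at that call site, in the spirit of the method inlining the abstract mentions, is what exposes the `x - 10` argument for which the override's assumption would have to hold.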
-
Formalizing $\varphi$-calculus: a purely object-oriented calculus of decorated objects
Authors:
Nikolai Kudasov,
Violetta Sim
Abstract:
Many calculi exist for modelling various features of object-oriented languages. Many of them are based on $\lambda$-calculus and focus either on statically typed class-based languages or on dynamic prototype-based languages. We formalize an untyped calculus of decorated objects, informally presented by Bugayenko, which is defined in terms of objects and relies on decoration as the primary mechanism of object extension. It is not based on $\lambda$-calculus, yet, with only four basic syntactic constructions, it is just as complete. We prove that the calculus is confluent (i.e., it possesses the Church-Rosser property) and introduce an abstract machine for call-by-name evaluation. Finally, we provide a sound translation to $\lambda$-calculus with records.
Submitted 2 December, 2022; v1 submitted 15 April, 2022;
originally announced April 2022.