Search | arXiv e-print repository

Teaching a Language Model to Speak the Language of Tools

Abstract: External tool integration through function-calling is essential for practical language model applications, yet most multilingual models lack reliable tool-use capabilities in non-English languages. Even state-of-the-art multilingual models struggle with determining when to use tools and generating the structured outputs required for function calls, often exhibiting language confusion when prompted in lower-resource languages. This work presents a methodology for adapting existing language models to enable robust tool use in any target language, using Bulgarian as a case study. The approach involves continued training of the BgGPT model series (2.6B, 9B, 27B parameters) on a novel bilingual dataset of 10,035 function-calling examples designed to support standardized protocols like MCP (Model Context Protocol). The research introduces TUCAN (Tool-Using Capable Assistant Navigator), which achieves up to 28.75% improvement in function-calling accuracy over base models while preserving core language understanding, as verified on established Bulgarian benchmarks. Beyond accuracy gains, TUCAN models demonstrate production-ready response formatting with clean, parsable function calls, contrasting with the verbose and inconsistent outputs of base models. The models, evaluation framework, and dataset are released to enable replication for other languages. This work demonstrates a practical approach for extending tool-augmented capabilities beyond English-centric systems.

Submitted 29 June, 2025; originally announced June 2025.

ACM Class: I.2.7; I.2.1

arXiv:2501.13442 [pdf]

doi 10.2478/cait-2024-0035

Billion-scale Similarity Search Using a Hybrid Indexing Approach with Advanced Filtering

Authors: Simeon Emanuilov, Aleksandar Dimov

Abstract: This paper presents a novel approach for similarity search with complex filtering capabilities on billion-scale datasets, optimized for CPU inference. Our method extends the classical IVF-Flat index structure to integrate multi-dimensional filters. The proposed algorithm combines dense embeddings with discrete filtering attributes, enabling fast retrieval in high-dimensional spaces. Designed specifically for CPU-based systems, our disk-based approach offers a cost-effective solution for large-scale similarity search. We demonstrate the effectiveness of our method through a case study, showcasing its potential for various practical uses.

Submitted 23 January, 2025; originally announced January 2025.

Comments: 14 pages, 3 figures, published in Cybernetics and Information Technologies

MSC Class: 68P20; 62H30; 68T07 ACM Class: H.3.3; I.2.6; G.1.0

Journal ref: Cybernetics and Information Technologies, Vol. 24, No 4 (2024), pp. 45-58

arXiv:2501.11543 [pdf]

A quantitative framework for evaluating architectural patterns in ML systems

Authors: Simeon Emanuilov, Aleksandar Dimov

Abstract: Contemporary intelligent systems incorporate software components, including machine learning components. As they grow in complexity and data volume such machine learning systems face unique quality challenges like scalability and performance. To overcome them, engineers may often use specific architectural patterns, however their impact on ML systems is difficult to quantify. The effect of software architecture on traditional systems is well studied, however more work is needed in the area of machine learning systems. This study proposes a framework for quantitative assessment of architectural patterns in ML systems, focusing on scalability and performance metrics for cost-effective CPU-based inference. We integrate these metrics into a systematic evaluation process for selection of architectural patterns and demonstrate its application through a case study. The approach shown in the paper should enable software architects to objectively analyze and select optimal patterns, addressing key challenges in ML system design.

Submitted 20 January, 2025; originally announced January 2025.

arXiv:2312.03049 [pdf]

doi 10.54941/ahfe1002521

Architectural Approaches to Overcome Challenges in the Development of Data-Intensive Systems

Authors: Aleksandar Dimov, Simeon Emanuilov, Boyan Bontchev, Yavor Dankov, Tasos Papapostolu

Abstract: Orientation of modern software systems towards data-intensive processing raises new difficulties in software engineering on how to build and maintain such systems. Some of the important challenges concern the design of software architecture. In this article, we survey the fundamental challenges when designing data-intensive computing systems and present some of the most popular software architectural styles together with their potential to tackle these challenges.

Submitted 5 December, 2023; originally announced December 2023.

Journal ref: Human Factors in Software and Systems Engineering, Vol. 61, 2022, 38-43

Showing 1–4 of 4 results for author: Emanuilov, S