Showing 1–14 of 14 results for author: Zharov, Y

  1. arXiv:2503.20934  [pdf, other]

    cs.SE

    Leveraging LLMs, IDEs, and Semantic Embeddings for Automated Move Method Refactoring

    Authors: Fraol Batole, Abhiram Bellur, Malinda Dilhara, Mohammed Raihan Ullah, Yaroslav Zharov, Timofey Bryksin, Kai Ishikawa, Haifeng Chen, Masaharu Morimoto, Shota Motoura, Takeo Hosomi, Tien N. Nguyen, Hridesh Rajan, Nikolaos Tsantalis, Danny Dig

    Abstract: MOVEMETHOD is a hallmark refactoring. Despite a plethora of research tools that recommend which methods to move and where, these recommendations do not align with how expert developers perform MOVEMETHOD. Given the extensive training of Large Language Models and their reliance upon naturalness of code, they should expertly recommend which methods are misplaced in a given class and which classes ar…

    Submitted 26 March, 2025; originally announced March 2025.

    Comments: 12 pages, 2 figures

  2. arXiv:2503.14443  [pdf, other]

    cs.LG cs.SE

    EnvBench: A Benchmark for Automated Environment Setup

    Authors: Aleksandra Eliseeva, Alexander Kovrigin, Ilia Kholkin, Egor Bogomolov, Yaroslav Zharov

    Abstract: Recent advances in Large Language Models (LLMs) have enabled researchers to focus on practical repository-level tasks in software engineering domain. In this work, we consider a cornerstone task for automating work with software repositories-environment setup, i.e., a task of configuring a repository-specific development environment on a system. Existing studies on environment setup introduce inno…

    Submitted 18 March, 2025; originally announced March 2025.

    Comments: Accepted at the DL4Code workshop at ICLR'25

  3. arXiv:2410.14393  [pdf, other]

    cs.LG cs.AI

    Debug Smarter, Not Harder: AI Agents for Error Resolution in Computational Notebooks

    Authors: Konstantin Grotov, Artem Borzilov, Maksim Krivobok, Timofey Bryksin, Yaroslav Zharov

    Abstract: Computational notebooks became indispensable tools for research-related development, offering unprecedented interactivity and flexibility in the development process. However, these benefits come at the cost of reproducibility and an increased potential for bugs. With the rise of code-fluent Large Language Models empowered with agentic techniques, smart bug-fixing tools with a high level of autonom…

    Submitted 18 October, 2024; originally announced October 2024.

    Comments: Accepted to EMNLP 2024 System Demonstrations

  4. arXiv:2410.12046  [pdf, other]

    cs.SE cs.HC cs.LG

    Towards Realistic Evaluation of Commit Message Generation by Matching Online and Offline Settings

    Authors: Petr Tsvetkov, Aleksandra Eliseeva, Danny Dig, Alexander Bezzubov, Yaroslav Golubev, Timofey Bryksin, Yaroslav Zharov

    Abstract: When a Commit Message Generation (CMG) system is integrated into the IDEs and other products at JetBrains, we perform online evaluation based on user acceptance of the generated messages. However, performing online experiments with every change to a CMG system is troublesome, as each iteration affects users and requires time to collect enough statistics. On the other hand, offline evaluation, a pr…

    Submitted 8 January, 2025; v1 submitted 15 October, 2024; originally announced October 2024.

    Comments: 10 pages, 5 figures (Published at ICSE'2025)

  5. arXiv:2406.04464  [pdf, other]

    cs.SE cs.AI cs.LG

    On The Importance of Reasoning for Context Retrieval in Repository-Level Code Editing

    Authors: Alexander Kovrigin, Aleksandra Eliseeva, Yaroslav Zharov, Timofey Bryksin

    Abstract: Recent advancements in code-fluent Large Language Models (LLMs) enabled the research on repository-level code editing. In such tasks, the model navigates and modifies the entire codebase of a project according to request. Hence, such tasks require efficient context retrieval, i.e., navigating vast codebases to gather relevant context. Despite the recognized importance of context retrieval, existin…

    Submitted 6 June, 2024; originally announced June 2024.

  6. arXiv:2405.01559  [pdf, other]

    cs.SE cs.LG

    Untangling Knots: Leveraging LLM for Error Resolution in Computational Notebooks

    Authors: Konstantin Grotov, Sergey Titov, Yaroslav Zharov, Timofey Bryksin

    Abstract: Computational notebooks became indispensable tools for research-related development, offering unprecedented interactivity and flexibility in the development process. However, these benefits come at the cost of reproducibility and an increased potential for bugs. There are many tools for bug fixing; however, they are generally targeted at the classical linear code. With the rise of code-fluent Larg…

    Submitted 26 March, 2024; originally announced May 2024.

    Comments: accepted at 1st ACM CHI Workshop on Human-Notebook Interactions

  7. Tool-Augmented LLMs as a Universal Interface for IDEs

    Authors: Yaroslav Zharov, Yury Khudyakov, Evgeniia Fedotova, Evgeny Grigorenko, Egor Bogomolov

    Abstract: Modern-day Integrated Development Environments (IDEs) have come a long way from the early text editing utilities to the complex programs encompassing thousands of functions to help developers. However, with the increasing number of efficiency-enhancing tools incorporated, IDEs gradually became sophisticated software with a steep learning curve. The rise of the Large Language Models (LLMs) capable…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: First IDE Workshop, ICSE'24

  8. arXiv:2312.08976  [pdf, other]

    cs.SE cs.LG

    Dynamic Retrieval-Augmented Generation

    Authors: Anton Shapkin, Denis Litvinov, Yaroslav Zharov, Egor Bogomolov, Timur Galimzyanov, Timofey Bryksin

    Abstract: Current state-of-the-art large language models are effective in generating high-quality text and encapsulating a broad spectrum of world knowledge. These models, however, often hallucinate and lack locally relevant factual data. Retrieval-augmented approaches were introduced to overcome these problems and provide more accurate responses. Typically, the retrieved information is simply appended to t…

    Submitted 20 February, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: 10 pages

  9. arXiv:2303.14429  [pdf, other]

    cs.CV

    Shot Noise Reduction in Radiographic and Tomographic Multi-Channel Imaging with Self-Supervised Deep Learning

    Authors: Yaroslav Zharov, Evelina Ametova, Rebecca Spiecker, Tilo Baumbach, Genoveva Burca, Vincent Heuveline

    Abstract: Noise is an important issue for radiographic and tomographic imaging techniques. It becomes particularly critical in applications where additional constraints force a strong reduction of the Signal-to-Noise Ratio (SNR) per image. These constraints may result from limitations on the maximum available flux or permissible dose and the associated restriction on exposure time. Often, a high SNR per ima…

    Submitted 25 March, 2023; originally announced March 2023.

    Comments: To be submitted to Optics Express

  10. arXiv:2303.14089  [pdf, other]

    cs.CV

    Optimizing the Procedure of CT Segmentation Labeling

    Authors: Yaroslav Zharov, Tilo Baumbach, Vincent Heuveline

    Abstract: In Computed Tomography, machine learning is often used for automated data processing. However, increasing model complexity is accompanied by increasingly large volume datasets, which in turn increases the cost of model training. Unlike most work that mitigates this by advancing model architectures and training algorithms, we consider the annotation procedure and its effect on the model performance…

    Submitted 24 March, 2023; originally announced March 2023.

    Comments: Under review

  11. arXiv:2302.12562  [pdf, other]

    cs.CV cs.LG

    A Knowledge Distillation framework for Multi-Organ Segmentation of Medaka Fish in Tomographic Image

    Authors: Jwalin Bhatt, Yaroslav Zharov, Sungho Suh, Tilo Baumbach, Vincent Heuveline, Paul Lukowicz

    Abstract: Morphological atlases are an important tool in organismal studies, and modern high-throughput Computed Tomography (CT) facilities can produce hundreds of full-body high-resolution volumetric images of organisms. However, creating an atlas from these volumes requires accurate organ segmentation. In the last decade, machine learning approaches have achieved incredible results in image segmentation t…

    Submitted 24 February, 2023; originally announced February 2023.

    Comments: Accepted at IEEE International Symposium on Biomedical Imaging 2023 (ISBI 2023)

  12. arXiv:2203.09372  [pdf, other]

    eess.IV cs.CV

    Using the Order of Tomographic Slices as a Prior for Neural Networks Pre-Training

    Authors: Yaroslav Zharov, Alexey Ershov, Tilo Baumbach, Vincent Heuveline

    Abstract: The technical advances in Computed Tomography (CT) allow to obtain immense amounts of 3D data. For such datasets it is very costly and time-consuming to obtain the accurate 3D segmentation markup to train neural networks. The annotation is typically done for a limited number of 2D slices, followed by an interpolation. In this work, we propose a pre-training method SortingLoss. It performs pre-trai…

    Submitted 17 March, 2022; originally announced March 2022.

    Comments: Under review

  13. arXiv:2011.03353  [pdf, other]

    cs.CV cs.LG

    Self-Supervised Learning for Biological Sample Localization in 3D Tomographic Images

    Authors: Yaroslav Zharov, Alexey Ershov, Tilo Baumbach, Vincent Heuveline

    Abstract: In synchrotron-based Computed Tomography (CT) there is a trade-off between spatial resolution, field of view and speed of positioning and alignment of samples. The problem is even more prominent for high-throughput tomography--an automated setup, capable of scanning large batches of samples without human interaction. As a result, in many applications, only 20-30% of the reconstructed volume contai…

    Submitted 11 January, 2023; v1 submitted 6 November, 2020; originally announced November 2020.

  14. arXiv:1811.02783  [pdf, other]

    cs.LG stat.ML

    YASENN: Explaining Neural Networks via Partitioning Activation Sequences

    Authors: Yaroslav Zharov, Denis Korzhenkov, Pavel Shvechikov, Alexander Tuzhilin

    Abstract: We introduce a novel approach to feed-forward neural network interpretation based on partitioning the space of sequences of neuron activations. In line with this approach, we propose a model-specific interpretation method, called YASENN. Our method inherits many advantages of model-agnostic distillation, such as an ability to focus on the particular input region and to express an explanation in te…

    Submitted 7 November, 2018; originally announced November 2018.