Showing 1–5 of 5 results for author: Markovtsev, V

  1. arXiv:1905.06782

    cs.SE cs.LG cs.SI stat.ML

    Identifying collaborators in large codebases

    Authors: Waren Long, Vadim Markovtsev, Hugo Mougard, Egor Bulychev, Jan Hula

    Abstract: The way developers collaborate inside and particularly across teams often escapes management's attention, despite a formal organization with designated teams being defined. Observability of the actual, organically formed engineering structure provides decision makers invaluable additional tools to manage their talent pool. To identify existing inter and intra-team interactions - and suggest releva…

    Submitted 7 May, 2019; originally announced May 2019.

    Comments: 4 pages; Workshop on Machine Learning for Software Engineering 2019

  2. arXiv:1904.00935

    cs.LG cs.SE stat.ML

    STYLE-ANALYZER: fixing code style inconsistencies with interpretable unsupervised algorithms

    Authors: Vadim Markovtsev, Waren Long, Hugo Mougard, Konstantin Slavnov, Egor Bulychev

    Abstract: Source code reviews are manual, time-consuming, and expensive. Human involvement should be focused on analyzing the most relevant aspects of the program, such as logic and maintainability, rather than amending style, syntax, or formatting defects. Some tools with linting capabilities can format code automatically and report various stylistic violations for supported programming languages. They are…

    Submitted 1 April, 2019; originally announced April 2019.

    Comments: 10 pages; Mining Software Repositories 2019

  3. arXiv:1805.11651

    cs.CL cs.PL

    Splitting source code identifiers using Bidirectional LSTM Recurrent Neural Network

    Authors: Vadim Markovtsev, Waren Long, Egor Bulychev, Romain Keramitas, Konstantin Slavnov, Gabor Markowski

    Abstract: Programmers make rich use of natural language in the source code they write through identifiers and comments. Source code identifiers are selected from a pool of tokens which are strongly related to the meaning, naming conventions, and context. These tokens are often combined to produce more precise and obvious designations. Such multi-part identifiers count for 97% of all naming tokens in the Pub…

    Submitted 19 July, 2018; v1 submitted 26 May, 2018; originally announced May 2018.

    Comments: 8 pages

  4. Public Git Archive: a Big Code dataset for all

    Authors: Vadim Markovtsev, Waren Long

    Abstract: The number of open source software projects has been growing exponentially. The major online software repository host, GitHub, has accumulated tens of millions of publicly available Git version-controlled repositories. Although the research potential enabled by the available open source code is clearly substantial, no significant large-scale open source code datasets exist. In this paper, we prese…

    Submitted 20 March, 2018; originally announced March 2018.

  5. arXiv:1704.00135

    cs.PL cs.CL

    Topic modeling of public repositories at scale using names in source code

    Authors: Vadim Markovtsev, Eiso Kant

    Abstract: Programming languages themselves have a limited number of reserved keywords and character based tokens that define the language specification. However, programmers have a rich use of natural language within their code through comments, text literals and naming entities. The programmer defined names that can be found in source code are a rich source of information to build a high level understandin…

    Submitted 20 May, 2017; v1 submitted 1 April, 2017; originally announced April 2017.

    Comments: 11 pages