Search | arXiv e-print repository

doi 10.1109/ICSME58944.2024.00044

Broken Windows: Exploring the Applicability of a Controversial Theory on Code Quality

Authors: Diomidis Spinellis, Panos Louridas, Maria Kechagia, Tushar Sharma

Abstract: Is the quality of existing code correlated with the quality of subsequent changes? According to the (controversial) broken windows theory, which inspired this study, disorder sets descriptive norms and signals behavior that further increases it. From a large code corpus, we examine whether code history does indeed affect the evolution of code quality. We examine C code quality metrics and Java cod… ▽ More Is the quality of existing code correlated with the quality of subsequent changes? According to the (controversial) broken windows theory, which inspired this study, disorder sets descriptive norms and signals behavior that further increases it. From a large code corpus, we examine whether code history does indeed affect the evolution of code quality. We examine C code quality metrics and Java code smells in specific files, and see whether subsequent commits by developers continue on that path. We check whether developers tailor the quality of their commits based on the quality of the file they commit to. Our results show that history matters, that developers behave differently depending on some aspects of the code quality they encounter, and that programming style inconsistency is not necessarily related to structural qualities. These findings have implications for both software practice and research. Software practitioners can emphasize current quality practices as these influence the code that will be developed in the future. Researchers in the field may replicate and extend the study to improve our understanding of the theory and its practical implications on artifacts, processes, and people. △ Less

Submitted 17 October, 2024; originally announced October 2024.

Comments: 15 pages, 5 figures, to be published in the proceedings of ICSME '24: 40th IEEE International Conference on Software Maintenance and Evolution

Journal ref: ICSME 2024: International Conference on Software Maintenance and Evolution

arXiv:2402.01339 [pdf, other]

doi 10.1145/3711667

Improving Sequential Recommendations with LLMs

Authors: Artun Boz, Wouter Zorgdrager, Zoe Kotti, Jesse Harte, Panos Louridas, Dietmar Jannach, Vassilios Karakoidas, Marios Fragkoulis

Abstract: The sequential recommendation problem has attracted considerable research attention in the past few years, leading to the rise of numerous recommendation models. In this work, we explore how Large Language Models (LLMs), which are nowadays introducing disruptive effects in many AI-based applications, can be used to build or improve sequential recommendation approaches. Specifically, we design thre… ▽ More The sequential recommendation problem has attracted considerable research attention in the past few years, leading to the rise of numerous recommendation models. In this work, we explore how Large Language Models (LLMs), which are nowadays introducing disruptive effects in many AI-based applications, can be used to build or improve sequential recommendation approaches. Specifically, we design three orthogonal approaches and hybrids of those to leverage the power of LLMs in different ways. In addition, we investigate the potential of each approach by focusing on its comprising technical aspects and determining an array of alternative choices for each one. We conduct extensive experiments on three datasets and explore a large variety of configurations, including different language models and baseline recommendation models, to obtain a comprehensive picture of the performance of each approach. Among other observations, we highlight that initializing state-of-the-art sequential recommendation models such as BERT4Rec or SASRec with embeddings obtained from an LLM can lead to substantial performance gains in terms of accuracy. Furthermore, we find that fine-tuning an LLM for recommendation tasks enables it to learn not only the tasks, but also concepts of a domain to some extent. We also show that fine-tuning OpenAI GPT leads to considerably better performance than fine-tuning Google PaLM 2. Overall, our extensive experiments indicate a huge potential value of leveraging LLMs in future recommendation approaches. We publicly share the code and data of our experiments to ensure reproducibility. △ Less

Submitted 11 January, 2025; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: 35 pages, 12 figures, 7 tables

arXiv:2309.09261 [pdf, other]

doi 10.1145/3604915.3610639

Leveraging Large Language Models for Sequential Recommendation

Authors: Jesse Harte, Wouter Zorgdrager, Panos Louridas, Asterios Katsifodimos, Dietmar Jannach, Marios Fragkoulis

Abstract: Sequential recommendation problems have received increasing attention in research during the past few years, leading to the inception of a large variety of algorithmic approaches. In this work, we explore how large language models (LLMs), which are nowadays introducing disruptive effects in many AI-based applications, can be used to build or improve sequential recommendation approaches. Specifical… ▽ More Sequential recommendation problems have received increasing attention in research during the past few years, leading to the inception of a large variety of algorithmic approaches. In this work, we explore how large language models (LLMs), which are nowadays introducing disruptive effects in many AI-based applications, can be used to build or improve sequential recommendation approaches. Specifically, we devise and evaluate three approaches to leverage the power of LLMs in different ways. Our results from experiments on two datasets show that initializing the state-of-the-art sequential recommendation model BERT4Rec with embeddings obtained from an LLM improves NDCG by 15-20% compared to the vanilla BERT4Rec model. Furthermore, we find that a simple approach that leverages LLM embeddings for producing recommendations, can provide competitive performance by highlighting semantically related items. We publicly share the code and data of our experiments to ensure reproducibility. △ Less

Submitted 17 September, 2023; originally announced September 2023.

Comments: 9 pages

Report number: In Seventeenth ACM Conference on Recommender Systems (RecSys '23), September 18--22, 2023, Singapore, Singapore. ACM, New York, NY, USA

arXiv:2210.12883 [pdf, other]

A Greek Parliament Proceedings Dataset for Computational Linguistics and Political Analysis

Authors: Konstantina Dritsa, Kaiti Thoma, John Pavlopoulos, Panos Louridas

Abstract: Large, diachronic datasets of political discourse are hard to come across, especially for resource-lean languages such as Greek. In this paper, we introduce a curated dataset of the Greek Parliament Proceedings that extends chronologically from 1989 up to 2020. It consists of more than 1 million speeches with extensive metadata, extracted from 5,355 parliamentary record files. We explain how it wa… ▽ More Large, diachronic datasets of political discourse are hard to come across, especially for resource-lean languages such as Greek. In this paper, we introduce a curated dataset of the Greek Parliament Proceedings that extends chronologically from 1989 up to 2020. It consists of more than 1 million speeches with extensive metadata, extracted from 5,355 parliamentary record files. We explain how it was constructed and the challenges that we had to overcome. The dataset can be used for both computational linguistics and political analysis-ideally, combining the two. We present such an application, showing (i) how the dataset can be used to study the change of word usage through time, (ii) between significant historical events and political parties, (iii) by evaluating and employing algorithms for detecting semantic shifts. △ Less

Submitted 23 October, 2022; originally announced October 2022.

Comments: Accepted to the 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks

arXiv:2103.00587 [pdf, other]

PyCG: Practical Call Graph Generation in Python

Authors: Vitalis Salis, Thodoris Sotiropoulos, Panos Louridas, Diomidis Spinellis, Dimitris Mitropoulos

Abstract: Call graphs play an important role in different contexts, such as profiling and vulnerability propagation analysis. Generating call graphs in an efficient manner can be a challenging task when it comes to high-level languages that are modular and incorporate dynamic features and higher-order functions. Despite the language's popularity, there have been very few tools aiming to generate call grap… ▽ More Call graphs play an important role in different contexts, such as profiling and vulnerability propagation analysis. Generating call graphs in an efficient manner can be a challenging task when it comes to high-level languages that are modular and incorporate dynamic features and higher-order functions. Despite the language's popularity, there have been very few tools aiming to generate call graphs for Python programs. Worse, these tools suffer from several effectiveness issues that limit their practicality in realistic programs. We propose a pragmatic, static approach for call graph generation in Python. We compute all assignment relations between program identifiers of functions, variables, classes, and modules through an inter-procedural analysis. Based on these assignment relations, we produce the resulting call graph by resolving all calls to potentially invoked functions. Notably, the underlying analysis is designed to be efficient and scalable, handling several Python features, such as modules, generators, function closures, and multiple inheritance. We have evaluated our prototype implementation, which we call PyCG, using two benchmarks: a micro-benchmark suite containing small Python programs and a set of macro-benchmarks with several popular real-world Python packages. Our results indicate that PyCG can efficiently handle thousands of lines of code in less than a second (0.38 seconds for 1k LoC on average). Further, it outperforms the state-of-the-art for Python in both precision and recall: PyCG achieves high rates of precision ~99.2%, and adequate recall ~69.9%. Finally, we demonstrate how PyCG can aid dependency impact analysis by showcasing a potential enhancement to GitHub's "security advisory" notification service using a real-world example. △ Less

Submitted 28 February, 2021; originally announced March 2021.

arXiv:2011.00650 [pdf, other]

doi 10.1007/978-3-030-62008-0_2

Search Engine Similarity Analysis: A Combined Content and Rankings Approach

Authors: Konstantina Dritsa, Thodoris Sotiropoulos, Haris Skarpetis, Panos Louridas

Abstract: How different are search engines? The search engine wars are a favorite topic of on-line analysts, as two of the biggest companies in the world, Google and Microsoft, battle for prevalence of the web search space. Differences in search engine popularity can be explained by their effectiveness or other factors, such as familiarity with the most popular first engine, peer imitation, or force of habi… ▽ More How different are search engines? The search engine wars are a favorite topic of on-line analysts, as two of the biggest companies in the world, Google and Microsoft, battle for prevalence of the web search space. Differences in search engine popularity can be explained by their effectiveness or other factors, such as familiarity with the most popular first engine, peer imitation, or force of habit. In this work we present a thorough analysis of the affinity of the two major search engines, Google and Bing, along with DuckDuckGo, which goes to great lengths to emphasize its privacy-friendly credentials. To do so, we collected search results using a comprehensive set of 300 unique queries for two time periods in 2016 and 2019, and developed a new similarity metric that leverages both the content and the ranking of search responses. We evaluated the characteristics of the metric against other metrics and approaches that have been proposed in the literature, and used it to (1) investigate the similarities of search engine results, (2) the evolution of their affinity over time, (3) what aspects of the results influence similarity, and (4) how the metric differs over different kinds of search services. We found that Google stands apart, but Bing and DuckDuckGo are largely indistinguishable from each other. △ Less

Submitted 6 November, 2020; v1 submitted 1 November, 2020; originally announced November 2020.

Comments: Shorter version of this paper was accepted in the 21st International Conference on Web Information Systems Engineering (WISE 2020). The final authenticated version is available online at https://doi.org/10.1007/978-3-030-62008-0_2

arXiv:2002.03927 [pdf, ps, other]

doi 10.1145/3379597.3387495

A Dataset of Enterprise-Driven Open Source Software

Authors: Diomidis Spinellis, Zoe Kotti, Konstantinos Kravvaritis, Georgios Theodorou, Panos Louridas

Abstract: We present a dataset of open source software developed mainly by enterprises rather than volunteers. This can be used to address known generalizability concerns, and, also, to perform research on open source business software development. Based on the premise that an enterprise's employees are likely to contribute to a project developed by their organization using the email account provided by it,… ▽ More We present a dataset of open source software developed mainly by enterprises rather than volunteers. This can be used to address known generalizability concerns, and, also, to perform research on open source business software development. Based on the premise that an enterprise's employees are likely to contribute to a project developed by their organization using the email account provided by it, we mine domain names associated with enterprises from open data sources as well as through white- and blacklisting, and use them through three heuristics to identify 17,264 enterprise GitHub projects. We provide these as a dataset detailing their provenance and properties. A manual evaluation of a dataset sample shows an identification accuracy of 89%. Through an exploratory data analysis we found that projects are staffed by a plurality of enterprise insiders, who appear to be pulling more than their weight, and that in a small percentage of relatively large projects development happens exclusively through enterprise insiders. △ Less

Submitted 21 April, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

Comments: 5 pages

arXiv:1904.03031 [pdf, other]

doi 10.1016/j.jss.2021.110936

On the Feasibility of Transfer-learning Code Smells using Deep Learning

Authors: Tushar Sharma, Vasiliki Efstathiou, Panos Louridas, Diomidis Spinellis

Abstract: Context: A substantial amount of work has been done to detect smells in source code using metrics-based and heuristics-based methods. Machine learning methods have been recently applied to detect source code smells; however, the current practices are considered far from mature. Objective: First, explore the feasibility of applying deep learning models to detect smells without extensive feature eng… ▽ More Context: A substantial amount of work has been done to detect smells in source code using metrics-based and heuristics-based methods. Machine learning methods have been recently applied to detect source code smells; however, the current practices are considered far from mature. Objective: First, explore the feasibility of applying deep learning models to detect smells without extensive feature engineering, just by feeding the source code in tokenized form. Second, investigate the possibility of applying transfer-learning in the context of deep learning models for smell detection. Method: We use existing metric-based state-of-the-art methods for detecting three implementation smells and one design smell in C# code. Using these results as the annotated gold standard, we train smell detection models on three different deep learning architectures. These architectures use Convolution Neural Networks (CNNs) of one or two dimensions, or Recurrent Neural Networks (RNNs) as their principal hidden layers. For the first objective of our study, we perform training and evaluation on C# samples, whereas for the second objective, we train the models from C# code and evaluate the models over Java code samples. We perform the experiments with various combinations of hyper-parameters for each model. Results: We find it feasible to detect smells using deep learning methods. Our comparative experiments find that there is no clearly superior method between CNN-1D and CNN-2D. We also observe that performance of the deep learning models is smell-specific. Our transfer-learning experiments show that transfer-learning is definitely feasible for implementation smells with performance comparable to that of direct-learning. This work opens up a new paradigm to detect code smells by transfer-learning especially for the programming languages where the comprehensive code smell detection tools are not available. △ Less

Submitted 16 September, 2019; v1 submitted 5 April, 2019; originally announced April 2019.

Showing 1–8 of 8 results for author: Louridas, P