An Empirical Study of Safetensors' Usage Trends and Developers' Perceptions

Authors: Beatrice Casey, Kaia Damian, Andrew Cotaj, Joanna C. S. Santos

Abstract: Developers are sharing pre-trained Machine Learning (ML) models through a variety of model sharing platforms, such as Hugging Face, in an effort to make ML development more collaborative. To share the models, they must first be serialized. While there are many methods of serialization in Python, most of them are unsafe. To tame this insecurity, Hugging Face released safetensors as a way to mitigate the threats posed by unsafe serialization formats. In this context, this paper investigates developers' shifts towards using safetensors on Hugging Face in an effort to understand security practices in the ML development community, as well as how developers react to new methods of serialization. Our results show that more developers are adopting safetensors, and many safetensors adoptions were made by automated conversions of existing models by Hugging Face's conversion tool. We also found, however, that a majority of developers ignore the conversion tool's pull requests, and that while many developers are facing issues with using safetensors, they are eager to learn about and adopt the format.

Submitted 3 January, 2025; originally announced January 2025.
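
As background for why safetensors is considered safer than pickle-based formats: per the format's public documentation, a safetensors file consists of an 8-byte little-endian unsigned integer giving the header length, a UTF-8 JSON header mapping each tensor name to its dtype, shape, and byte offsets, and then the raw tensor bytes. Loading therefore involves only JSON parsing and byte slicing, with no code execution. The sketch below is a minimal, stdlib-only illustration of that layout under those assumptions; `build_safetensors` and `read_safetensors` are hypothetical helper names, not the `safetensors` library's API, and details such as header padding and offset validation are omitted.

```python
import json
import struct

# Hypothetical helpers illustrating the documented safetensors layout:
#   [8-byte little-endian u64 N][N bytes of UTF-8 JSON header][raw tensor bytes]
# This is NOT the `safetensors` package API; padding and validation are omitted.


def build_safetensors(tensors, metadata=None):
    """Serialize {name: (dtype, shape, raw_bytes)} into safetensors-style bytes."""
    header = {}
    payload = bytearray()
    for name, (dtype, shape, raw) in tensors.items():
        begin = len(payload)
        payload += raw
        # data_offsets are relative to the start of the byte buffer after the header
        header[name] = {"dtype": dtype, "shape": shape, "data_offsets": [begin, len(payload)]}
    if metadata is not None:
        header["__metadata__"] = metadata  # free-form string-to-string map
    header_bytes = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(header_bytes)) + header_bytes + bytes(payload)


def read_safetensors(blob):
    """Parse safetensors-style bytes back into {name: (dtype, shape, raw_bytes)}.

    Only JSON parsing and byte slicing happen here; nothing is reconstructed
    by calling code, which is what lets loading avoid arbitrary execution.
    """
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    data = blob[8 + header_len:]
    tensors = {}
    for name, info in header.items():
        if name == "__metadata__":
            continue
        begin, end = info["data_offsets"]
        tensors[name] = (info["dtype"], info["shape"], data[begin:end])
    return tensors


# Usage: a two-element float32 tensor round-trips without executing any code.
raw = struct.pack("<2f", 1.0, 2.0)
blob = build_safetensors({"weight": ("F32", [2], raw)}, metadata={"format": "pt"})
restored = read_safetensors(blob)
```

Contrast this with pickle, where the loader reconstructs objects by calling functions named inside the file itself.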

arXiv:2410.17736 [pdf, other]

MojoBench: Language Modeling and Benchmarks for Mojo

Authors: Nishat Raihan, Joanna C. S. Santos, Marcos Zampieri

Abstract: The recently introduced Mojo programming language (PL) by Modular has received significant attention in the scientific community due to its claimed speed boost over Python. Despite advancements in code Large Language Models (LLMs) across various PLs, Mojo remains unexplored in this context. To address this gap, we introduce MojoBench, the first framework for Mojo code generation. MojoBench includes HumanEval-Mojo, a benchmark dataset designed for evaluating code LLMs on Mojo, and Mojo-Coder, the first LLM pretrained and finetuned for Mojo code generation, which supports instructions in 5 natural languages (NLs). Our results show that Mojo-Coder achieves a 30-35% performance improvement over leading models like GPT-4o and Claude-3.5-Sonnet. Furthermore, we provide insights into LLM behavior with underrepresented and unseen PLs, offering potential strategies for enhancing model adaptability. MojoBench contributes to our understanding of LLM capabilities and limitations in emerging programming paradigms, fostering more robust code generation systems.

Submitted 23 October, 2024; originally announced October 2024.

arXiv:2410.16349 [pdf, other]

Large Language Models in Computer Science Education: A Systematic Literature Review

Authors: Nishat Raihan, Mohammed Latif Siddiq, Joanna C. S. Santos, Marcos Zampieri

Abstract: Large language models (LLMs) are becoming increasingly capable across a wide range of Natural Language Processing (NLP) tasks, such as text generation and understanding. Recently, these models have extended their capabilities to coding tasks, bridging the gap between natural languages (NL) and programming languages (PL). Foundational models such as the Generative Pre-trained Transformer (GPT) and LLaMA series have set strong baseline performances in various NL and PL tasks. Additionally, several models have been fine-tuned specifically for code generation, showing significant improvements in code-related applications. Both foundational and fine-tuned models are increasingly used in education, helping students write, debug, and understand code. We present a comprehensive systematic literature review to examine the impact of LLMs in computer science and computer engineering education. We analyze their effectiveness in enhancing the learning experience, supporting personalized education, and aiding educators in curriculum development. We address five research questions to uncover insights into how LLMs contribute to educational outcomes, identify challenges, and suggest directions for future research.

Submitted 21 October, 2024; originally announced October 2024.

Comments: Accepted at 56th ACM Technical Symposium on Computer Science Education (SIGCSE TS 2025)

arXiv:2410.04490 [pdf]

A Large-Scale Exploit Instrumentation Study of AI/ML Supply Chain Attacks in Hugging Face Models

Authors: Beatrice Casey, Joanna C. S. Santos, Mehdi Mirakhorli

Abstract: Advances in machine learning (ML) techniques have given developers ample opportunities to develop and deploy their own models. Hugging Face serves as an open source platform where developers can share and download models in an effort to make ML development more collaborative. In order for models to be shared, they first need to be serialized. Certain Python serialization methods are considered unsafe, as they are vulnerable to object injection. This paper investigates the pervasiveness of these unsafe serialization methods across Hugging Face and demonstrates, through an exploitation approach, that models using unsafe serialization methods can be exploited and shared, creating an unsafe environment for ML developers. We investigate to what extent Hugging Face is able to flag repositories and files using unsafe serialization methods, and develop a technique to detect malicious models. Our results show that Hugging Face is home to a wide range of potentially vulnerable models.

Submitted 6 October, 2024; originally announced October 2024.
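
To illustrate the object injection risk described above, the sketch below uses Python's standard `pickle` module: an object's `__reduce__` method may name any importable callable together with its arguments, and `pickle.loads` calls it while deserializing. Here the callable is the builtin `eval` applied to a harmless expression, standing in for attacker-chosen code. The `RestrictedUnpickler` follows the "Restricting Globals" approach from the `pickle` documentation (overriding `Unpickler.find_class`); it is a generic mitigation sketch under that assumption, not the detection technique developed in the paper.

```python
import io
import pickle


class Payload:
    def __reduce__(self):
        # Instead of the object's state, pickle records "call builtins.eval with
        # this argument". A real attacker would substitute an arbitrary expression.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())  # only a reference to eval and its argument is stored
result = pickle.loads(blob)     # eval("6 * 7") runs during deserialization


class RestrictedUnpickler(pickle.Unpickler):
    # Allow-list of (module, name) globals; empty here, so any pickle that
    # references a class or function is rejected.
    ALLOWED = frozenset()

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Plain containers of numbers need no globals, so they still load.
safe_value = restricted_loads(pickle.dumps({"weights": [0.1, 0.2]}))
```

Calling `restricted_loads(blob)` raises `pickle.UnpicklingError` because resolving `builtins.eval` goes through `find_class`. An allow-list like this only narrows the attack surface; formats that never execute code on load, such as safetensors, avoid the problem by construction.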

arXiv:2404.10155 [pdf, other]

The Fault in our Stars: Quality Assessment of Code Generation Benchmarks

Authors: Mohammed Latif Siddiq, Simantika Dristi, Joy Saha, Joanna C. S. Santos

Abstract: Large Language Models (LLMs) are gaining popularity among software engineers. A crucial aspect of developing effective code generation LLMs is to evaluate these models using a robust benchmark. Evaluation benchmarks with quality issues can provide a false sense of performance. In this work, we conduct a first-of-its-kind study of the quality of prompts within benchmarks used to compare the performance of different code generation models. To conduct this study, we analyzed 3,566 prompts from 9 code generation benchmarks to identify quality issues in them. We also investigated whether fixing the identified quality issues in the benchmarks' prompts affects a model's performance. We also studied memorization issues of the evaluation dataset, which can call a benchmark's trustworthiness into question. We found that code generation evaluation benchmarks mainly focused on Python and coding exercises and had very limited contextual dependencies to challenge the model. These datasets and the developers' prompts suffer from quality issues like spelling and grammatical errors, unclear sentences to express developers' intent, and not using proper documentation style. Fixing all these issues in the benchmarks can lead to better performance for Python code generation, but no significant improvement was observed for Java code generation. We also found evidence that GPT-3.5-Turbo and CodeGen-2.5 models may have data contamination issues.

Submitted 4 September, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

Comments: Accepted at the 24th IEEE International Conference on Source Code Analysis and Manipulation (SCAM 2024) Research Track

arXiv:2403.10646 [pdf]

doi 10.1145/3721977

A Survey of Source Code Representations for Machine Learning-Based Cybersecurity Tasks

Authors: Beatrice Casey, Joanna C. S. Santos, George Perry

Abstract: Machine learning techniques for cybersecurity-related software engineering tasks are becoming increasingly popular. The representation of source code is a key part of such a technique and can affect how well the model learns the features of the source code. With an increasing number of these techniques being developed, it is valuable to survey the current state of the field to better understand what exists and what is still missing. This article presents a study of these existing machine learning based approaches and shows what types of representations were used for different cybersecurity tasks and programming languages. Additionally, we study what types of models are used with different representations. We found that graph-based representations are the most popular category of representation, while tokenizers and Abstract Syntax Trees (ASTs) are the two most popular individual representations (i.e., AST and tokenizers are the representations with the highest count of papers, whereas graph-based representations are the category with the highest count of papers). We also found that the most popular cybersecurity task is vulnerability detection, and the language covered by the most techniques is C. Finally, we found that sequence-based models are the most popular category of models, and Support Vector Machines are the most popular model overall.

Submitted 9 April, 2025; v1 submitted 15 March, 2024; originally announced March 2024.
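
To make the distinction between the two most popular representations concrete, the sketch below derives both from the same one-line snippet using Python's standard `tokenize` and `ast` modules. The surveyed techniques target many languages (most often C) and use their own tokenizers and parsers, so this only illustrates the general idea; `token_sequence` and `ast_node_types` are hypothetical helper names.

```python
import ast
import io
import tokenize

SOURCE = "x = a + 1\n"

# Token types that carry layout rather than program content.
_LAYOUT = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def token_sequence(src):
    """Flat, sequence-style representation: the lexemes in source order."""
    tokens = tokenize.generate_tokens(io.StringIO(src).readline)
    return [tok.string for tok in tokens if tok.type not in _LAYOUT]


def ast_node_types(src):
    """Tree representation: node types visited breadth-first by ast.walk."""
    return [type(node).__name__ for node in ast.walk(ast.parse(src))]


tokens = token_sequence(SOURCE)  # ['x', '=', 'a', '+', '1']
nodes = ast_node_types(SOURCE)   # starts with 'Module'; includes 'Assign', 'BinOp', 'Name', 'Constant'
```

The token list preserves surface order but not structure, while the AST makes explicit that `a + 1` is a sub-expression assigned to `x`; graph-based representations typically add further edges (e.g., control or data flow) on top of such a tree.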

arXiv:2401.01200 [pdf, other]

Skin cancer diagnosis using NIR spectroscopy data of skin lesions in vivo using machine learning algorithms

Authors: Flavio P. Loss, Pedro H. da Cunha, Matheus B. Rocha, Madson Poltronieri Zanoni, Leandro M. de Lima, Isadora Tavares Nascimento, Isabella Rezende, Tania R. P. Canuto, Luciana de Paula Vieira, Renan Rossoni, Maria C. S. Santos, Patricia Lyra Frasson, Wanderson Romão, Paulo R. Filgueiras, Renato A. Krohling

Abstract: Skin lesions are classified as benign or malignant. Among the malignant lesions, melanoma is a very aggressive cancer and the major cause of skin cancer deaths. Early diagnosis of skin cancer is therefore highly desirable. In the last few years, there has been growing interest in computer-aided diagnosis (CAD) using mostly image and clinical data of the lesion. These sources of information are limited because they cannot provide information on the molecular structure of the lesion. NIR spectroscopy may provide an alternative source of information for automated CAD of skin lesions. The most commonly used techniques and classification algorithms in spectroscopy are Principal Component Analysis (PCA), Partial Least Squares - Discriminant Analysis (PLS-DA), and Support Vector Machines (SVM). Nonetheless, there is growing interest in applying modern machine and deep learning (MDL) techniques to spectroscopy. One of the main limitations to applying MDL to spectroscopy is the lack of public datasets. Since, as far as we know, there is no public dataset of NIR spectral data of skin lesions, a new dataset named NIR-SC-UFES has been collected, annotated, and analyzed, providing a gold standard for the classification of NIR spectral data of skin cancer. Next, the machine learning algorithms XGBoost, CatBoost, LightGBM, and a 1D convolutional neural network (1D-CNN) were investigated to classify cancer and non-cancer skin lesions.
Experimental results indicate that the best performance was obtained by LightGBM with standard normal variate (SNV) pre-processing and feature extraction, yielding 0.839 balanced accuracy, 0.851 recall, 0.852 precision, and 0.850 F-score. These results represent first steps toward CAD of skin lesions aimed at the automated triage of patients with skin lesions in vivo using NIR spectral data.

Submitted 2 January, 2024; originally announced January 2024.
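
The SNV pre-processing step and the balanced accuracy metric mentioned above can be sketched as follows. SNV centers each spectrum on its own mean and scales it by its own standard deviation; balanced accuracy for a binary task is the mean of sensitivity (recall on the positive class) and specificity. This is a stdlib-only sketch that assumes binary cancer/non-cancer labels and uses the sample standard deviation; it is not the authors' pipeline, and the function names are illustrative.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: center one spectrum on its mean, scale by its std."""
    mu = mean(spectrum)
    sd = stdev(spectrum)  # sample std (an assumption; some tools use the population std)
    return [(x - mu) / sd for x in spectrum]


def balanced_accuracy(y_true, y_pred, positive=1):
    """Mean of sensitivity and specificity for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2
```

For example, `snv([1.0, 2.0, 3.0])` gives `[-1.0, 0.0, 1.0]`, and with true labels `[1, 1, 0, 0]` and predictions `[1, 0, 0, 0]` the balanced accuracy is (0.5 + 1.0) / 2 = 0.75. Because SNV is applied per spectrum, it removes per-sample offset and scale differences without using statistics from other samples.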

arXiv:2312.12598 [pdf, other]

A Case Study on Test Case Construction with Large Language Models: Unveiling Practical Insights and Challenges

Authors: Roberto Francisco de Lima Junior, Luiz Fernando Paes de Barros Presta, Lucca Santos Borborema, Vanderson Nogueira da Silva, Marcio Leal de Melo Dahia, Anderson Carlos Sousa e Santos

Abstract: This paper presents a detailed case study examining the application of Large Language Models (LLMs) to the construction of test cases within the context of software engineering. LLMs, characterized by their advanced natural language processing capabilities, are increasingly garnering attention as tools to automate and enhance various aspects of the software development life cycle. Leveraging a case study methodology, we systematically explore the integration of LLMs into the test case construction process, aiming to shed light on their practical efficacy, the challenges encountered, and implications for software quality assurance. The study encompasses the selection of a representative software application, the formulation of test case construction methodologies employing LLMs, and the subsequent evaluation of outcomes. Through a blend of qualitative and quantitative analyses, this study assesses the impact of LLMs on test case comprehensiveness, accuracy, and efficiency. Additionally, it delves into challenges such as model interpretability and adaptation to diverse software contexts. The findings from this case study contribute nuanced insights into the practical utility of LLMs in the domain of test case construction, elucidating their potential benefits and limitations.
By addressing real-world scenarios and complexities, this research aims to inform software practitioners and researchers alike about the tangible implications of incorporating LLMs into the software testing landscape, fostering a more comprehensive understanding of their role in optimizing the software development process.

Submitted 21 December, 2023; v1 submitted 19 December, 2023; originally announced December 2023.

arXiv:2311.00943 [pdf]

doi 10.1145/3649851

Seneca: Taint-Based Call Graph Construction for Java Object Deserialization

Authors: Joanna C. S. Santos, Mehdi Mirakhorli, Ali Shokri

Abstract: Object serialization and deserialization are widely used for storing and preserving objects in files, memory, or databases, as well as for transporting them across machines, enabling remote interaction among processes, and much more. This mechanism relies on reflection, a dynamic language feature that introduces serious challenges for static analyses. Current state-of-the-art call graph construction algorithms do not fully support object serialization/deserialization, i.e., they are unable to uncover the callback methods that are invoked when objects are serialized and deserialized. Since call graphs are a core data structure for multiple types of analysis (e.g., vulnerability detection), such analyses cannot be performed appropriately when the call graph does not capture hidden (vulnerable) paths that occur via callback methods. In this paper, we present Seneca, an approach for handling serialization with improved soundness in the context of call graph construction. Our approach relies on taint analysis and API modeling to construct sound call graphs. We evaluated our approach with respect to soundness, precision, performance, and usefulness in detecting untrusted object deserialization vulnerabilities. Our results show that Seneca can create sound call graphs with respect to serialization features. The resulting call graphs do not incur significant runtime overhead and were shown to be useful for identifying vulnerable paths caused by untrusted object deserialization.

Submitted 2 September, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

Comments: Accepted at OOPSLA 2024

arXiv:2311.00889 [pdf, other]

doi 10.1145/3691621.3694934

SALLM: Security Assessment of Generated Code

Authors: Mohammed Latif Siddiq, Joanna C. S. Santos, Sajith Devareddy, Anna Muller

Abstract: With the growing popularity of Large Language Models (LLMs) in software engineers' daily practices, it is important to ensure that the code generated by these tools is not only functionally correct but also free of vulnerabilities. Although LLMs can help developers be more productive, prior empirical studies have shown that LLMs can generate insecure code. Two factors contribute to insecure code generation. First, existing datasets used to evaluate LLMs do not adequately represent genuine software engineering tasks sensitive to security. Instead, they are often based on competitive programming challenges or classroom-type coding tasks. In real-world applications, the code produced is integrated into larger codebases, introducing potential security risks. Second, existing evaluation metrics primarily focus on the functional correctness of the generated code while ignoring security considerations. Therefore, in this paper, we describe SALLM, a framework to systematically benchmark LLMs' abilities to generate secure code. This framework has three major components: a novel dataset of security-centric Python prompts, configurable assessment techniques to evaluate the generated code, and novel metrics to evaluate the models' performance from the perspective of secure code generation.

Submitted 4 September, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

Comments: Accepted at the 6th International Workshop on Automated and verifiable Software sYstem DEvelopment (ASYDE) with ASE Conference 2024

Journal ref: 39th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW '24), October 27-November 1, 2024, Sacramento, CA, USA, ACM, New York, NY, USA, 12 pages

arXiv:2307.08220 [pdf, other]

FRANC: A Lightweight Framework for High-Quality Code Generation

Authors: Mohammed Latif Siddiq, Beatrice Casey, Joanna C. S. Santos

Abstract: In recent years, the use of automated source code generation utilizing transformer-based generative models has expanded, and these models can generate functional code according to the requirements of the developers. However, recent research revealed that this automatically generated source code can contain vulnerabilities and other quality issues. Despite researchers' and practitioners' attempts to enhance code generation models, retraining and fine-tuning large language models is time-consuming and resource-intensive. Thus, we describe FRANC, a lightweight framework for recommending more secure and high-quality source code derived from transformer-based code generation models. FRANC includes a static filter that uses heuristics to make the generated code compilable and a quality-aware ranker that sorts the code snippets based on a quality score. Moreover, the framework uses prompt engineering to fix persistent quality issues. We evaluated the framework with five Python and Java code generation models and six prompt datasets, including one newly created in this work (SOEval). The static filter improves the compilability of 9% to 46% of Java suggestions and 10% to 43% of Python suggestions. The average improvement in the NDCG@10 score for the ranking system is 0.0763, and the repair techniques fix up to 80% of prompts. FRANC takes, on average, 1.98 seconds for Java and 0.08 seconds for Python.

Submitted 28 August, 2024; v1 submitted 16 July, 2023; originally announced July 2023.

Comments: Accepted at the 24th IEEE International Conference on Source Code Analysis and Manipulation (SCAM 2024)
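
The ranking result above is reported as NDCG@10. One common formulation computes DCG@k as the sum of rel_p / log2(p + 1) over positions p = 1..k and divides it by the DCG of the ideal (best-first) ordering, so a perfect ranking scores 1.0. The abstract does not specify FRANC's gain function or relevance labels, so the sketch below assumes linear gains over hypothetical graded quality scores; a frequent alternative uses 2^rel - 1 as the gain.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain with linear gains: sum of rel / log2(position + 1)."""
    # enumerate is 0-based, so position p = i + 1 and the discount is log2(i + 2)
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical quality grades of ranked snippets (higher = better).
best_first = ndcg_at_k([3, 2, 1, 0])
worst_first = ndcg_at_k([0, 1, 2, 3])
```

Here `best_first` is 1.0 and `worst_first` falls strictly between 0 and 1; an "average improvement of 0.0763" would then mean the ranker moves this normalized score up by that amount on average relative to the unranked order.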

arXiv:2305.00418 [pdf, other]

doi 10.1145/3661167.3661216

Using Large Language Models to Generate JUnit Tests: An Empirical Study

Authors: Mohammed Latif Siddiq, Joanna C. S. Santos, Ridwanul Hasan Tanvir, Noshin Ulfat, Fahmid Al Rifat, Vinicius Carvalho Lopes

Abstract: A code generation model generates code by taking a prompt from a code comment, existing code, or a combination of both. Although code generation models (e.g., GitHub Copilot) are increasingly being adopted in practice, it is unclear whether they can successfully be used for unit test generation without fine-tuning for a strongly typed language like Java. To fill this gap, we investigated how well three models (Codex, GPT-3.5-Turbo, and StarCoder) can generate unit tests. We used two benchmarks (HumanEval and EvoSuite SF110) to investigate the effect of context generation on the unit test generation process. We evaluated the models based on compilation rates, test correctness, test coverage, and test smells. We found that the Codex model achieved above 80% coverage for the HumanEval dataset, but no model had more than 2% coverage for the EvoSuite SF110 benchmark. The generated tests also suffered from test smells, such as Duplicated Asserts and Empty Tests.

Submitted 8 March, 2024; v1 submitted 30 April, 2023; originally announced May 2023.

Comments: Accepted in Research Track of The 28th International Conference on Evaluation and Assessment in Software Engineering (EASE 2024)

Journal ref: The 28th International Conference on Evaluation and Assessment in Software Engineering (EASE), 2024, 313-322

arXiv:2304.08999 [pdf, other]

doi 10.1145/3555776.3578577

A Biomedical Entity Extraction Pipeline for Oncology Health Records in Portuguese

Authors: Hugo Sousa, Arian Pasquali, Alípio Jorge, Catarina Sousa Santos, Mário Amorim Lopes

Abstract: Textual health records of cancer patients are usually protracted and highly unstructured, making it very time-consuming for health professionals to get a complete overview of the patient's therapeutic course. As such limitations can lead to suboptimal and/or inefficient treatment procedures, healthcare providers would greatly benefit from a system that effectively summarizes the information in those records. With the advent of deep neural models, this objective has been partially attained for English clinical texts; however, the research community still lacks an effective solution for languages with limited resources. In this paper, we present the approach we developed to extract procedures, drugs, and diseases from oncology health records written in European Portuguese. This project was conducted in collaboration with the Portuguese Institute for Oncology which, besides holding over 10 years of duly protected medical records, also provided oncologist expertise throughout the development of the project. Since there is no annotated corpus for biomedical entity extraction in Portuguese, we also present the strategy we followed in annotating the corpus for the development of the models. The final models, which combined a neural architecture with entity linking, achieved F1 scores of 88.6%, 95.0%, and 55.8% in the mention extraction of procedures, drugs, and diseases, respectively.

Submitted 18 April, 2023; originally announced April 2023.

arXiv:2304.07840 [pdf, other]

Enhancing Automated Program Repair through Fine-tuning and Prompt Engineering

Authors: Rishov Paul, Md. Mohib Hossain, Mohammed Latif Siddiq, Masum Hasan, Anindya Iqbal, Joanna C. S. Santos

Abstract: Sequence-to-sequence models have been used to transform erroneous programs into correct ones when trained with a large enough dataset. Some recent studies also demonstrated strong empirical evidence that code review could further improve program repair. Large language models, trained with Natural Language (NL) and Programming Language (PL), can contain inherent knowledge of both. In this study, we investigate whether this inherent knowledge of PL and NL can be utilized to improve automated program repair. We applied PLBART and CodeT5, two state-of-the-art language models that are pre-trained with both PL and NL, to two such natural language-based program repair datasets and found that the pre-trained language models fine-tuned with datasets containing both code review and subsequent code changes notably outperformed each of the previous models. With the advent of code generative models like Codex and GPT-3.5-Turbo, we also performed zero-shot and few-shot learning-based prompt engineering to assess their performance on these datasets. However, based on our manual analysis of the repaired code generated by the learning models, the practical application of LLMs to automated program repair is still a long way off.

Submitted 21 July, 2023; v1 submitted 16 April, 2023; originally announced April 2023.

Comments: 12 pages, 2 figures, 4 tables

arXiv:2102.08372 [pdf, other]

doi 10.1109/ICSA51549.2021.00021

ArCode: Facilitating the Use of Application Frameworks to Implement Tactics and Patterns

Authors: Ali Shokri, Joanna C. S. Santos, Mehdi Mirakhorli

Abstract: Software designers and developers are increasingly relying on application frameworks as first-class design concepts. They instantiate the services that frameworks provide to implement various architectural tactics and patterns. One of the challenges in using frameworks for such tasks is the difficulty of learning and correctly using frameworks' APIs. This paper introduces a learning-based approach called ArCode to help novice programmers correctly use frameworks' APIs to implement architectural tactics and patterns. ArCode has several novel components: a graph-based approach for learning the specification of a framework from a limited number of training programs, a program analysis algorithm to eliminate erroneous training data, and a recommender module to help programmers use APIs correctly and identify API misuses in their programs. We evaluated our technique across two popular frameworks: the JAAS security framework, used for the authentication and authorization tactic, and the Java RMI framework, used to enable remote method invocation between client and server and other object-oriented patterns. Our evaluation results show (i) the feasibility of using ArCode to learn the specification of a framework; (ii) that ArCode generates accurate recommendations for finding the next API call to implement an architectural tactic/pattern based on the context of the programmer's code; and (iii) that it accurately detects API misuses in the code that implements a tactic/pattern and provides fix recommendations.
A comparison of ArCode with two prior techniques (MAPO and GrouMiner) on API recommendation and misuse detection shows that ArCode outperforms these approaches.

Submitted 16 February, 2021; originally announced February 2021.

Comments: This paper has been accepted in the main track of the 2021 IEEE International Conference on Software Architecture (ICSA 2021).

arXiv:1710.04132 [pdf]

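Learning Object-Oriented Programming with a Playful Approach Based on Greenfoot and Robocode (original Portuguese title: Aprendendo Programação Orientada a Objetos com uma Abordagem Lúdica Baseada em Greenfoot e Robocode)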

Authors: Cleison Simoes Santos, Allen Hichard Marques Santos, Suenny Mascarenhas Souza, Roberto Almeida Bittencourt

Abstract: One of the major challenges in undergraduate computing programs is learning object-oriented programming (OOP). This paradigm has a variety of concepts whose level of abstraction is usually high for most beginners, even those who already program in an imperative language. Furthermore, the transition from imperative programming to OOP is a complex issue, with various undesirable side effects. Significant effort has gone into the search for motivating and attractive solutions to such issues. One of these is the use of playful environments that merge games with learning. In this work, we report our experience with OOP learning workshops built around games, challenges, and competitions, supported by the Greenfoot and Robocode learning environments. A workshop with sophomore students in a Computer Engineering program is presented here. Lessons learned for motivating students include: designing motivating examples, using competitive challenges, and maintaining an appropriate ratio between tutors and students. Results suggest that the workshop was a practical and effective way to introduce OOP and to motivate students to learn it.

Submitted 16 October, 2017; v1 submitted 7 October, 2017; originally announced October 2017.

Comments: 10 pages, 3 figures, 2 tables, COBENGE 2015 - XLIII Congresso Brasileiro de Educação em Engenharia (43rd Brazilian Congress on Engineering Education), in Portuguese

arXiv:1704.08412 [pdf, other]

doi 10.1109/MSR.2017.8

A Large-Scale Study on the Usage of Testing Patterns that Address Maintainability Attributes (Patterns for Ease of Modification, Diagnoses, and Comprehension)

Authors: Danielle Gonzalez, Joanna C. S. Santos, Andrew Popovich, Mehdi Mirakhorli, Mei Nagappan

Abstract: Test case maintainability is an important concern, especially in open source and distributed development environments, where projects typically have high contributor turnover with varying backgrounds and experience, and where code ownership changes often. Like design patterns, patterns for unit testing promote maintainability quality attributes such as ease of diagnosis, modifiability, and comprehension. In this paper, we report the results of a large-scale study on the usage of four xUnit testing patterns that can be used to satisfy these maintainability attributes. This first-of-its-kind study developed automated techniques to investigate these issues across 82,447 open source projects, and its findings provide further insight into testing practices in open source projects. Our results indicate that only 17% of projects had test cases, and that of the 251 testing frameworks we studied, 93 were in use. We found that 24% of projects with test files implemented patterns that could help with maintainability, while the rest did not use these patterns. Multiple qualitative analyses indicate that the use of patterns was an ad hoc decision by individual developers rather than one motivated by the characteristics of the project, and that developers sometimes used alternative techniques to address maintainability concerns.

Submitted 26 April, 2017; originally announced April 2017.

Comments: Mining Software Repositories (MSR) 2017 Research Track

Journal ref: 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), Buenos Aires, 2017, pp. 391-401
