-
I Can't Share Code, but I need Translation -- An Empirical Study on Code Translation through Federated LLM
Authors:
Jahnavi Kumar,
Venkata Lakshmana Sasaank Janapati,
Mokshith Reddy Tanguturi,
Sridhar Chimalakonda
Abstract:
Owing to the rapid evolution of technologies and project requirements, organizations need to upgrade the code base in their software projects to a new version of the programming language or even translate it to an entirely new one. However, code translation is resource-intensive and requires expertise in both the source and target languages. While researchers have made progress in automating translations between legacy and modern languages, recent work has increasingly turned to pre-trained Large Language Models (LLMs) to translate efficiently.
Given the proprietary nature of code, organizations prefer fine-tuning LLMs locally rather than relying on external APIs. This is one of the first empirical studies that proposes a Federated LLM-based approach for code translation. The proposed approach enables clients to jointly train a code translator without sharing sensitive data. This study demonstrates that participants can collaboratively develop a FedLLM for efficient code translation (particularly C# to Java and vice-versa) with superior results (more than 40% improvement in CodeLLaMA's CodeBLEU score) compared to individual client models. Our findings indicate that FedLLM offers a collaborative approach to code translation and could serve as a promising direction for future research in this field.
Submitted 10 January, 2025;
originally announced January 2025.
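The abstract above describes clients jointly training a translator without sharing their code. A minimal sketch of the server-side aggregation step such a setup typically needs is given below. It assumes plain federated averaging (FedAvg) weighted by local dataset size; the abstract does not state which aggregation rule the study uses, and the function name, the parameter name `adapter.w`, and the numbers are hypothetical.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]  # parameter name -> flattened values


def federated_average(client_weights: List[Weights],
                      client_sizes: List[int]) -> Weights:
    """Average each parameter across clients, weighted by local dataset size.

    Assumption: standard FedAvg. Only parameters leave each client; the
    training data (here, proprietary code) never does.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one dataset size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total dataset size must be positive")

    averaged: Weights = {}
    for name, reference in client_weights[0].items():
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(len(reference))
        ]
    return averaged


# Two hypothetical clients holding 100 and 300 local training pairs.
clients = [{"adapter.w": [1.0, 2.0]}, {"adapter.w": [3.0, 6.0]}]
global_weights = federated_average(clients, client_sizes=[100, 300])
```

In a full round, the server would send `global_weights` back to every client, each client would fine-tune locally, and the averaging would repeat.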
-
Code Review Automation Via Multi-task Federated LLM -- An Empirical Study
Authors:
Jahnavi Kumar,
Sridhar Chimalakonda
Abstract:
Code review is a crucial process before deploying code to production, as it validates the code, provides suggestions for improvements, and identifies errors such as missed edge cases. In projects with regular production releases, the effort required for peer code-reviews remains high. Consequently, there has been significant interest from software engineering (SE) researchers in automating the code review process. Previous research on code review automation has typically approached the task as three independent sub-tasks: review necessity prediction, review comment generation, and code refinement. Our study attempts to (i) leverage the relationships between the sub-tasks of code review automation, by developing a multi-task model that addresses all tasks in an integrated manner, and (ii) increase model robustness on unseen data via collaborative large language model (LLM) modeling, while retaining the proprietary nature of code, by using federated learning (FL). The study explores five simple techniques for multi-task training, including two sequential methods, one parallel method, and two cumulative methods. The results indicate that sequentially training a federated LLM (FedLLM) for our code review multi-task use case is less efficient in terms of time, computation, and performance metrics, compared to training separate models for each task. Sequential training exhibits catastrophic forgetting, whereas cumulative fine-tuning for multi-task training performs better than training models for individual tasks. This study highlights the need for research focused on effective fine-tuning of multi-task FedLLMs for SE tasks.
Submitted 20 December, 2024;
originally announced December 2024.
-
CPPJoules: An Energy Measurement Tool for C++
Authors:
Shivadharshan S,
Akilesh P,
Rajrupa Chattaraj,
Sridhar Chimalakonda
Abstract:
With the increasing complexity of modern software and the demand for high performance, energy consumption has become a critical factor for developers and researchers. While much of the research community is focused on evaluating the energy consumption of machine learning and artificial intelligence systems -- often implemented in Python -- there is a gap when it comes to tools and frameworks for measuring energy usage in other programming languages. C++, in particular, remains a foundational language for a wide range of software applications, from game development to parallel programming frameworks, yet lacks dedicated energy measurement solutions. To address this, we have developed CPPJoules, a tool built on top of Intel-RAPL to measure the energy consumption of C++ code snippets. We have evaluated the tool by measuring the energy consumption of the standard computational tasks from the Rosetta Code repository. The demonstration of the tool is available at https://www.youtube.com/watch?v=GZXYF3AKzPk and related artifacts at https://rishalab.github.io/CPPJoules/.
Submitted 18 December, 2024;
originally announced December 2024.
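The abstract above says CPPJoules builds on Intel RAPL. A rough sketch of how RAPL-based measurement generally works is shown below: read a cumulative energy counter before and after a task, and correct for counter wraparound. It is written in Python for illustration and is not CPPJoules' API. It assumes a Linux host that exposes the powercap files `energy_uj` and `max_energy_range_uj` under `/sys/class/powercap/intel-rapl:0` (reading them often needs elevated privileges); the helper names are hypothetical.

```python
from pathlib import Path
from typing import Callable

# Assumption: package-0 RAPL domain as exposed by the Linux powercap interface.
RAPL_DOMAIN = Path("/sys/class/powercap/intel-rapl:0")


def read_counter(name: str) -> int:
    """Read one integer counter file (microjoules) from the RAPL domain."""
    return int((RAPL_DOMAIN / name).read_text().strip())


def energy_delta_uj(before: int, after: int, max_range: int) -> int:
    """Energy consumed between two readings, allowing for one counter wrap."""
    if after >= before:
        return after - before
    # The cumulative counter wrapped past max_range and restarted near zero.
    return (max_range - before) + after


def measure_joules(task: Callable[[], object]) -> float:
    """Run task() and return the package energy it consumed, in joules."""
    max_range = read_counter("max_energy_range_uj")
    before = read_counter("energy_uj")
    task()
    after = read_counter("energy_uj")
    return energy_delta_uj(before, after, max_range) / 1_000_000
```

On a supported machine, `measure_joules(lambda: sum(range(10_000_000)))` would return the energy for that loop. The reading covers the whole CPU package, so background processes contribute to it.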
-
CodeSAM: Source Code Representation Learning by Infusing Self-Attention with Multi-Code-View Graphs
Authors:
Alex Mathai,
Kranthi Sedamaki,
Debeshee Das,
Noble Saji Mathews,
Srikanth Tamilselvam,
Sridhar Chimalakonda,
Atul Kumar
Abstract:
Machine Learning (ML) for software engineering (SE) has gained prominence due to its ability to significantly enhance the performance of various SE applications. This progress is largely attributed to the development of generalizable source code representations that effectively capture the syntactic and semantic characteristics of code. In recent years, pre-trained transformer-based models, inspired by natural language processing (NLP), have shown remarkable success in SE tasks. However, source code contains structural and semantic properties embedded within its grammar, which can be extracted from structured code-views like the Abstract Syntax Tree (AST), Data-Flow Graph (DFG), and Control-Flow Graph (CFG). These code-views can complement NLP techniques, further improving SE tasks. Unfortunately, there are no flexible frameworks to infuse arbitrary code-views into existing transformer-based models effectively. Therefore, in this work, we propose CodeSAM, a novel scalable framework to infuse multiple code-views into transformer-based models by creating self-attention masks. We use CodeSAM to fine-tune a small language model (SLM) like CodeBERT on the downstream SE tasks of semantic code search, code clone detection, and program classification. Experimental results show that by using this technique, we improve downstream performance when compared to SLMs like GraphCodeBERT and CodeBERT on all three tasks by utilizing individual code-views or a combination of code-views during fine-tuning. We believe that these results are indicative that techniques like CodeSAM can help create compact yet performant code SLMs that fit in resource constrained settings.
Submitted 21 November, 2024;
originally announced November 2024.
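The abstract above describes infusing code-views into a transformer by creating self-attention masks. One way to picture this is sketched below, under the assumption that two tokens may attend to each other only when some code-view graph (AST, DFG, or CFG) connects them, and that several views are combined by taking the union of their masks. The abstract does not give CodeSAM's exact construction, so both rules and all names here are hypothetical.

```python
from typing import Iterable, List, Tuple

Mask = List[List[bool]]  # mask[i][j] is True when token i may attend to token j


def attention_mask_from_edges(num_tokens: int,
                              edges: Iterable[Tuple[int, int]]) -> Mask:
    """Build a symmetric attention mask from the edges of one code-view graph."""
    # Every token can always attend to itself.
    mask = [[i == j for j in range(num_tokens)] for i in range(num_tokens)]
    for src, dst in edges:
        if not (0 <= src < num_tokens and 0 <= dst < num_tokens):
            raise ValueError(f"edge ({src}, {dst}) is outside the token range")
        mask[src][dst] = True
        mask[dst][src] = True
    return mask


def combine_masks(masks: List[Mask]) -> Mask:
    """Union of several code-view masks: attend if any view links the tokens."""
    size = len(masks[0])
    return [[any(m[i][j] for m in masks) for j in range(size)]
            for i in range(size)]


# Three tokens; a hypothetical AST edge links 0-1 and a DFG edge links 1-2.
ast_mask = attention_mask_from_edges(3, [(0, 1)])
dfg_mask = attention_mask_from_edges(3, [(1, 2)])
combined = combine_masks([ast_mask, dfg_mask])
```

In a transformer, positions where the mask is False would typically receive a large negative value before the softmax so that they get near-zero attention weight.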
-
DBJoules: An Energy Measurement Tool for Database Management Systems
Authors:
Hemasri Sai Lella,
Kurra Manasa,
Rajrupa Chattaraj,
Sridhar Chimalakonda
Abstract:
In the rapidly evolving landscape of modern data-driven technologies, software relies on large datasets and constant data center operations using various database systems to support computation-intensive tasks. As energy consumption in software systems becomes a growing concern, selecting the right database from an energy-efficiency perspective is also critical. To address this, we introduce DBJoules, a tool that measures the energy consumption of activities in database systems. DBJoules supports energy measurement of CRUD operations for four popular databases. Through evaluations on two widely-used datasets, we identify disparities of 7% to 38% in the energy consumption of these databases. Hence, the goal is to raise developer awareness about the effect of running queries in different databases from an energy consumption perspective, enabling them to select an appropriate database for sustainable usage. The tool's demonstration is available at https://youtu.be/D1MTZum0jok and related artifacts at https://rishalab.github.io/DBJoules/.
Submitted 15 November, 2023;
originally announced November 2023.
-
COMEX: A Tool for Generating Customized Source Code Representations
Authors:
Debeshee Das,
Noble Saji Mathews,
Alex Mathai,
Srikanth Tamilselvam,
Kranthi Sedamaki,
Sridhar Chimalakonda,
Atul Kumar
Abstract:
Learning effective representations of source code is critical for any Machine Learning for Software Engineering (ML4SE) system. Inspired by natural language processing, large language models (LLMs) like Codex and CodeGen treat code as generic sequences of text and are trained on huge corpora of code data, achieving state of the art performance on several software engineering (SE) tasks. However, valid source code, unlike natural language, follows a strict structure and pattern governed by the underlying grammar of the programming language. Current LLMs do not exploit this property of the source code as they treat code like a sequence of tokens and overlook key structural and semantic properties of code that can be extracted from code-views like the Control Flow Graph (CFG), Data Flow Graph (DFG), Abstract Syntax Tree (AST), etc. Unfortunately, the process of generating and integrating code-views for every programming language is cumbersome and time consuming. To overcome this barrier, we propose our tool COMEX - a framework that allows researchers and developers to create and combine multiple code-views which can be used by machine learning (ML) models for various SE tasks. Some salient features of our tool are: (i) it works directly on source code (which need not be compilable), (ii) it currently supports Java and C#, (iii) it can analyze both method-level snippets and program-level snippets by using both intra-procedural and inter-procedural analysis, and (iv) it is easily extendable to other languages as it is built on tree-sitter - a widely used incremental parser that supports over 40 languages. We believe this easy-to-use code-view generation and customization tool will give impetus to research in source code representation learning methods and ML4SE.
Tool: https://pypi.org/project/comex - GitHub: https://github.com/IBM/tree-sitter-codeviews - Demo: https://youtu.be/GER6U87FVbU
Submitted 10 July, 2023;
originally announced July 2023.
-
X-COBOL: A Dataset of COBOL Repositories
Authors:
Mir Sameed Ali,
Nikhil Manjunath,
Sridhar Chimalakonda
Abstract:
Despite being proposed as early as 1959, COBOL (Common Business-Oriented Language) still predominantly acts as an integral part of the majority of operations of several financial, banking, and governmental organizations. To support the inevitable modernization and maintenance of legacy systems written in COBOL, it is essential for organizations, researchers, and developers to understand the nature and source code of COBOL programs. However, to the best of our knowledge, we are unaware of any dataset that provides data on COBOL software projects, motivating the need for the dataset. Thus, to aid empirical research on comprehending COBOL in open-source repositories, we constructed a dataset of 84 COBOL repositories mined from GitHub, containing rich metadata on the development cycle of the projects. We envision that researchers can utilize our dataset to study COBOL projects' evolution, code properties and develop tools to support their development. Our dataset also provides 1255 COBOL files present inside the mined repositories. The dataset and artifacts are available at https://doi.org/10.5281/zenodo.7968845.
Submitted 7 June, 2023;
originally announced June 2023.
-
Statically Detecting Buffer Overflow in Cross-language Android Applications Written in Java and C/C++
Authors:
Kishanthan Thangarajah,
Noble Mathews,
Michael Pu,
Meiyappan Nagappan,
Yousra Aafer,
Sridhar Chimalakonda
Abstract:
Many applications are being written in more than one language to take advantage of the features that different languages provide such as native code support, improved performance, and language-specific libraries. However, there are few static analysis tools currently available to analyse the source code of such multilingual applications. Existing work on cross-language (Java and C/C++) analysis fails to detect buffer overflow vulnerabilities that are of cross-language nature. In this work, we are addressing how to do cross-language analysis between Java and C/C++. Specifically, we propose an approach to do data flow analysis between Java and C/C++ to detect buffer overflow. We have developed PilaiPidi, a tool that can automatically analyse the data flow in projects written in Java and C/C++. Using our approach, we were able to detect 23 buffer overflow vulnerabilities, which are of cross-language nature, in six different well-known Android applications, and out of these, developers have confirmed 11 vulnerabilities in three applications.
Submitted 17 May, 2023; v1 submitted 17 May, 2023;
originally announced May 2023.
-
On the Energy Consumption of Different Dataframe Processing Libraries -- An Exploratory Study
Authors:
Shriram Shanbhag,
Sridhar Chimalakonda
Abstract:
Background: The energy consumption of machine learning and its impact on the environment has made energy efficient ML an emerging area of research. However, most of the attention stays focused on the model creation and the training and inferencing phase. Data oriented stages like preprocessing, cleaning and exploratory analysis form a critical part of the machine learning workflow. However, the energy efficiency of these stages has gained little attention from researchers. Aim: Our study aims to explore the energy consumption of different dataframe processing libraries as a first step towards studying the energy efficiency of the data oriented stages of the machine learning pipeline. Method: We measure the energy consumption of 3 popular libraries used to work with dataframes, namely Pandas, Vaex and Dask for 21 different operations grouped under 4 categories on 2 datasets. Results: The results of our analysis show that for a given dataframe processing operation, the choice of library can indeed influence the energy consumption, with some libraries consuming 202 times less energy than others. Conclusion: The results of our study indicate that there is potential for optimizing the energy consumption of the data oriented stages of the machine learning pipeline and that further research is needed in this direction.
Submitted 12 September, 2022;
originally announced September 2022.
-
An Empirical Study On Correlation between Readme Content and Project Popularity
Authors:
Akhila Sri Manasa Venigalla,
Sridhar Chimalakonda
Abstract:
The readme in a GitHub repository serves as a preliminary source of information, and thus helps developers understand the project for reuse or extension. Different types of contextual and structural content, which we refer to as categories of the content and features in the content respectively, are present in readme files, and could determine the extent of comprehension of the project. Consequently, the structural and contextual aspects of the content could impact the project popularity. Studying the correlation between the content and project popularity could help in focusing on the aspects that could improve popularity while designing the readme files. However, existing studies explore the categories of content and types of features in readme files, and do not explore their usefulness towards project popularity. Hence, we present an empirical study to understand the correlation between readme file content and project popularity. We perform the study on 1950 readme files of public GitHub projects, spanning ten programming languages, and observe that readme files in the majority of the popular projects are well organised using lists and images, and comprise links to external sources. Also, repositories with readme files containing contribution guidelines and references were observed to be associated with higher popularity.
Submitted 21 June, 2022;
originally announced June 2022.
-
NoteG: A Computational Notebook to Facilitate Rapid Game Prototyping
Authors:
Noble Saji Mathews,
Sridhar Chimalakonda
Abstract:
Game development-based approaches are increasingly used to design curricula that can engage students, as these can help them apply and practice learnt computer science concepts. However, it can become complex to develop a minimum working game or a prototype with the help of high-end game engines. Game prototyping is one of the most essential parts of the game design and development cycle as it allows developers to continuously test and improve their ideas. In recent years, computational notebooks have gained widespread popularity among developers. They can help run individual code snippets, visualize the output, consolidate the source code, and share live code easily. However, its use has not been explored in the field of game development and prototyping. In this paper, we propose NoteG, a computational notebook towards rapid game prototyping. We evaluated the tool with 18 novice game developers through a questionnaire-based user survey. A majority of the volunteers (66%) found it easy to use and were of the opinion that it saves time. A few of the participants successfully extended the existing framework to implement new game mechanics within their prototypes.
Submitted 20 June, 2022;
originally announced June 2022.
-
VedicViz: Towards Visualizing Vedic Principles in Mental Arithmetic
Authors:
Noble Saji Mathews,
Akhila Sri Manasa Venigalla,
Sridhar Chimalakonda
Abstract:
Augmenting teaching with visualization can help students understand concepts better. Researchers have leveraged visualization to teach conventional mathematics, some examples being spatial and origami visualizations. Apart from conventional mathematics, systems such as mental arithmetic involve techniques for rapid calculation without the use of any computing tools and hence have been used in developing computational competence among students. Vedic Mathematics is one such set of techniques for mental computation. However, there is a lack of technical tools which tackle mental arithmetic concepts and provide aid in the teaching of these topics to school students. Therefore, we propose VedicViz, a web portal that provides dynamic visualization of mathematical operations such as addition, multiplication and square root calculation, based on techniques in Vedic Mathematics. The web portal also provides visualization that enables learners to compare and contrast the mental mathematics based approach with the traditional methods for various inputs and operations. We evaluated VedicViz with 20 volunteers who were at the high school level. They found our web portal to be useful in practicing and learning to use the methods to perform various mathematical operations.
Submitted 18 May, 2022;
originally announced May 2022.
-
eGEN: An Energy-saving Modeling Language and Code Generator for Location-sensing of Mobile Apps
Authors:
Kowndinya Boyalakuntla,
Marimuthu C,
Sridhar Chimalakonda,
Chandrasekaran K
Abstract:
The demand for reducing the energy consumption of location-based applications has increased in recent years. The abnormal battery-draining behavior of GPS makes it difficult for developers to decide on battery optimization directly during the development phase. It will reduce the burden on developers if battery-saving strategies are considered early, and relevant battery-aware code is generated from the design phase artifacts. Therefore, we aim to develop tool support, eGEN, to specify and create native location-based mobile apps. eGEN consists of a Domain-specific Modeling Language (DSML) and a code generator for location-sensing. It is developed using Xtext and Xtend as an Eclipse plug-in, and currently, it supports native Android apps. eGEN is evaluated through controlled experiments by instrumenting the generated code in five location-based open-source Android applications. The experimental results show 4.35 minutes of average GPS reduction per hour and 188 mA of average reduction in battery consumption while showing only a 97-meter degradation in location accuracy over 3 kilometers of a cycling path. Hence, we believe that code generated by eGEN would help developers to balance the energy and accuracy requirements of location-based applications. The source code, documentation, tool demo video, and tool installation video are available at https://github.com/Kowndinya2000/egen.
Submitted 8 April, 2022;
originally announced April 2022.
-
WAccess -- A Web Accessibility Tool based on WCAG 2.2, 2.1 and 2.0 Guidelines
Authors:
Kowndinya Boyalakuntla,
Akhila Sri Manasa Venigalla,
Sridhar Chimalakonda
Abstract:
The vision of providing access to all web content equally for all users makes web accessibility a fundamental goal of today's internet. Web accessibility is the practice of removing barriers from websites that could hinder functionality for users with various disabilities. Web accessibility is measured against accessibility guidelines such as WCAG, GIGW, and so on. WCAG 2.2 is the latest set of guidelines for web accessibility that helps in making websites accessible. The web accessibility tools available in the World Wide Web Consortium (W3C) only conform up to WCAG 2.1 guidelines, while no tools exist for the latest set of guidelines. Despite the availability of several tools to check the conformity of websites with WCAG 2.1 guidelines, there is a scarcity of tools that are both open source and scalable. To support automated accessibility evaluation of numerous websites against WCAG 2.2, 2.1, and 2.0, we present a tool, WAccess. WAccess highlights violations of 13 guidelines from WCAG 2.0, 9 guidelines from WCAG 2.1, and 7 guidelines from WCAG 2.2 of a specific web page on the web console, and suggests a fix for each violation while also showing the violating code snippet. We evaluated WAccess against 2227 government websites of India and observed a total of about 6.1 million violations.
Submitted 20 September, 2021; v1 submitted 14 July, 2021;
originally announced July 2021.
-
ML-Quest: A Game for Introducing Machine Learning Concepts to K-12 Students
Authors:
Shruti Priya,
Shubhankar Bhadra,
Sridhar Chimalakonda
Abstract:
Today, Machine Learning (ML) is of great importance to society due to the availability of huge data and high computational resources. This ultimately led to the introduction of ML concepts at multiple levels of education including K-12 students to promote computational thinking. However, teaching these concepts to K-12 through traditional methodologies such as video lectures and books is challenging. Many studies in the literature have reported that using interactive environments such as games to teach computational thinking and programming improves retention capacity and motivation among students. Therefore, introducing ML concepts using a game might enhance students' understanding of the subject and motivate them to learn further. However, we are not aware of any existing game which explicitly focuses on introducing ML concepts to students using game play. Hence, in this paper, we propose ML-Quest, a 3D video game to provide a conceptual overview of three ML concepts: Supervised Learning, Gradient Descent and K-Nearest Neighbor (KNN) Classification. The crux of the game is to introduce the definition and working of these concepts, which we call conceptual overview, in a simulated scenario without overwhelming students with the intricacies of ML. The game has been predominantly evaluated for its usefulness and player experience using the Technology Acceptance Model (TAM) with the help of 23 higher-secondary school students. The survey result shows that around 70% of the participants either agree or strongly agree that ML-Quest is quite interactive and useful in introducing them to ML concepts.
Submitted 13 July, 2021;
originally announced July 2021.
-
GitQ- Towards Using Badges as Visual Cues for GitHub Projects
Authors:
Akhila Sri Manasa Venigalla,
Kowndinya Boyalakunta,
Sridhar Chimalakonda
Abstract:
GitHub hosts millions of software repositories, enabling developers to contribute to many projects in multiple ways. Most of the information about the repositories is text-based in the form of stars, forks, commits, and so on. However, developers willing to contribute to projects on GitHub often find it challenging to select appropriate projects to contribute to or reuse due to the large number of repositories present on GitHub. Further, obtaining this required information often becomes a tedious process, as one has to carefully mine information hidden inside the repository. To alleviate the effort-intensive mining procedures, researchers have proposed npm-badges to outline information relating to build status of a project. However, these badges are static and limit their usage to package dependency and build details. Adding visual cues such as badges to the repositories might reduce the search space for developers. Hence, we present GitQ, to automatically augment GitHub repositories with badges representing information about source code and project maintenance. Presenting GitQ as a browser plugin to GitHub could make it easily accessible to developers using GitHub. GitQ is evaluated with 15 developers based on the UTAUT model to understand developer perception towards its usefulness. We observed that 11 out of 15 developers perceived GitQ to be useful in identifying the right set of repositories using visual cues such as those generated by GitQ. The source code and tool are available for download on GitHub at https://github.com/gitq-for-github/plugin, and the demo can be found at https://youtu.be/c0yohmIat3A.
Submitted 2 May, 2022; v1 submitted 8 July, 2021;
originally announced July 2021.
-
SOCluster- Towards Intent-based Clustering of Stack Overflow Questions using Graph-Based Approach
Authors:
Abhishek Kumar,
Deep Ghadiyali,
Sridhar Chimalakonda
Abstract:
The Stack Overflow (SO) platform has a huge dataset of questions and answers driven by interactions between users. But the count of unanswered questions is continuously rising. This issue is common across various community Question & Answering (Q&A) platforms such as Yahoo, Quora and so on. Clustering is one of the approaches used by these communities to address this challenge. Specifically, intent-based clustering could be leveraged to answer unanswered questions using other answered questions in the same cluster and can also improve the response time for new questions. To this end, we propose SOCluster, an approach and a tool to cluster SO questions based on intent using a graph-based clustering approach. We selected four datasets of 10k, 20k, 30k & 40k SO questions without code-snippets or images involved, and performed intent-based clustering on them. We have done a preliminary evaluation of our tool by analyzing the resultant clusters using the commonly used metrics of Silhouette coefficient, Calinski-Harabasz Index, & Davies-Bouldin Index. We performed clustering for 8 different threshold similarity values and analyzed the intriguing trends reflected by the output clusters through the three evaluation metrics. At 90% threshold similarity, the tool shows the best values for the three evaluation metrics on all four datasets. The source code and tool are available for download on GitHub at: https://github.com/Liveitabhi/SOCluster, and the demo can be found here: https://youtu.be/uyn8ie4h3NY.
Submitted 6 July, 2021;
originally announced July 2021.
-
COSPEX: A Program Comprehension Tool for Novice Programmers
Authors:
Ashutosh Rajput,
Nakshatra Gupta,
Sridhar Chimalakonda
Abstract:
Developers often encounter unfamiliar code during software maintenance, which consumes a significant amount of time for comprehension, especially for novice programmers. Automated techniques that analyze source code and present key information to the developers can lead to an effective comprehension of the code. Researchers have come up with automated code summarization techniques that focus on generating brief summaries rather than aiding comprehension. Existing debuggers represent the execution states of the program but they do not show the complete execution at a single point. Studies have revealed that the effort required for program comprehension can be reduced if novice programmers are provided with worked examples. Hence, we propose COSPEX (Comprehension using Summarization via Program Execution) - an Atom plugin that dynamically extracts key information for every line of code executed and presents it to the developers in the form of an interactive example-like dynamic information instance. As a preliminary evaluation, we presented 14 undergraduates having Python programming experience up to 1 year with a code comprehension task in a user survey. We observed that COSPEX helped novice programmers in program comprehension and improved their understanding of the code execution. The source code and tool are available at: https://bit.ly/3utHOBM, and the demo on YouTube is available at: https://bit.ly/2Sp08xQ.
Submitted 6 July, 2021;
originally announced July 2021.
-
MuseumViz -- Towards Visualizing Online Museum Collections
Authors:
Dheeraj Vagavolu,
Akhila Sri Manasa Venigalla,
Sridhar Chimalakonda
Abstract:
Despite the growth of online museums for India's cultural heritage data, there is a limited increase in terms of visitors. Over the years, online museums adopted many techniques to improve the overall user experience. However, many Indian online museums display artifacts as lists and grids with basic search functionality, making them less visually appealing and difficult to comprehend. Our work aims to enhance the user experience of accessing Indian online museums by utilizing advancements in information visualization. Hence, we propose MuseumViz, a framework which processes data from online museums and visualizes it using four different interactive visualizations: the Network Graph, TreeMap, Polygon Chart and SunBurst Chart. We demonstrate MuseumViz on a total of 723 cultural heritage artifacts present in the Archaeological Survey of India, Goa. Based on our evaluation with 25 users, about 83% of them find it easier and more comprehensible to browse cultural heritage artifacts through MuseumViz.
Submitted 22 June, 2021;
originally announced June 2021.
-
On the Impact of Multiple Source Code Representations on Software Engineering Tasks -- An Empirical Study
Authors:
Karthik Chandra Swarna,
Noble Saji Mathews,
Dheeraj Vagavolu,
Sridhar Chimalakonda
Abstract:
Efficiently representing source code is crucial for various software engineering tasks such as code classification and clone detection. Existing approaches primarily use Abstract Syntax Tree (AST), and only a few focus on semantic graphs such as Control Flow Graph (CFG) and Program Dependency Graph (PDG), which contain information about source code that AST does not. Even though some works tried to utilize multiple representations, they do not provide any insights about the costs and benefits of using multiple representations. The primary goal of this paper is to discuss the implications of utilizing multiple code representations, specifically AST, CFG, and PDG. We modify an AST path-based approach to accept multiple representations as input to an attention-based model. We do this to measure the impact of additional representations (such as CFG and PDG) over AST. We evaluate our approach on three tasks: Method Naming, Program Classification, and Clone Detection. Our approach increases the performance on these tasks by 11% (F1), 15.7% (Accuracy), and 9.3% (F1), respectively, over the baseline. In addition to the effect on performance, we discuss timing overheads incurred with multiple representations. We envision this work providing researchers with a lens to evaluate combinations of code representations for various tasks.
Submitted 24 December, 2023; v1 submitted 21 June, 2021;
originally announced June 2021.
-
Detox Browser -- Towards Filtering Sensitive Content On the Web
Authors:
Noble Saji Mathews,
Sridhar Chimalakonda
Abstract:
The annual consumption of web-based resources is increasing at a very fast rate, mainly due to an increase in affordability and accessibility of the internet. Many are relying on the web to get diverse perspectives, but at the same time, it can expose them to content that is harmful to their mental well-being. Catchy headlines and emotionally charged articles increase the number of readers, which in turn increases ad revenue for websites. When a user consumes a large quantity of negative content, it adversely impacts the user's happiness and has a significant impact on his/her mood and state of mind. Many studies carried out during the COVID-19 pandemic have shown that people across the globe, irrespective of their country of origin, have experienced higher levels of anxiety and depression. Web filters can help in constructing a digital environment that is more suitable for people prone to depression, anxiety and stress. A significant amount of work has been done in the field of web filtering, but there has been limited focus on helping Highly Sensitive Persons (HSPs) or those with stress disorders induced by trauma. Through this paper, we propose Detox Browser, a simple tool that enables end-users to tune out of or control their exposure to topics that can affect their mental well-being. The extension makes use of sentiment analysis and keywords to filter out flagged content from Google search results and warns users if any blacklisted topics are detected when navigating across websites.
Submitted 18 June, 2021;
originally announced June 2021.
-
SurviveCovid-19++ : A collaborative healthcare game towards educating people about safety measures and vaccination for Covid-19
Authors:
Akhila Sri Manasa Venigalla,
Dheeraj Vagavolu,
Sridhar Chimalakonda
Abstract:
Covid-19 has been affecting populations across the world for more than a year, with diverse strains of this virus being identified in many countries. Vaccines to help in curbing the virus are being developed and administered. Preventing the spread of the disease requires collaborative efforts from everyone. People with varied professional backgrounds have varied responsibilities in controlling the pandemic. It is important that everyone is aware of their respective responsibilities and also empathizes with the efforts and duties of other individuals. To this end, we wish to leverage the potential of games in the healthcare domain to educate people about Covid-19. With an aim to educate the population about vaccination against Covid-19 and the responsibilities of citizens with varied professional backgrounds, and to emphasize the need for collaboration to fight against the pandemic by following safety measures, we present SurviveCovid-19++, a collaborative multiplayer desktop-based game. The game essentially revolves around four roles - doctor, sanitation worker, citizen and law enforcer - delivering their duties, following safety measures and collaboratively clearing multiple stages in the game. We have performed a preliminary evaluation of the game through a qualitative and quantitative user survey. The results of the user survey were encouraging, with volunteers expressing their increased empathy towards efforts of individuals with varied professional backgrounds, and better understanding of the importance of safety measures against Covid-19.
Submitted 8 July, 2021; v1 submitted 17 April, 2021;
originally announced April 2021.
-
Apples, Oranges & Fruits -- Understanding Similarity of Software Projects Through The Lens of Dissimilar Artifacts
Authors:
A Eashaan Rao,
Sridhar Chimalakonda
Abstract:
The growing availability of open source projects has made it easier for developers to reuse existing software artifacts and leverage them to develop new software. However, it is hard to understand the notion of similarity as it varies from developer to developer. Some developers might search for repositories with similar source code, while some might be in search of repositories with similar requirements or issues. Existing approaches tend to find similar projects by comparing similar artifacts such as source-code to source-code, API usage to API usage, documentation to documentation, and so on. Even though there is a dissimilarity between two similar artifacts, there could be a similarity between two dissimilar artifacts. Hence, in this paper, we aim to answer the question: can we find similarity of software repositories through dissimilar artifacts? To this end, we conduct an experiment to find similarities between three repositories (two similar projects and one different project), comparing similar and dissimilar artifacts (documentation, commits, and source-code). We observed similarities between dissimilar artifacts such as Commits, Source Code, and Readme Files in the context of both similar and different repositories.
Submitted 2 March, 2021;
originally announced March 2021.
-
Understanding Emotions of Developer Community Towards Software Documentation
Authors:
Akhila Sri Manasa Venigalla,
Sridhar Chimalakonda
Abstract:
The availability of open-source projects enables developers to contribute and collaborate on a wide range of projects. As a result, the developer community contributing to such open-source projects is also increasing. Many of the projects involve frequent updates and extensive reuse. Well-updated documentation helps in a better understanding of the software project and also facilitates efficient contribution and reuse. Though software documentation plays an important role in the development and maintenance of software, it also suffers from various issues that include insufficiency, inconsistency, ill-maintainability, and so on. Exploring the perception of developers towards documentation could help in understanding the reasons behind prevalent issues in software documentation. It could further aid in deciding on training that could be given to the developer community towards building more sustainable projects for society. Analyzing sentiments of contributors to a project could provide insights on understanding developer perceptions. Hence, as the first step in this direction, we analyze sentiments of commit messages specific to the documentation of a software project. To this end, we considered the commit history of 998 GitHub projects from the GHTorrent dataset and identified 10,996 commits that correspond to the documentation of repositories. Further, we apply sentiment analysis techniques to obtain insights on the type of sentiment being expressed in commit messages of the selected commits. We observe that around 45% of the identified commit messages express the emotion of trust.
Submitted 1 March, 2021;
originally announced March 2021.
-
What's in a GitHub Repository? -- A Software Documentation Perspective
Authors:
Akhila Sri Manasa Venigalla,
Sridhar Chimalakonda
Abstract:
Developers use and contribute to repositories on GitHub. Documentation present in the repositories serves as an important source by helping developers to understand, maintain and contribute to the project. Currently, documentation in a repository is spread across various files, with most of it present in ReadMe files. However, other software artifacts in the repository, such as issue reports and pull requests, could also contribute to documentation, without documentation being explicitly specified. Hence, in this paper, we propose a taxonomy of documentation sources by analyzing different software artifacts, developer interviews, and a card-sorting approach. We inspected multiple artifacts of 950 public GitHub repositories, written in four different programming languages, C++, C#, Python and Java, and analyzed the type and amount of documentation that could be extracted from these artifacts. We observe that about 25.93% of information extracted from all sources proposed in the taxonomy contains error-related documentation, and that pull requests contribute to around 18.21% of extracted information.
Submitted 1 March, 2021; v1 submitted 25 February, 2021;
originally announced February 2021.
-
APIScanner -- Towards Automated Detection of Deprecated APIs in Python Libraries
Authors:
Aparna Vadlamani,
Rishitha Kalicheti,
Sridhar Chimalakonda
Abstract:
Python libraries are widely used for machine learning and scientific computing tasks today. APIs in Python libraries are deprecated due to feature enhancements and bug fixes in the same way as in other languages. These deprecated APIs are discouraged from being used in further software development. Manually detecting and replacing deprecated APIs is a tedious and time-consuming task due to the large number of API calls used in the projects. Moreover, the lack of proper documentation for these deprecated APIs makes the task challenging. To address this challenge, we propose an algorithm and a tool, APIScanner, that automatically detects deprecated APIs in Python libraries. This algorithm parses the source code of the libraries using abstract syntax trees (ASTs) and identifies the deprecated APIs via decorators, hard-coded warnings, or comments. APIScanner is a Visual Studio Code Extension that highlights and warns the developer on the use of deprecated API elements while writing the source code. The tool can help developers to avoid using deprecated API elements without the execution of code. We tested our algorithm and tool on six popular Python libraries, where it detected 838 of 871 deprecated API elements. Demo of APIScanner: https://youtu.be/1hy_ugf-iek. Documentation, tool, and source code can be found here: https://rishitha957.github.io/APIScanner.
Submitted 10 May, 2021; v1 submitted 18 February, 2021;
originally announced February 2021.
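The AST-based detection described in the APIScanner abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a deprecated API is marked either by a decorator whose name contains "deprecat" or by a `warnings.warn(...)` call that passes a deprecation warning class, and it omits the comment-based check. The names `find_deprecated` and `SOURCE` are hypothetical and exist only for this illustration.

```python
import ast

# Hypothetical library source used only for this illustration.
# It is parsed, never executed, so the imports need not be installed.
SOURCE = '''
import warnings
from deprecated import deprecated

@deprecated
def old_decorated():
    pass

def old_warned():
    warnings.warn("use new_func instead", DeprecationWarning)

def current():
    return 1
'''


def _callable_name(node):
    """Return the trailing name of a decorator or call target, or ''."""
    if isinstance(node, ast.Call):  # e.g. @deprecated(reason="...") or warn(...)
        node = node.func
    if isinstance(node, ast.Attribute):  # e.g. warnings.warn
        return node.attr
    if isinstance(node, ast.Name):  # e.g. deprecated
        return node.id
    return ""


def _mentions_deprecation(call):
    """True if any argument of the call names a *DeprecationWarning class."""
    args = list(call.args) + [kw.value for kw in call.keywords]
    for arg in args:
        for sub in ast.walk(arg):
            name = getattr(sub, "id", None) or getattr(sub, "attr", None)
            if isinstance(name, str) and "DeprecationWarning" in name:
                return True
    return False


def find_deprecated(source):
    """Map each deprecated function or class name to how it was detected."""
    found = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue
        # Check 1: a decorator whose name suggests deprecation.
        if any("deprecat" in _callable_name(d).lower() for d in node.decorator_list):
            found[node.name] = "decorator"
            continue
        # Check 2 (functions only): a hard-coded deprecation warning in the body.
        # Limitation: a warning inside a nested function also flags the outer one.
        if isinstance(node, ast.ClassDef):
            continue
        for sub in ast.walk(node):
            if (isinstance(sub, ast.Call)
                    and _callable_name(sub) == "warn"
                    and _mentions_deprecation(sub)):
                found[node.name] = "warning"
                break
    return found


if __name__ == "__main__":
    print(find_deprecated(SOURCE))
```

Because the source is only parsed and never imported or run, a check of this kind works without executing the library code, which matches the static nature of the approach the abstract describes.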
-
AC2 -- Towards Understanding Architectural Changes in Rapid Releases
Authors:
A Eashaan Rao,
Dheeraj Vagavolu,
Sridhar Chimalakonda
Abstract:
Open source projects are adopting faster release cycles that reflect various changes. Therefore, comprehending the effects of these changes on software's architecture over the releases becomes necessary. However, it is challenging to keep the architecture in check and add new changes simultaneously for every release. To this end, we propose a visualization tool called AC2, which allows its users to examine the alterations in the architecture at both higher and lower levels of abstraction for Python projects. AC2 uses call graphs and collaboration graphs to show the interaction between different architectural components. The tool provides four different views to see the architectural changes. The user can examine two releases at a time to comprehend the architectural changes between the releases. AC2 can support the maintainers and developers in observing changes in the project and their influence on the architecture, which allows them to see its increasing complexity over the releases at the component level. AC2 can be downloaded at https://github.com/dheerajrox/AC2 and its demo can be seen at the website https://dheerajrox.github.io/AC2doc or on YouTube at https://www.youtube.com/watch?v=GNrJfZ0RCVI.
Submitted 21 December, 2020; v1 submitted 21 December, 2020;
originally announced December 2020.
-
DRAST -- A Deep Learning and AST Based Approach for Bug Localization
Authors:
Shubham Sangle,
Sandeep Muvva,
Sridhar Chimalakonda,
Karthikeyan Ponnalagu,
Vijendran Gopalan Venkoparao
Abstract:
Context: Given a bug report and source code of the project, bug localization can help developers to focus on fixing probable buggy files rather than searching the entire source code repository. While existing research uses information retrieval (IR) and/or combination of machine learning (ML) or deep learning (DL) approaches, they focus primarily on benchmark Java projects, and also motivate the need for multi-language bug localization approach. Objective: To create a novel bug localization approach that leverages the syntactic structure of source code, bug report information and which can support multi-language projects along with a new dataset of C projects. Method: The proposed DRAST approach represents source code as code vectors by using its high-level AST and combines rVSM, an IR technique with ML/DL models such as Random Forest and Deep Neural Network regressor to rank the list of buggy files. We also use features such as textual similarity using IR techniques, lexical mismatch using DNNs, and history of the project using the metadata of BugC dataset. Results: We tested DRAST on seven projects from the BugC dataset, which consists of 2462 bug reports from 21 open-source C projects. The results show that DRAST can locate correct buggy files 90% of the time from top 1, 5, and 10 suggested files with MAP and MRR scores of above 90% for the randomly selected seven projects. We also tested DRAST on Tomcat and AspectJ, projects from benchmark dataset with better results at accuracy@1, MAP and MRR when compared with state-of-the-art. Conclusions: This paper presents a novel bug localization approach that works on C and Java projects and a bug localization C dataset along with a novel source code representation. The results for C projects using DRAST are promising and could motivate researchers/practitioners to focus on developing and creating multi-language bug localization approaches.
Submitted 6 November, 2020;
originally announced November 2020.
-
EmoG- Towards Emojifying Gmail Conversations
Authors:
Akhila Sri Manasa Venigalla,
Sridhar Chimalakonda
Abstract:
Emails are one of the most frequently used media of communication in the present day across multiple domains including industry and educational institutions. Understanding sentiments being expressed in an email could have a considerable impact on the recipients' action or response to the email. However, it is difficult to interpret emotions of the sender from pure text in which emotions are not explicitly present. Researchers have tried to predict customer attrition by integrating emails in client-company environment with emotions. However, most of the existing works deal with static assessment of email emotions. Presenting sentiments of emails dynamically to the reader could help in understanding senders' emotions and also have an impact on readers' actions. Hence, in this paper, we present EmoG as a Google Chrome Extension which is intended to support university students. It augments emails with emojis based on the sentiment being conveyed in the email, which might also offer a faster overview of email sentiments and act as tags that could help in automatic sorting and processing of emails. Currently, EmoG has been developed to support the Gmail inbox on a Google Chrome browser, and could be extended to other inboxes and browsers with ease. We have conducted a user survey with 15 university students to understand the usefulness of EmoG and received positive feedback.
Submitted 14 October, 2020; v1 submitted 13 October, 2020;
originally announced October 2020.
-
A Catalogue of Game-Specific Software Nuggets
Authors:
Vartika Agrahari,
Sridhar Chimalakonda
Abstract:
With the ever-increasing use of games, game developers are expected to write efficient code supporting several qualities such as security, maintainability, and performance. However, the continuous need to update the features of games in less time might compel the developers to use anti-patterns, code smells and quick-fix solutions that may affect the functional and non-functional requirements of the game. These bad practices may lead to technical debt and poor program comprehension, and can cause several issues during software maintenance. In this paper, we introduce "Software Nuggets" as a concept that affects software quality in a negative way and as a superset of anti-patterns, code smells, bugs, and software bad practices. We call these Software Nuggets "G-Nuggets" in the context of games. While there exists empirical research on games, we are not aware of any work on understanding and cataloguing these G-Nuggets. Thus, we propose a catalogue of G-Nuggets by mining and analyzing 892 commits, 189 issues, and 104 pull requests from 100 open-source GitHub game repositories. We use regular expressions and thematic analysis on this dataset for cataloguing game-specific Software Nuggets. We present a catalogue of ten G-Nuggets and provide examples for them online at: https://phoebs88.github.io/A-Catalogue-of-Game-Specific-Software-Nuggets. We believe this catalogue might be helpful for researchers for further empirical research in the domain of games as well as for game developers to improve the quality of games.
Submitted 4 May, 2021; v1 submitted 23 June, 2020;
originally announced June 2020.
-
AiR -- An Augmented Reality Application for Visualizing Air Pollution
Authors:
Noble Saji Mathews,
Sridhar Chimalakonda,
Suresh Jain
Abstract:
Air quality is a term used to describe the concentration levels of various pollutants in the air we breathe. The air quality, which is degrading rapidly across the globe, has been a source of great concern. Across the globe, governments are taking various measures to reduce air pollution. Bringing awareness about environmental pollution among the public plays a major role in controlling air pollution, as the programs proposed by governments require the support of the public. Information on air quality is present on multiple portals such as that of the Central Pollution Control Board (CPCB), which provides an Air Quality Index that can be accessed by the public. However, such portals are scarcely visited by the general public. Visualizing air quality in the location where an individual resides could help in bringing awareness among the public. This visualization could be rendered using Augmented Reality techniques. Considering the widespread usage of Android-based mobile devices in India, and the importance of air quality visualization, we present AiR, an Android-based mobile application. AiR considers the air quality measured by CPCB, in a locality that is detected by the user's GPS or in a locality of the user's choice, visualizes various air pollutants present in the locality $(PM_{10}, PM_{2.5}, NO_2, SO_2, CO, O_3 \& NH_3)$, and displays them in the user's surroundings. AiR also creates awareness in an interactive manner about the different pollutants, sources, and their impacts on health.
Submitted 3 June, 2020;
originally announced June 2020.
-
Mood of India During Covid-19 -- An Interactive Web Portal Based on Emotion Analysis of Twitter Data
Authors:
Akhila Sri Manasa Venigalla,
Dheeraj Vagavolu,
Sridhar Chimalakonda
Abstract:
The severe outbreak of the Covid-19 pandemic has affected many countries across the world, and disrupted the day-to-day activities of many people. During such outbreaks, understanding the emotional state of citizens of a country could be of interest to various organizations to carry out tasks and to take necessary measures. Several studies have been performed on data available on various social media platforms and websites to understand the emotions of people towards many events, inclusive of Covid-19, across the world. Twitter and other social media platforms have been bridging the gap between the citizens and government in various countries and are of more prominence in India. Sentiment analysis of posts on Twitter is observed to accurately reveal the sentiments. Analysing real-time posts on Twitter in India during Covid-19 could help in identifying the mood of the nation. However, most of the existing studies related to Covid-19 on Twitter and other social media platforms are performed on data posted during a specific interval. We are not aware of any research that identifies the emotional state of India on a daily basis. Hence, we present a web portal that aims to display the mood of India during Covid-19, based on real-time Twitter data. This portal also enables users to select a date range, specific date and state in India to display the mood of people belonging to the specified region, on the specified date or during the specified date range. Also, the number of Covid-19 cases and the mood of people in specific cities and states on specific dates are visualized on the country map. As of May 6 2020, the web portal has about 194370 tweets, and each of these tweets is classified into seven categories that include six basic emotions and a neutral category. A list of Trigger Events is also specified, to allow users to view the mood of India on specific events happening in the country during Covid-19.
Submitted 6 May, 2020;
originally announced May 2020.
-
SurviveCovid-19 -- An Educational Game to Facilitate Habituation of Social Distancing and Other Health Measures for Covid-19 Pandemic
Authors:
Akhila Sri Manasa Venigalla,
Dheeraj Vagavolu,
Sridhar Chimalakonda
Abstract:
Covid-19 has been causing severe loss to the human race. Considering the mode of spread and severity, it is essential to make a habit of following safety precautions such as using sanitizers and masks and maintaining social distancing to prevent the spread of Covid-19. Individuals are widely educated about the safety measures against the disease through various modes such as online or physical awareness campaigns, advertisements in the media and so on. The younger generations today spend considerably more time on mobile phones and games. However, there are very few applications or games that aim to help people practice safety measures against a pandemic, and even fewer in the case of Covid-19. Hence, we propose a 2D survival-based game, SurviveCovid-19, aimed at educating people about safety precautions to be taken for Covid-19 outside their homes by incorporating social distancing and usage of masks and sanitizers in the game. SurviveCovid-19 has been designed as an Android-based mobile game, along with a desktop (browser) version, and has been evaluated through a remote quantitative user survey with 30 volunteers, using a questionnaire based on the MEEGA+ model. The survey results are promising, with all the survey questions having a mean value greater than 3.5. The game's quality factor was 69.3, indicating that the game could be classified as excellent quality, according to the MEEGA+ model.
Submitted 3 May, 2021; v1 submitted 21 April, 2020;
originally announced April 2020.
-
BuGL -- A Cross-Language Dataset for Bug Localization
Authors:
Sandeep Muvva,
A Eashaan Rao,
Sridhar Chimalakonda
Abstract:
Bug Localization is the process of locating potential error-prone files or methods from a given bug report and source code. There is extensive research on bug localization in the literature that focuses on applying information retrieval techniques or machine learning/deep learning approaches, or both, to detect the location of bugs. The common premise for all approaches is the availability of a good dataset, which in this case is the standard benchmark dataset that comprises 6 Java projects and, in some cases, more than 6 Java projects. The existing datasets do not include projects in other programming languages, despite the need to investigate specific and cross-project bug localization. To the best of our knowledge, no dataset addresses this concern. In this paper, we present BuGL, a large-scale cross-language dataset. BuGL consists of more than 10,000 bug reports drawn from open-source projects written in four programming languages, namely C, C++, Java, and Python. The dataset includes information such as Bug Reports and Pull-Requests. BuGL aims to unfold new research opportunities in the area of bug localization.
Submitted 19 April, 2020;
originally announced April 2020.
-
An Exploratory Study of Code Smells in Web Games
Authors:
Vartika Agrahari,
Sridhar Chimalakonda
Abstract:
With the continuous growth of the internet market, games are becoming more and more popular worldwide. However, increased market competition demands that developers write more efficient games in terms of performance, security, and maintenance. The continuous evolution of software systems and their increasing complexity may result in bad design decisions. Researchers have analyzed the cognitive, behavioral and social effects of games. Gameplay and game mechanics have also been a research area to enhance game playing, but to the best of our knowledge, there hardly exists any research work that studies bad coding practices in game development. Hence, through our study, we try to analyze and identify the presence of bad coding practices, called code smells, that may cause quality issues in games. To accomplish this, we created a dataset of 361 web games written in JavaScript. On this dataset, we ran a JavaScript code smell detection tool, JSNose, to find the occurrence and distribution of code smells in web games. Further, we did a manual study on 9 web games to find violations of existing game programming patterns. Our results show that existing tools are mostly language-specific and are not enough in the context of games, as they were not able to detect the anti-patterns or bad coding practices that are game-specific, motivating the need for game-specific code smell detection tools.
Submitted 13 February, 2020;
originally announced February 2020.
-
StackEmo-Towards Enhancing User Experience by Augmenting Stack Overflow with Emojis
Authors:
Akhila Sri Manasa Venigalla,
Sridhar Chimalakonda
Abstract:
With the increase in acceptance of open source platforms for knowledge sharing, Question and Answer (Q&A) websites such as Stack Overflow have become increasingly popular in the programming domain. Many novice programmers visit Stack Overflow for reasons that include posing questions and finding answers for issues they come across in the process of programming. Practitioners voluntarily answer questions on Stack Overflow based on their experience or prior knowledge. Most of these answers are also accompanied by comments from users of Stack Overflow. Questions, answers and comments on Stack Overflow also include sentiments of users, which, when analysed and presented, could motivate users to read and contribute to the posts. However, the sentiment of these posts is not depicted in the current Stack Overflow platform. There is extensive research on analysing sentiments on social networking platforms such as Twitter. Representing the sentiment of a post might motivate users to follow or answer certain posts. While there exist several tools that augment or annotate the Stack Overflow platform for developers, we are not aware of tools that deal with the sentiment of the posts. In this paper, we propose StackEmo, a Google Chrome plugin that augments comments on Stack Overflow with emojis based on the sentiment of the comments posted, with the aim of providing users with visual cues that could motivate them to review and contribute to available comments. We evaluated StackEmo through an in-user Likert scale based survey with 30 university students. The results of the survey provided us insights on improving StackEmo, with 83% of participants having recommended the plugin to their peers.
Submitted 17 June, 2020; v1 submitted 30 January, 2020;
originally announced January 2020.
-
GE852: A Dataset of 852 Game Engines
Authors:
Chaitanya S. Lakkundi,
Vartika Agrahari,
Sridhar Chimalakonda
Abstract:
Game engines provide a platform for developers to build games with an interface tailored to handle the complexity during game development. To reduce effort and improve the quality of game development, there is a strong need to understand and analyze the quality of game engines and their various aspects such as API usability, code quality, code reuse and so on. To the best of our knowledge, no dataset in the literature caters to game engines. To this end, we present GE852, a dataset of 852 game engine repositories mined from GitHub in two languages, namely Java and C++. The dataset contains metadata of all the mined repositories including commits, pull requests, issues and so on. We believe that our dataset can lay a foundation for empirical investigation in the area of game engines.
Submitted 11 May, 2019;
originally announced May 2019.
-
A Family of Software Product Lines in Educational Technologies
Authors:
Sridhar Chimalakonda,
Kesav V. Nori
Abstract:
Rapid advances in the education domain demand the design and customization of educational technologies for a large scale and variety of evolving requirements. Here, scale is the number of systems to be developed, and variety stems from a diversified range of instructional designs, such as varied goals, processes, content, teacher styles and learner styles, and also from eLearning Systems for 22 Indian Languages and variants. In this paper, we present a family of software product lines as an approach to address this challenge of modeling a family of instructional designs as well as a family of eLearning Systems, and demonstrate it for the case of adult literacy in India (287 million learners). We present a multi-level product line that connects product lines at multiple levels of granularity in the education domain. We then detail two concrete product lines (http://rice.iiit.ac.in): one that generates instructional design editors and another that generates a family of eLearning Systems based on flexible instructional designs. Finally, we demonstrate our approach by generating eLearning Systems for Hindi and Telugu languages (both web and android versions), which led to significant cost savings of 29 person months for 9 eLearning Systems.
Submitted 14 February, 2018;
originally announced February 2018.
-
An Ontology Based Modeling Framework for Design of Educational Technologies
Authors:
Sridhar Chimalakonda,
Kesav V. Nori
Abstract:
Despite rapid progress, most educational technologies today lack a strong basis in instructional design knowledge, leading to questionable quality of instruction. In addition, a major challenge is to customize these educational technologies for a wide range of instructional designs. Ontologies are one of the pertinent mechanisms to represent instructional design in the literature. However, existing approaches do not support modeling of flexible instructional designs. To address this problem, in this paper, we propose an ontology based framework for systematic modeling of different aspects of instructional design knowledge based on domain patterns. As part of the framework, we present ontologies for modeling goals, instructional processes and instructional materials. We demonstrate the ontology framework by presenting instances of the ontology for the large scale case study of adult literacy in India (287 million learners spread across 22 Indian Languages), which requires creation of 1000 similar but varied eLearning Systems based on flexible instructional designs. The implemented framework is available at http://rice.iiit.ac.in and has been transferred to the National Literacy Mission of the Government of India. This framework could be used for modeling instructional design knowledge of systems for skills, school education and beyond.
Submitted 7 February, 2018;
originally announced February 2018.
-
A Patterns Based Approach for Design of Educational Technologies
Authors:
Sridhar Chimalakonda,
Kesav V. Nori
Abstract:
Instructional design is a fundamental base for educational technologies as it lays the foundation to facilitate learning and teaching based on pedagogical underpinnings. However, most educational technologies today face two core challenges in this context: (i) lack of instructional design as a basis and (ii) lack of support for a variety of instructional designs. In order to address these challenges, we propose a patterns based approach for design of educational technologies. This is in contrast with existing literature that focuses either on patterns in education or in software, and not both. The core idea of our approach is to leverage patterns for modeling instructional design knowledge and to connect it with patterns in software architecture. We discuss different categories of patterns in instructional design. We then present the notion of Pattern-Oriented Instructional Design (POID) as a way to model instructional design as a connection of patterns (GoalPattern, ProcessPattern, ContentPattern) and integrate it with Pattern-Oriented Software Architecture (POSA) based on fundamental principles in software engineering. We demonstrate our approach through an adult literacy case study (287 million learners, 22 Indian Languages and a variety of instructional designs). The results of our approach (both web and mobile versions) are available at http://rice.iiit.ac.in and were adopted by the National Literacy Mission Authority of the Government of India.
Submitted 7 February, 2018;
originally announced February 2018.