-
Handwriting Analysis on the Diaries of Rosamond Jacob
Authors:
Sharmistha S. Sawant,
Saloni D. Thakare,
Derek Greene,
Gerardine Meaney,
Alan F. Smeaton
Abstract:
Handwriting is an art form that most people learn at an early age. Each person's writing style is unique with small changes as we grow older and as our mood changes. Here we analyse handwritten text in a culturally significant personal diary. We compare changes in handwriting and relate this to the sentiment of the written material and to the topic of diary entries. We identify handwritten text from digitised images and generate a canonical form for words using shape matching to compare how the same handwritten word appears over a period of time. For determining the sentiment of diary entries, we use the Hedonometer, a dictionary-based approach to scoring sentiment. We apply these techniques to the historical diary entries of Rosamond Jacob (1888-1960), an Irish writer and political activist whose daily diary entries report on the major events in Ireland during the first half of the last century.
Submitted 16 August, 2023;
originally announced August 2023.
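The dictionary-based scoring mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the usual Hedonometer convention of averaging per-word happiness scores on a 1 to 9 scale while skipping words in a neutral band around 5. The lexicon below is a made-up placeholder, not the real Hedonometer word list, and `happiness_score` is a hypothetical helper name rather than anything taken from the paper.

```python
import re
from collections import Counter

# Placeholder scores for illustration only; NOT values from the real lexicon.
TOY_LEXICON = {"happy": 8.3, "love": 8.4, "war": 1.8, "sad": 2.4, "the": 4.98}


def happiness_score(text, lexicon, neutral_band=(4.0, 6.0)):
    """Frequency-weighted mean score of the scored, non-neutral words in text.

    Returns None when no word in the text contributes a score.
    """
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    low, high = neutral_band
    total = 0.0
    weight = 0
    for word, count in counts.items():
        score = lexicon.get(word)
        # Skip unknown words and words inside the neutral band (assumed 4 to 6).
        if score is None or low < score < high:
            continue
        total += score * count
        weight += count
    return total / weight if weight else None


entry_score = happiness_score("The war made me sad", TOY_LEXICON)
```

Only "war" and "sad" contribute here, so the entry scores as the mean of their two placeholder values. A real analysis would load the published word list instead of `TOY_LEXICON`.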
-
Auto Tuning of Hadoop and Spark parameters
Authors:
Tanuja Patanshetti,
Ashish Anil Pawar,
Disha Patel,
Sanket Thakare
Abstract:
Data on the order of terabytes, petabytes, or beyond is known as Big Data. Such data cannot be processed using traditional database software, hence the need for Big Data Platforms. A Big Data Platform combines the capabilities and features of various big data applications and utilities into a single solution that helps to develop, deploy and manage the big data environment. Hadoop and Spark are two open-source Big Data Platforms provided by Apache. Both platforms have many configuration parameters, which can have unforeseen effects on execution time, accuracy, etc. Manual tuning of these parameters can be tiresome, so automatic ways of tuning them are needed. After studying and analyzing previous work on automating the tuning of these parameters, this paper proposes two algorithms: Grid Search with Finer Tuning and Controlled Random Search. The performance indicator studied in this paper is execution time. These algorithms help to tune the parameters automatically. Experimental results show a reduction in execution time of about 70% and 50% for Hadoop and 81.19% and 77.77% for Spark with Grid Search with Finer Tuning and Controlled Random Search, respectively.
Submitted 3 November, 2021;
originally announced November 2021.
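The abstract above names Grid Search with Finer Tuning without describing its steps. One plausible reading, sketched below, is a coarse grid over the parameter ranges followed by progressively finer grids around the best point found so far. This is an assumption about the general idea, not the authors' algorithm. `toy_execution_time` is a hypothetical stand-in for actually running a Hadoop or Spark job and timing it, and the two parameters are invented for illustration.

```python
import itertools


def grid(lo, hi, steps):
    """Return `steps` evenly spaced values from lo to hi inclusive."""
    if steps == 1:
        return [lo]
    return [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]


def grid_search(cost, bounds, steps):
    """Evaluate cost at every grid point and return (best_point, best_cost)."""
    axes = [grid(lo, hi, steps) for lo, hi in bounds]
    best = min(itertools.product(*axes), key=cost)
    return best, cost(best)


def grid_search_with_refinement(cost, bounds, steps=5, rounds=3):
    """Coarse grid first, then re-grid a shrinking box around the best point."""
    best, best_cost = grid_search(cost, bounds, steps)
    for _ in range(rounds):
        new_bounds = []
        for (lo, hi), x in zip(bounds, best):
            half = (hi - lo) / (steps - 1)  # one grid cell on each side
            new_bounds.append((max(lo, x - half), min(hi, x + half)))
        bounds = new_bounds
        candidate, candidate_cost = grid_search(cost, bounds, steps)
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
    return best, best_cost


def toy_execution_time(params):
    """Hypothetical cost surface with its minimum at 6.3 GB and 210 partitions."""
    memory_gb, partitions = params
    return (memory_gb - 6.3) ** 2 + ((partitions - 210) / 50) ** 2 + 40.0


search_space = [(1.0, 16.0), (50.0, 500.0)]  # (memory in GB, partition count)
coarse_point, coarse_time = grid_search(toy_execution_time, search_space, 5)
tuned_point, tuned_time = grid_search_with_refinement(toy_execution_time, search_space)
```

Each refinement round costs another full grid of evaluations, so with real jobs the number of rounds and grid steps trades tuning time against how close the result gets to the best configuration.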
-
Redefining measures of Layered Architecture
Authors:
Sanjay Thakare,
Arvind W Kiwelekar
Abstract:
Layered architecture represents the software structure in the form of layers. Every element in the software is assigned to one of the layers such that the relationships amongst the elements are maintained. A set of design principles governs the construction of the layered architecture. Various statistical measures have been defined to check whether the layered architecture of a given software system follows these design principles. In this paper, we redefine the measures of layered architecture based on the relationship between the software components. The measures check for violations in the form of back calls, skip calls, and cyclic structures. Further, we also introduce a new measure to verify the logical separation amongst the layers. The system's current architecture is extracted from the source code and represented using a three-tier layered structure, which is the de facto standard architecture of Java applications. The redefined measures are applied to determine the system's conformance to layering principles. We evaluate five different software systems for their architecture consistency. The results obtained with our redefined measures are compared to those obtained by applying the standard set of measures. A quantitative analysis of the proposed measures is performed, and we conclude that they can determine the extent to which layering principles were followed during the development of a software system.
Submitted 6 June, 2021;
originally announced June 2021.
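The three kinds of violation named in the abstract above can be made concrete with a small sketch. It assumes the common convention that layer 0 is the top layer and that an element may depend only on its own layer or the one directly below; the paper's redefined measures are not reproduced here. The class names, the `classify_dependency` and `count_violations` helpers, and the restriction of "cyclic" to direct two-element cycles are all illustrative assumptions.

```python
from collections import Counter


def classify_dependency(layer_of, caller, callee):
    """Label one caller -> callee dependency under the assumed layering rule."""
    src, dst = layer_of[caller], layer_of[callee]
    if dst < src:
        return "back_call"  # a lower layer calls up into a higher layer
    if dst - src > 1:
        return "skip_call"  # the call bypasses at least one layer
    return "ok"


def count_violations(layer_of, dependencies):
    """Count dependency labels plus direct mutual dependencies (A <-> B)."""
    counts = Counter(classify_dependency(layer_of, a, b) for a, b in dependencies)
    edges = set(dependencies)
    # Only two-element cycles are detected here; longer cycles are ignored.
    counts["cyclic_pair"] = sum(1 for a, b in edges if a < b and (b, a) in edges)
    return counts


# Hypothetical three-tier system: 0 = presentation, 1 = business, 2 = data.
layer_of = {"OrderView": 0, "OrderService": 1, "OrderDao": 2}
dependencies = [
    ("OrderView", "OrderService"),
    ("OrderService", "OrderDao"),
    ("OrderView", "OrderDao"),
    ("OrderDao", "OrderService"),
]
violations = count_violations(layer_of, dependencies)
```

In this toy system the view-to-DAO dependency is a skip call, the DAO-to-service dependency is a back call, and the service and DAO form one mutual pair.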
-
Discovery of Layered Software Architecture from Source Code Using Ego Networks
Authors:
Sanjay Thakare,
Arvind W Kiwelekar
Abstract:
Software architecture refers to the high-level abstraction of a system, including the configuration of the involved elements and the interactions and relationships that exist between them. Source code can be easily built by referring to the software architecture. However, the reverse process, i.e. deriving the software architecture from the source code, is a challenging task. Further, such an architecture consists of multiple layers, and distributing the existing elements into these layers should be done accurately and efficiently. In this paper, a novel approach is presented for the recovery of layered architectures from Java-based software systems using the concept of ego networks. Ego networks have traditionally been used for social network analysis, but in this paper, they are modified and tuned to suit this task. Specifically, a dependency network is extracted from the source code to create an ego network. The ego network is processed to create and optimize ego layers in a particular structure. These ego layers, when integrated and optimized together, give the final layered architecture. The proposed approach is evaluated in two ways: on static versions of three open-source software systems, and on a continuously evolving software system. The distribution of nodes amongst the proposed layers and the committed violations are observed at both class level and package level. The proposed method is seen to outperform the existing standard approaches on multiple performance metrics. We also analyse how the results vary with changes in the node selection strategy and the frequency. The empirical observations show promising signs for recovering software architecture layers from source code using this technique, and for extending it to other languages and software.
Submitted 6 June, 2021;
originally announced June 2021.
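For readers unfamiliar with the term used in the abstract above, an ego network in its standard social-network sense is one node (the ego), its direct neighbours, and the edges among them. The sketch below extracts that from a dependency edge list. It shows only the textbook definition; the modifications and layer optimisation the authors describe are not specified in the abstract and are not attempted. The `ego_network` helper and the class names are hypothetical.

```python
def ego_network(edges, ego):
    """Return (members, induced_edges) for the ego and its direct neighbours.

    Edge direction is ignored when collecting neighbours but preserved in the
    returned edge set.
    """
    neighbours = {b for a, b in edges if a == ego} | {a for a, b in edges if b == ego}
    members = neighbours | {ego}
    induced = {(a, b) for a, b in edges if a in members and b in members}
    return members, induced


# Hypothetical class-level dependency edges (caller, callee).
dependency_edges = [
    ("Controller", "Service"),
    ("Controller", "Validator"),
    ("Service", "Validator"),
    ("Validator", "Logger"),
    ("Logger", "FileSink"),
]
members, induced = ego_network(dependency_edges, "Controller")
```

Here the ego network of `Controller` contains `Service` and `Validator` and the three edges among them, while `Logger` and `FileSink` lie outside it.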
-
Recovery and Analysis of Architecture Descriptions using Centrality Measures
Authors:
Sanjay Thakare,
Arvind W Kiwelekar
Abstract:
The necessity of an explicit architecture description has been continuously emphasized to communicate the system functionality and for system maintenance activities. This paper presents an approach to extract architecture descriptions using the centrality measures from the theory of Social Network Analysis. The architecture recovery approach presented in this paper works in two phases. The first phase aims to calculate centrality measures for each program element in the system. The second phase assumes that the system has been designed around the layered architecture style and assigns layers to each program element. Two techniques to assign program elements are presented. The first technique of layer assignment uses a set of pre-defined rules, while the second technique learns the rules of assignment from a pre-labelled data set. The paper presents the evaluation of both approaches.
Submitted 23 January, 2021;
originally announced January 2021.
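The two phases described in the abstract above can be sketched under strong assumptions. Phase one below computes normalised in-degree and out-degree centrality, which is only one of several centrality measures and may not be the set the authors use. Phase two applies a deliberately simple, hypothetical rule (nothing depends on it: top layer; it depends on nothing: bottom layer; otherwise middle) that stands in for the paper's pre-defined rules, which the abstract does not list.

```python
def degree_centrality(nodes, edges):
    """Map each node to (in_centrality, out_centrality), normalised by n - 1."""
    n = len(nodes)
    scale = 1.0 / (n - 1) if n > 1 else 0.0
    in_deg = {v: 0 for v in nodes}
    out_deg = {v: 0 for v in nodes}
    for caller, callee in set(edges):  # ignore duplicate edges
        out_deg[caller] += 1
        in_deg[callee] += 1
    return {v: (in_deg[v] * scale, out_deg[v] * scale) for v in nodes}


def assign_layer(in_centrality, out_centrality):
    """Hypothetical rule of thumb, not the rules used in the paper."""
    if in_centrality == 0:
        return "top"
    if out_centrality == 0:
        return "bottom"
    return "middle"


# Hypothetical program elements and dependencies (caller, callee).
nodes = ["View", "Service", "Dao"]
edges = [("View", "Service"), ("Service", "Dao"), ("View", "Dao")]
centrality = degree_centrality(nodes, edges)
layers = {v: assign_layer(*centrality[v]) for v in nodes}
```

The second technique in the abstract, learning the assignment from pre-labelled data, would replace `assign_layer` with a classifier trained on centrality values as features.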