-
Complex Networks for Pattern-Based Data Classification
Authors:
Josimar Chire,
Khalid Mahmood,
Zhao Liang
Abstract:
Data classification techniques partition the data or feature space into smaller sub-spaces, each corresponding to a specific class. To classify into sub-spaces, physical features, e.g., distances and distributions, are used. This approach makes it challenging to characterize complex patterns embedded in the dataset. However, complex networks remain a powerful technique for capturing internal relationships and class structures, enabling high-level classification. Although several complex network-based classification techniques have been proposed, high-level classification that leverages pattern formation to classify data has not been explored. In this work, we present two network-based classification techniques utilizing unique measures derived from the Minimum Spanning Tree and Single Source Shortest Path. These network measures are evaluated from the data patterns represented by the inherent network constructed from each class. We have applied our proposed techniques to several data classification scenarios, including synthetic and real-world datasets. Compared to existing classic high-level and machine-learning classification techniques, our proposed approaches obtained promising numerical results. Furthermore, the proposed models demonstrate the following distinguishing features in comparison to previous high-level classification techniques: (1) A single network measure is introduced to characterize the data pattern, eliminating the need to determine weight parameters among network measures. Therefore, the model is largely simplified, while obtaining better classification results. (2) The proposed metrics are sensitive and yield competitive classification results.
Submitted 25 February, 2025;
originally announced March 2025.
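The first abstract describes high-level classification driven by a single measure derived from the Minimum Spanning Tree of each class network. The sketch below illustrates that general idea under explicit assumptions that the abstract does not confirm: each class network is the complete Euclidean graph over that class's samples, the measure is the average MST edge length, and a test point is assigned to the class whose measure changes least (relatively) when the point is inserted. The names `mst_weight`, `mean_mst_edge` and `classify` are hypothetical, and this is not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def mst_weight(points: List[Point]) -> float:
    """Total edge weight of the Euclidean MST over `points` (Prim's algorithm, O(n^2))."""
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from each vertex to the growing tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Add the closest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        # Relax the edges from u to the vertices still outside the tree.
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return total


def mean_mst_edge(points: List[Point]) -> float:
    """Average MST edge length: the single per-class measure ASSUMED in this sketch."""
    return mst_weight(points) / (len(points) - 1) if len(points) > 1 else 0.0


def classify(train: Dict[str, List[Point]], x: Point) -> str:
    """Assign x to the class whose measure varies least (relatively) when x joins it."""
    def variation(points: List[Point]) -> float:
        before = mean_mst_edge(points)
        after = mean_mst_edge(list(points) + [x])
        return abs(after - before) / before if before > 0 else abs(after - before)

    return min(train, key=lambda label: variation(train[label]))


train = {
    "a": [(0, 0), (0, 1), (1, 0), (1, 1)],
    "b": [(10, 10), (10, 11), (11, 10), (11, 11)],
}
print(classify(train, (0.5, 0.5)))  # a
```

A point that fits a class's pattern barely changes that class's MST, while a point far from the class adds a long edge and inflates the measure. Because only one measure is used, there are no weights to tune between several measures, which mirrors point (1) of the abstract.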
-
The Importance of Open Data Policy to Tackle Pandemic in Latin America
Authors:
Josimar Chire
Abstract:
Open Data Policies can provide transparency and promote innovation and citizen participation. Access to the right data at the right time can bring large benefits to the population. However, in Latin America, governments show little interest in promoting open data and using it properly. At the same time, the global pandemic has caused damage at many levels, e.g., the economy, public health, and education. This paper opens a discussion about the importance of Open Data Policy to mitigate the impact of Covid-19 and overcome this problem.
Submitted 27 October, 2021;
originally announced October 2021.
-
Improve High Level Classification with a More Sensitive metric and Optimization approach for Complex Network Building
Authors:
Josimar Chire
Abstract:
Complex Networks are a good approach to finding internal relationships and representing the structure of the classes in a dataset, so they are used for High Level Classification. Previous works use K-Nearest Neighbors to build each Complex Network considering all the available samples. This paper introduces a different way of building Complex Networks, considering only the samples that belong to each class. A metric is used to analyze the structure of the Complex Networks, and an optimization approach to improve performance is presented. Experiments are executed with a cross-validation process; the optimization is performed using grid search and a Genetic Algorithm, and this process can improve the results by up to 10%.
Submitted 22 October, 2021;
originally announced October 2021.
-
Comparative analysis of the government plans of the Peruvian presidential candidates, SDO(UN) and State Policies of the National Agreement based on NLP
Authors:
Honorio Apaza Alanoca,
Josimar Chire,
Jimy Oblitas
Abstract:
Analyzing the government proposals of political parties during elections is vital for choosing the next authorities of any city or country. In this paper, we use a text mining approach to analyze the documents and provide visualizations that make the analysis easier. In addition, a comparison with a national plan based on the sustainable development objectives of the UN (United Nations) 2030 Agenda is performed using Natural Language Processing techniques.
Submitted 5 April, 2021;
originally announced April 2021.
-
Characterization of Covid-19 Dataset using Complex Networks and Image Processing
Authors:
Josimar Chire,
Esteban Wilfredo Vilca Zuniga
Abstract:
This paper aims to explore the structure of the patterns behind a Covid-19 dataset. The dataset includes medical images of positive and negative cases. A sample of 100 images is chosen, 50 per class. A frequency histogram is calculated to obtain features through statistical measurements, and further features are extracted using the Grey Level Co-Occurrence Matrix (GLCM). A Complex Network is built from each feature set to analyze the adjacency matrices and check for the presence of patterns. Initial experiments provide evidence of hidden patterns in the dataset for each class, which become visible through the Complex Network representation.
Submitted 24 September, 2020;
originally announced September 2020.
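The Covid-19 abstract above combines histogram statistics and GLCM texture features, then builds a Complex Network from each feature set. The sketch below shows one plausible version of that pipeline. It assumes images are already quantized to a few grey levels, uses a single horizontal GLCM offset, picks contrast, angular second moment (ASM) and homogeneity as texture features, and links two samples when their feature vectors lie within a hypothetical radius `eps`. The abstract does not state the network construction rule, so the epsilon-radius graph is an assumption, and all function names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence

Image = List[List[int]]  # ASSUMED: grey levels already quantized to 0..levels-1


def glcm(img: Image, levels: int, dx: int = 1, dy: int = 0) -> List[List[float]]:
    """Normalized grey-level co-occurrence matrix for the pixel offset (dx, dy)."""
    m = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(img), len(img[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                m[img[r][c]][img[r2][c2]] += 1  # count the (reference, neighbour) pair
    total = sum(map(sum, m))
    return [[v / total for v in row] for row in m] if total else m


def glcm_features(p: List[List[float]]) -> List[float]:
    """Contrast, angular second moment (ASM) and homogeneity of a normalized GLCM."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    contrast = sum(p[i][j] * (i - j) ** 2 for i, j in cells)
    asm = sum(p[i][j] ** 2 for i, j in cells)
    homogeneity = sum(p[i][j] / (1 + (i - j) ** 2) for i, j in cells)
    return [contrast, asm, homogeneity]


def histogram_features(img: Image) -> List[float]:
    """First-order statistics of the grey-level distribution: mean and std deviation."""
    pixels = [v for row in img for v in row]
    return [statistics.fmean(pixels), statistics.pstdev(pixels)]


def epsilon_network(features: List[Sequence[float]], eps: float) -> List[List[int]]:
    """Adjacency matrix linking samples whose feature vectors lie within eps."""
    n = len(features)
    return [[1 if i != j and math.dist(features[i], features[j]) <= eps else 0
             for j in range(n)] for i in range(n)]


images = [[[0, 0], [1, 1]], [[0, 0], [1, 1]], [[0, 1], [1, 0]]]
texture = [glcm_features(glcm(img, levels=2)) for img in images]
first_order = [histogram_features(img) for img in images]
print(epsilon_network(texture, eps=0.1))      # [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
print(epsilon_network(first_order, eps=0.1))  # [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
```

Building one network per feature set, as the abstract describes, makes the difference visible: the two striped images and the checkerboard share the same grey-level histogram, so the first-order network links all three, while the GLCM network separates the checkerboard by its texture.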