-
Trace duality and additive complementary pairs of additive cyclic codes over finite chain rings
Authors:
Sanjit Bhowmick,
Kuntal Deka,
Alexandre Fotue Tabue,
Edgar Martínez-Moro
Abstract:
This paper investigates the algebraic structure of additive complementary pairs of cyclic codes over a finite commutative ring. We demonstrate that for every additive complementary pair of additive cyclic codes, both constituent codes are free modules. Moreover, we present a necessary and sufficient condition for a pair of additive cyclic codes over a finite commutative ring to form an additive complementary pair. Finally, we construct a complementary pair of additive cyclic codes over a finite chain ring and show that one of the codes is permutation equivalent to the trace dual of the other.
Submitted 12 June, 2025;
originally announced June 2025.
-
How Cohesive Are Community Search Results on Online Social Networks?: An Experimental Evaluation
Authors:
Yining Zhao,
Sourav S Bhowmick,
Nastassja L. Fischer,
SH Annabel Chen
Abstract:
Recently, numerous community search methods for large graphs have been proposed, at the core of which is defining and measuring cohesion. This paper experimentally evaluates the effectiveness of these community search algorithms w.r.t. cohesiveness in the context of online social networks. Social communities are formed and developed under the influence of group cohesion theory, which has been extensively studied in social psychology. However, current generic methods typically measure cohesiveness using structural or attribute-based approaches and overlook domain-specific concepts such as group cohesion. We introduce five novel psychology-informed cohesiveness measures, based on the concept of group cohesion from social psychology, and propose a novel framework called CHASE for evaluating eight representative community search algorithms w.r.t. these measures on online social networks. Our analysis reveals that there is no clear correlation between structural and psychological cohesiveness, and no algorithm effectively identifies psychologically cohesive communities in online social networks. This study provides new insights that could guide the development of future community search methods.
Submitted 1 May, 2025; v1 submitted 28 April, 2025;
originally announced April 2025.
-
SBSC: Step-By-Step Coding for Improving Mathematical Olympiad Performance
Authors:
Kunal Singh,
Ankan Biswas,
Sayandeep Bhowmick,
Pradeep Moturi,
Siva Kishore Gollapalli
Abstract:
We propose Step-by-Step Coding (SBSC): a multi-turn math reasoning framework that enables Large Language Models (LLMs) to generate a sequence of programs for solving Olympiad-level math problems. At each step/turn, by leveraging the code execution outputs and programs of previous steps, the model generates the next sub-task and the corresponding program to solve it. In this way, SBSC sequentially navigates to the final answer. SBSC allows a more granular, flexible and precise approach to problem-solving than existing methods. Extensive experiments highlight the effectiveness of SBSC in tackling competition and Olympiad-level math problems. For Claude-3.5-Sonnet, we observe that SBSC (greedy decoding) surpasses existing state-of-the-art (SOTA) program-generation-based reasoning strategies by absolute margins of 10.7% on AMC12, 8% on AIME and 12.6% on MathOdyssey. Given that SBSC is multi-turn in nature, we also benchmark SBSC's greedy decoding against the self-consistency decoding results of existing SOTA math reasoning strategies and observe absolute performance gains of 6.2% on AMC, 6.7% on AIME and 7.4% on MathOdyssey.
Submitted 23 February, 2025;
originally announced February 2025.
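The turn-by-turn loop described in the SBSC abstract above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `propose_next_step` stands in for the LLM call (here a hypothetical scripted two-step plan), each program's printed output is taken as its result, and the last output is treated as the final answer.

```python
import contextlib
import io
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, str]        # (sub-task description, program source) proposed at one turn
Turn = Tuple[str, str, str]   # (sub-task, program, execution output) recorded after a turn


def run_program(source: str) -> str:
    """Execute one generated program and capture what it prints.

    NOTE: exec() on model-generated code is unsafe; a real system would sandbox it.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue().strip()


def step_by_step_coding(
    problem: str,
    propose_next_step: Callable[[str, List[Turn]], Optional[Step]],
    max_turns: int = 10,
) -> Optional[str]:
    """Multi-turn loop: propose a sub-task and program, run it, feed the output back."""
    history: List[Turn] = []
    for _ in range(max_turns):
        step = propose_next_step(problem, history)
        if step is None:  # the (hypothetical) model signals it has reached the answer
            break
        subtask, program = step
        history.append((subtask, program, run_program(program)))
    # Assumption: the last execution output is taken as the final answer.
    return history[-1][2] if history else None


def scripted_model(problem: str, history: List[Turn]) -> Optional[Step]:
    """Hypothetical stand-in for the LLM: a fixed two-step plan that reuses prior output."""
    if not history:
        return ("sum of the first 10 squares", "print(sum(i * i for i in range(1, 11)))")
    if len(history) == 1:
        previous = history[-1][2]
        return ("reduce the sum modulo 9", f"print({previous} % 9)")
    return None


print(step_by_step_coding("Sum of the first 10 squares, mod 9?", scripted_model))
```

The second step reads the first step's printed output from `history`, which is the essential difference from single-shot program generation.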
-
SprayCraft: Graph-Based Route Optimization for Variable Rate Precision Spraying
Authors:
Kiran K. Kethineni,
Saraju P. Mohanty,
Elias Kougianos,
Sanjukta Bhowmick,
Laavanya Rachakonda
Abstract:
To efficiently manage plant diseases, Agriculture Cyber-Physical Systems (A-CPS) have been developed to detect and localize disease infestations by integrating the Internet of Agro-Things (IoAT). Owing to the nature of plant-pathogen interactions, a disease spreads outward from a focus, with the density of infected plants and the intensity of infection diminishing away from it. This gradient of infection calls for variable rate, precision pesticide spraying to use resources efficiently and handle the disease effectively. This article presents SprayCraft, a graph-based method for disease-management A-CPS that identifies disease hotspots and computes a near-optimal path for a spraying drone performing variable rate precision spraying. A graph represents the diseased locations and their spatial relations, and message passing over this graph computes the probability that each location is a disease hotspot. These probabilities also serve as disease intensity measures and set the spraying rate at each location. The same graph is used to compute the drone's tour by treating it as a Traveling Salesman Problem (TSP). The proposed method has been validated on synthetic data of diseased locations on farmland.
Submitted 12 December, 2024;
originally announced December 2024.
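A rough sketch of the routing half of the SprayCraft pipeline described above, assuming the hotspot probabilities have already been produced by the message-passing step (which is not reproduced here). The tour uses a nearest-neighbour construction followed by 2-opt improvement, a standard TSP heuristic chosen for illustration; the abstract does not say which TSP solver SprayCraft uses. Index 0 is assumed to be the drone's starting point, and the dose rule `max_rate * probability` is a hypothetical linear scaling.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def tour_length(points: List[Point], tour: List[int]) -> float:
    """Length of the closed tour, including the return leg to the start."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def nearest_neighbour_tour(points: List[Point], start: int = 0) -> List[int]:
    """Greedy construction: always fly to the closest unvisited location."""
    unvisited = set(range(len(points))) - {start}
    tour = [start]
    while unvisited:
        here = points[tour[-1]]
        # min over a sorted list so that ties are broken deterministically
        nxt = min(sorted(unvisited), key=lambda j: math.dist(here, points[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def two_opt(points: List[Point], tour: List[int]) -> List[int]:
    """2-opt local search: reverse a segment whenever that shortens the tour.

    Position 0 (assumed to be the drone's depot) is never moved.
    """
    best = tour[:]
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if tour_length(points, candidate) < tour_length(points, best) - 1e-12:
                    best, improved = candidate, True
    return best


def spray_plan(points: List[Point], hotspot_prob: List[float],
               max_rate: float = 1.0) -> List[Tuple[int, float]]:
    """Visiting order plus a per-location dose scaled by hotspot probability."""
    tour = two_opt(points, nearest_neighbour_tour(points))
    return [(idx, max_rate * hotspot_prob[idx]) for idx in tour]


points = [(0.0, 0.0), (0.0, 2.0), (3.0, 2.0), (3.0, 0.0), (1.0, 1.0)]
probs = [0.1, 0.9, 0.4, 0.7, 0.2]  # hypothetical output of the message-passing step
for idx, dose in spray_plan(points, probs, max_rate=2.0):
    print(idx, round(dose, 2))
```

2-opt only guarantees a locally optimal tour, which matches the "near-optimal" framing; an exact TSP solver could replace it for small fields.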
-
PANACEA: Towards Influence-driven Profiling of Drug Target Combinations in Cancer Signaling Networks
Authors:
Baihui Xu,
Sourav S Bhowmick,
Jiancheng Hu
Abstract:
Data profiling has garnered increasing attention within the data science community, though primarily for structured data. In this paper, we introduce a novel framework called PANACEA, designed to profile known cancer target combinations in cancer type-specific signaling networks. Given a large signaling network for a cancer type, known targets from approved anticancer drugs, a set of cancer mutated genes, and a combination size parameter k, PANACEA automatically generates a delta histogram that depicts the distribution of k-sized target combinations based on their topological influence on cancer mutated genes and other nodes. To this end, we formally define the novel problem of influence-driven target combination profiling (i-TCP) and propose an algorithm that employs two innovative personalized PageRank-based measures, PEN distance and PEN-diff, to quantify this influence and generate the delta histogram. Our experimental studies on signaling networks related to four cancer types demonstrate that our proposed measures outperform several popular network properties in profiling known target combinations. Notably, we demonstrate that PANACEA can significantly reduce the candidate k-node combination exploration space, addressing a longstanding challenge for tasks such as in silico target combination prediction in large cancer-specific signaling networks.
Submitted 15 October, 2024;
originally announced October 2024.
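PANACEA's PEN distance and PEN-diff measures are built on personalized PageRank. Their exact definitions are not given in the abstract, so the sketch below only shows the underlying personalized PageRank computation by power iteration, with restarts to a set of drug targets, on a toy network with hypothetical node names (`t1`, `t2` as targets; `m1`, `m2` as mutated genes). The restart probability `alpha = 0.15` is a common default, not a value from the paper.

```python
from typing import Dict, Iterable, List

Graph = Dict[str, List[str]]  # directed adjacency lists


def personalized_pagerank(graph: Graph, sources: Iterable[str],
                          alpha: float = 0.15, iterations: int = 100) -> Dict[str, float]:
    """Power iteration for personalized PageRank with restart to `sources`.

    alpha is the restart probability. Mass sitting on nodes with no out-edges
    (dangling nodes) is returned to the restart distribution, so scores sum to 1.
    """
    source_set = set(sources)
    nodes = set(graph) | {v for targets in graph.values() for v in targets} | source_set
    restart = {v: (1.0 / len(source_set) if v in source_set else 0.0) for v in nodes}
    scores = dict(restart)
    for _ in range(iterations):
        nxt = {v: alpha * restart[v] for v in nodes}
        for u in nodes:
            out = graph.get(u, [])
            if out:
                share = (1.0 - alpha) * scores[u] / len(out)
                for v in out:
                    nxt[v] += share
            else:
                for v in source_set:  # dangling node: jump back to the sources
                    nxt[v] += (1.0 - alpha) * scores[u] * restart[v]
        scores = nxt
    return scores


# Toy network: target t1 reaches mutated genes m1 and m2 through a and b.
graph = {"t1": ["a", "b"], "a": ["m1"], "b": ["m2"], "t2": ["m2"]}
scores = personalized_pagerank(graph, ["t1"])
print(round(scores["m1"] + scores["m2"], 3))  # PPR mass that t1 places on mutated genes
```

Summing scores over mutated genes is one simple way to read off a target set's "influence"; PANACEA's own measures may combine these scores differently.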
-
Physics-informed AI and ML-based sparse system identification algorithm for discovery of PDEs representing nonlinear dynamic systems
Authors:
Ashish Pal,
Sutanu Bhowmick,
Satish Nagarajaiah
Abstract:
Sparse system identification of nonlinear dynamic systems is still challenging, especially for stiff and high-order differential equations and noisy measurement data. Using highly correlated functions makes it difficult to distinguish true from false functions, which limits the choice of functions. In this study, an equation discovery method is proposed to tackle these problems. Its key elements are a) B-spline fitting of the data to obtain analytical derivatives, which are superior to numerical derivatives; b) a sequentially regularized derivatives for denoising (SRDD) algorithm, which is highly effective at removing noise from the signal without losing system information; c) an uncorrelated component analysis (UCA) algorithm that identifies and eliminates highly correlated functions while retaining the true functions; and d) physics-informed spline fitting (PISF), in which the spline fit is updated gradually while satisfying the governing equation with a dictionary of candidate functions, so that it converges sequentially to the correct equation. The complete framework is built on a unified deep-learning architecture that eases the optimization process. The proposed method is demonstrated to discover various differential equations at various noise levels, including three-dimensional, fourth-order, and stiff equations. The parameter estimates converge accurately to the true values with a small coefficient of variation, suggesting robustness to noise.
Submitted 13 October, 2024;
originally announced October 2024.
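To make the dictionary-based sparse regression above concrete, here is a minimal sketch of sequentially thresholded least squares (STLSQ), the generic SINDy-style baseline: build a library of candidate functions, fit by least squares, drop terms whose coefficients fall below a threshold, and refit. This is not the paper's B-spline/SRDD/UCA/PISF pipeline; the toy system, candidate library and threshold are assumptions, and exact derivatives are used in place of spline-fitted ones.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(theta: Matrix, y: List[float], cols: List[int]) -> List[float]:
    """Least-squares fit of y on the selected library columns (normal equations)."""
    a = [[sum(row[i] * row[j] for row in theta) for j in cols] for i in cols]
    b = [sum(row[i] * yi for row, yi in zip(theta, y)) for i in cols]
    return solve(a, b)


def stlsq(theta: Matrix, y: List[float], threshold: float = 0.1,
          max_iter: int = 10) -> List[float]:
    """Sequentially thresholded least squares: fit, prune small terms, refit."""
    n_terms = len(theta[0])
    active = list(range(n_terms))
    coef = [0.0] * n_terms
    for _ in range(max_iter):
        coef = [0.0] * n_terms
        for idx, c in zip(active, least_squares(theta, y, active)):
            coef[idx] = c
        keep = [i for i in active if abs(coef[i]) >= threshold]
        if keep == active:
            break
        active = keep
    return [c if i in active else 0.0 for i, c in enumerate(coef)]


# Toy system dx/dt = -2 x + 0.5 x^3 with exact derivatives (no noise).
xs = [-2.0 + 0.1 * k for k in range(41)]
dxdt = [-2.0 * x + 0.5 * x ** 3 for x in xs]
library = [[1.0, x, x ** 2, x ** 3] for x in xs]  # candidate terms: 1, x, x^2, x^3
print([round(c, 6) for c in stlsq(library, dxdt)])
```

With noisy data and correlated candidate terms, this plain thresholding step is exactly where the problems named in the abstract appear, which is what the denoising and UCA components are meant to address.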
-
Sampling-Based Attack for Centrality Disruption in Complex Networks
Authors:
Fariba Afrin Irany,
Soumya Sarakar,
Animesh Mukherjee,
Sanjukta Bhowmick
Abstract:
Many mobile networks are represented as graphs to obtain insight into their connectivity and transmission properties. Among these properties, centrality resilience, that is, how well centralities such as closeness and betweenness are maintained under attacks, is a critical factor for the proper functioning of a network. In this paper, we study the centrality resilience of complex networks by developing attack models that disrupt the rank of the top path-based centrality vertices. To develop our attack models, we extend the concept of rich clubs of influential vertices to the more general framework of scattered rich clubs, which we define as dense subgraphs of high-centrality vertices that are spread (scattered) across the network. Finding scattered rich clubs, although of polynomial time complexity, is computationally very expensive. We use snowball sampling both to identify these important substructures and to identify which edges to target in our proposed attack models. Our results over a set of real-world networks demonstrate that our proposed algorithm efficiently finds single or scattered rich clubs and successfully disrupts the centrality rankings of the network. To summarize, we propose sampling-based attack models for testing the resilience of networks with respect to centrality rankings. As part of this process, we introduce scattered rich clubs, a generalized form of the rich club model, together with efficient algorithms to detect them, and demonstrate their relation to network resilience.
Submitted 9 July, 2024;
originally announced July 2024.
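The two ingredients named in the abstract above, snowball sampling and dense subgraphs of high-centrality vertices, can be sketched as follows. This is an illustration under assumptions, not the authors' attack model: degree stands in for the path-based centralities (closeness, betweenness) the paper targets, the "rich club" strength of a vertex set is measured by its induced edge density, and the toy graph, seed vertex and parameters are hypothetical.

```python
import random
from typing import Dict, Iterable, Set

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets


def snowball_sample(graph: Graph, seeds: Iterable[int], waves: int, k: int,
                    rng_seed: int = 0) -> Set[int]:
    """Snowball sampling: each wave follows up to k random unsampled neighbours per frontier vertex."""
    rng = random.Random(rng_seed)
    sampled = set(seeds)
    frontier = set(sampled)
    for _ in range(waves):
        next_frontier: Set[int] = set()
        for u in sorted(frontier):
            fresh = sorted(graph[u] - sampled)  # neighbours not yet in the sample
            next_frontier.update(rng.sample(fresh, min(k, len(fresh))))
        sampled |= next_frontier
        frontier = next_frontier
    return sampled


def induced_density(graph: Graph, members: Set[int]) -> float:
    """Edge density of the subgraph induced by `members` (rich-club style coefficient)."""
    n = len(members)
    if n < 2:
        return 0.0
    edges = sum(1 for u in members for v in graph[u] if v in members) // 2
    return 2.0 * edges / (n * (n - 1))


def add_edge(graph: Graph, u: int, v: int) -> None:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)


# Toy graph: a 4-clique {0, 1, 2, 3} with a tail 3-4-5-6.
g: Graph = {}
for u, v in [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (5, 6)]:
    add_edge(g, u, v)

sample = snowball_sample(g, seeds=[0], waves=2, k=2)
# Degree is a cheap stand-in for path-based centrality here (assumption).
top = set(sorted(sample, key=lambda v: (-len(g[v]), v))[:4])
print(sorted(sample), sorted(top), induced_density(g, top))
```

Working on the sample rather than the full graph is what keeps the search cheap; the price is that a scattered club outside the sampled waves can be missed.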
-
Linear complementary pairs of codes over a finite non-commutative Frobenius ring
Authors:
Sanjit Bhowmick,
Xiusheng Liu
Abstract:
In this paper, we study linear complementary pairs (LCPs) of codes over finite non-commutative local rings. We further provide a necessary and sufficient condition for a pair of codes $(C,D)$ to be an LCP of codes over finite non-commutative Frobenius rings. The minimum distances $d(C)$ and $d(D^\perp)$ define the security parameter $\min\{d(C), d(D^\perp)\}$ for an LCP of codes $(C, D)$. It was recently demonstrated in \cite{LL23} that if $C$ and $D$ form a $2$-sided LCP of group codes over a finite commutative Frobenius ring, then $D^\perp$ and $C$ are permutation equivalent. As a result, the security parameter for a $2$-sided group LCP $(C, D)$ of codes is simply $d(C)$. Towards this, we give an elementary proof that for a linear complementary pair of codes $(C,D)$, where $C$ and $D$ are linear codes over finite non-commutative Frobenius rings, the dual code $D^\perp$ is equivalent to $C$ under certain conditions.
Submitted 22 June, 2024;
originally announced June 2024.
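For intuition about the LCP condition and the security parameter discussed above, the sketch below restricts attention to linear codes over a prime field $\mathbb{F}_p$, not the non-commutative Frobenius rings the paper treats. It uses the standard field characterization: $(C, D)$ is an LCP exactly when $\dim C + \dim D = n$ and $C + D = \mathbb{F}_p^n$. Minimum distances are found by brute force, so it only suits tiny toy codes; all function names and generator matrices are hypothetical.

```python
from itertools import product
from typing import List

Matrix = List[List[int]]


def rank_mod_p(rows: Matrix, p: int) -> int:
    """Rank over the prime field F_p via Gauss-Jordan elimination on a copy."""
    m = [[x % p for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for c in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][c]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][c], -1, p)  # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][c]:
                f = m[r][c]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def is_lcp(gen_c: Matrix, gen_d: Matrix, p: int) -> bool:
    """(C, D) is an LCP over F_p iff dim C + dim D = n and C + D = F_p^n."""
    n = len(gen_c[0])
    return (rank_mod_p(gen_c, p) + rank_mod_p(gen_d, p) == n
            and rank_mod_p(gen_c + gen_d, p) == n)


def min_distance(gen: Matrix, p: int) -> int:
    """Smallest nonzero codeword weight, by enumerating all messages."""
    n = len(gen[0])
    best = n + 1
    for msg in product(range(p), repeat=len(gen)):
        word = [sum(a * row[j] for a, row in zip(msg, gen)) % p for j in range(n)]
        weight = sum(1 for x in word if x)
        if 0 < weight < best:
            best = weight
    return best


def dual_min_distance(gen: Matrix, p: int) -> int:
    """Smallest weight of a nonzero vector orthogonal to every row of `gen`."""
    n = len(gen[0])
    best = n + 1
    for x in product(range(p), repeat=n):
        if any(x) and all(sum(a * b for a, b in zip(row, x)) % p == 0 for row in gen):
            best = min(best, sum(1 for v in x if v))
    return best


G_C = [[1, 0, 0, 1], [0, 1, 0, 1]]
G_D = [[0, 0, 1, 0], [1, 1, 1, 1]]
if is_lcp(G_C, G_D, 2):
    security = min(min_distance(G_C, 2), dual_min_distance(G_D, 2))
    print("LCP with security parameter", security)
```

In this toy pair over $\mathbb{F}_2$, $D^\perp = \{0000, 1100, 1001, 0101\}$ coincides with $C$, so the security parameter reduces to $d(C) = 2$, the same situation the equivalence results above describe.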
-
Compressed Image Captioning using CNN-based Encoder-Decoder Framework
Authors:
Md Alif Rahman Ridoy,
M Mahmud Hasan,
Shovon Bhowmick
Abstract:
In today's world, image processing plays a crucial role across various fields, from scientific research to industrial applications. One particularly exciting application is image captioning, whose potential impact is vast. It can significantly boost the accuracy of search engines, making it easier to find relevant information. Moreover, it can greatly enhance accessibility for visually impaired individuals, providing them with a more immersive experience of digital content. However, despite its promise, image captioning presents several challenges. One major hurdle is extracting meaningful visual information from images and transforming it into coherent language. This requires bridging the gap between the visual and linguistic domains, a task that demands sophisticated algorithms and models. Our project addresses these challenges by developing an automatic image captioning architecture that combines the strengths of convolutional neural networks (CNNs) and encoder-decoder models. The CNN model extracts visual features from images, and the encoder-decoder framework then generates captions from these features. We also compared the performance of multiple pre-trained CNN architectures to understand how their results vary. Finally, we explored frequency regularization techniques to compress the "AlexNet" and "EfficientNetB0" models, to see whether the compressed models could remain effective at generating image captions while being more resource-efficient.
Submitted 27 April, 2024;
originally announced April 2024.
-
Additive Complementary Pairs of Codes
Authors:
Sanjit Bhowmick,
Deepak Kumar Dalai
Abstract:
An additive code over $\mathbb{F}_{q^m}$ is an $\mathbb{F}_q$-linear subspace of $\mathbb{F}_{q^m}^n$ that is not a linear subspace over $\mathbb{F}_{q^m}$. Linear complementary pairs (LCPs) of codes have important roles in cryptography, such as increasing the speed and capacity of digital communication and strengthening security by improving resistance to cryptanalytic attacks. This paper studies the algebraic structure of additive complementary pairs (ACPs) of codes over $\mathbb{F}_{q^m}$. Further, we characterize an ACP of codes in terms of analogous generator matrices and parity check matrices. Additionally, we identify a necessary condition for an ACP of codes. We also present some constructions of ACPs of codes over $\mathbb{F}_{q^m}$ from LCP codes over $\mathbb{F}_{q^m}$ and from LCPs of codes over $\mathbb{F}_{q}$. Finally, we study constacyclic ACPs of codes over $\mathbb{F}_{q^m}$ and count them.
Submitted 25 September, 2024; v1 submitted 19 April, 2024;
originally announced April 2024.
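A small illustration of the definitions above for $q = 2$, $m = 2$: the sketch checks whether a set of vectors over $\mathbb{F}_4$ is closed under scaling by $\mathbb{F}_4$ (i.e. linear rather than merely additive), and whether two additive codes form an ACP, taken here to mean $C \cap D = \{0\}$ and $C + D = \mathbb{F}_4^n$ as $\mathbb{F}_2$-spaces. The integer encoding of $\mathbb{F}_4$ and the brute-force enumeration are assumptions made for brevity; this is not a construction from the paper.

```python
from itertools import product
from typing import FrozenSet, List, Tuple

Vector = Tuple[int, ...]
Code = FrozenSet[Vector]

# Encoding assumption: a + b*w in F_4 (with w^2 = w + 1) is stored as the int a | (b << 1),
# so 0, 1, w, w^2 = w + 1 become 0, 1, 2, 3 and field addition is bitwise XOR.


def times_w(x: int) -> int:
    """Multiply an F_4 element by w: w(a + b w) = b + (a + b) w."""
    a, b = x & 1, (x >> 1) & 1
    return b | ((a ^ b) << 1)


def f2_span(generators: List[Vector]) -> Code:
    """All F_2-linear combinations (XOR sums) of the generators: an additive code."""
    n = len(generators[0])
    words = set()
    for coeffs in product((0, 1), repeat=len(generators)):
        word = [0] * n
        for c, g in zip(coeffs, generators):
            if c:
                word = [x ^ y for x, y in zip(word, g)]
        words.add(tuple(word))
    return frozenset(words)


def is_f4_linear(code: Code) -> bool:
    """An additive code is F_4-linear iff it is also closed under scaling by w."""
    return all(tuple(times_w(x) for x in word) in code for word in code)


def is_acp(c: Code, d: Code, n: int) -> bool:
    """Trivial intersection plus |C||D| = 4^n forces C + D = F_4^n."""
    return (c & d) == {(0,) * n} and len(c) * len(d) == 4 ** n


C = f2_span([(1, 1), (2, 0)])  # additive but not F_4-linear
D = f2_span([(0, 1), (0, 2)])  # the F_4-linear code {0} x F_4
print(sorted(C), is_f4_linear(C), is_f4_linear(D), is_acp(C, D, 2))
```

Here $C = \{00, 11, \omega 0, \omega^2 1\}$ fails linearity because $\omega \cdot (1,1) = (\omega, \omega) \notin C$, yet it still pairs with the linear code $D$ to give an ACP.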
-
On Linear Complementary Pairs of Algebraic Geometry Codes over Finite Fields
Authors:
Sanjit Bhowmick,
Deepak Kumar Dalai,
Sihem Mesnager
Abstract:
Linear complementary dual (LCD) codes and linear complementary pairs (LCPs) of codes have been proposed for new applications as countermeasures against side-channel attacks (SCA) and fault injection attacks (FIA) in the context of direct sum masking (DSM). The countermeasure against FIA may lead to a vulnerability to SCA when the whole algorithm needs to be masked (in environments like smart cards). This led to a variant of the LCD and LCP problems, for which many results have been obtained for LCD codes but only partial results for LCP codes. Given the gap between these limited results and the importance of the problem, this paper aims to narrow it by further studying LCPs of codes in special code families, specifically the characterisation and construction of LCPs of algebraic geometry codes over finite fields. Notably, we propose constructing explicit LCPs of codes from elliptic curves. We also study the security parameters of the derived LCPs of codes $(\mathcal{C}, \mathcal{D})$ (notably for cyclic codes), which are given by the minimum distances $d(\mathcal{C})$ and $d(\mathcal{D}^\perp)$. Further, we show that for LCP algebraic geometry codes $(\mathcal{C},\mathcal{D})$, the dual code $\mathcal{C}^\perp$ is equivalent to $\mathcal{D}$ under some specific conditions we exhibit. Finally, we investigate whether MDS LCPs of algebraic geometry codes exist (MDS codes are among the most important in coding theory due to their theoretical significance and practical interest). Construction schemes for obtaining LCD codes from any algebraic curve were given in 2018 by Mesnager, Tang and Qi in [``Complementary dual algebraic geometry codes", IEEE Trans. Inform. Theory, vol. 64(4), 2390--2397, 2018]. To our knowledge, this is the first time LCPs of algebraic geometry codes have been studied.
Submitted 2 November, 2023;
originally announced November 2023.
-
Influence Maximization in Social Networks: A Survey
Authors:
Hui Li,
Susu Yang,
Mengting Xu,
Sourav S Bhowmick,
Jiangtao Cui
Abstract:
Online social networks have become an important platform for people to communicate, share knowledge and disseminate information. Given the widespread usage of social media, individuals' ideas, preferences and behavior are often influenced by their peers or friends in the social networks that they participate in. Over the last decade, the influence maximization (IM) problem has been extensively adopted to model the diffusion of innovations and ideas. The purpose of IM is to select a set of k seed nodes that can influence the most individuals in the network.
In this survey, we present a systematic study of the research and future directions related to the IM problem. We review the information diffusion models and analyze a variety of algorithms for the classic IM problem. We propose a taxonomy to help readers understand the key techniques and challenges. We also organize the milestone works chronologically so that readers can follow the research roadmap in this field. Moreover, we categorize other application-oriented IM studies and study each of them correspondingly. Finally, we list a series of open questions as future directions for IM-related research, so that readers can easily see what should be done next in this field.
Submitted 8 September, 2023;
originally announced September 2023.
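As a concrete reference point for the IM problem described above, here is a minimal sketch of the classic greedy baseline under the independent cascade (IC) diffusion model, with expected spread estimated by Monte Carlo simulation. It is only one of the many algorithms the survey reviews; the uniform activation probability `p`, the number of simulation runs and the toy graph are assumptions.

```python
import random
from typing import Dict, List, Set

Graph = Dict[int, List[int]]  # directed: u -> nodes that u may activate


def independent_cascade(graph: Graph, seeds: Set[int], p: float, rng: random.Random) -> int:
    """One IC simulation: each newly active node gets one chance per out-neighbour."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        newly_active = []
        for u in frontier:
            for v in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return len(active)


def expected_spread(graph: Graph, seeds: Set[int], p: float, runs: int,
                    rng: random.Random) -> float:
    """Monte Carlo estimate of the expected number of activated nodes."""
    return sum(independent_cascade(graph, seeds, p, rng) for _ in range(runs)) / runs


def greedy_im(graph: Graph, k: int, p: float = 0.1, runs: int = 200, seed: int = 0) -> List[int]:
    """Greedy hill-climbing: each round, add the node giving the largest estimated spread."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for vs in graph.values() for v in vs}
    chosen: List[int] = []
    for _ in range(k):
        candidates = sorted(nodes - set(chosen))  # sorted so ties resolve deterministically
        best = max(candidates,
                   key=lambda v: expected_spread(graph, set(chosen) | {v}, p, runs, rng))
        chosen.append(best)
    return chosen


star = {0: [1, 2, 3, 4], 5: [6]}
print(greedy_im(star, k=2, p=1.0, runs=5))
```

The repeated simulations inside the greedy loop are the main cost, which is why much of the IM literature focuses on faster spread estimation.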
-
DKWS: A Distributed System for Keyword Search on Massive Graphs (Complete Version)
Authors:
Jiaxin Jiang,
Byron Choi,
Xin Huang,
Jianliang Xu,
Sourav S Bhowmick
Abstract:
Because graphs such as knowledge graphs, social networks, and RDF graphs are unstructured and lack schemas, keyword search has been proposed for querying them. As graphs have become voluminous, large-scale distributed processing has attracted much interest from the database research community. While there have been several distributed systems, distributed querying techniques for keyword search are still limited. This paper proposes a novel distributed keyword search system called DKWS. First, we present a monotonic property of keyword search algorithms that guarantees correct parallelization. Second, we present a keyword search algorithm as monotonic backward and forward search phases. Moreover, we propose new tight bounds for pruning nodes being searched. Third, we propose a notify-push paradigm and the PINE programming model of DKWS. The notify-push paradigm allows asynchronously exchanging the upper bounds of matches across the workers and the coordinator in DKWS. The PINE programming model naturally fits keyword search algorithms, as they have distinct phases, and allows preemptive searches to mitigate staleness in a distributed system. Finally, we investigate the performance and effectiveness of DKWS through experiments using real-world datasets. We find that DKWS is up to two orders of magnitude faster than related techniques, and its communication costs are 7.6 times smaller than those of other techniques.
Submitted 9 September, 2023; v1 submitted 3 September, 2023;
originally announced September 2023.
-
On LCP and checkable group codes over finite non-commutative Frobenius rings
Authors:
Sanjit Bhowmick,
Javier de la Cruz,
Edgar Martínez-Moro,
Anuradha Sharma
Abstract:
For a complementary pair of group codes over a finite non-commutative Frobenius ring, we provide a simple proof of the fact that one of them is equivalent to the other one. We also explore this fact for checkable codes over the same type of alphabet.
Submitted 13 April, 2023; v1 submitted 31 March, 2023;
originally announced March 2023.
-
Efficient Selection of Informative Alternative Relational Query Plans for Database Education
Authors:
Hu Wang,
Hui Li,
Sourav S Bhowmick,
Zihao Ma,
Jiangtao Cui
Abstract:
A key learning goal for learners taking a database systems course is to understand how SQL queries are processed in an RDBMS in practice. To this end, comprehending the cost-based comparison of different plan choices used to select the query execution plan (QEP) of a query is paramount. Unfortunately, off-the-shelf RDBMSs typically expose only the selected QEP to users, without revealing, in a learner-friendly manner, information about representative alternative query plans considered during QEP selection, which hinders the learning process. In this paper, we present a novel end-to-end and generic framework called ARENA that facilitates exploration of informative alternative query plans of a given SQL query to aid the comprehension of QEP selection. Under the hood, ARENA addresses a novel problem called the alternative plan selection problem (TIPS), which aims to discover a set of k alternative plans from the underlying plan space such that the plan interestingness of the set is maximized. Specifically, we explore two variants of the problem, namely batch TIPS and incremental TIPS, to cater to a diverse set of learners. Due to the computational hardness of the problem, we present a 2-approximation algorithm to address it efficiently. An exhaustive experimental study with real-world learners demonstrates the effectiveness of ARENA in enhancing learners' understanding of the alternative plan choices considered during QEP selection.
Submitted 16 November, 2023; v1 submitted 24 October, 2022;
originally announced October 2022.
-
AUTOSHAPE: An Autoencoder-Shapelet Approach for Time Series Clustering
Authors:
Guozhong Li,
Byron Choi,
Jianliang Xu,
Sourav S Bhowmick,
Daphne Ngar-yin Mah,
Grace Lai-Hung Wong
Abstract:
Time series shapelets are discriminative subsequences that have recently been found effective for time series clustering (TSC). Shapelets are also convenient for interpreting the clusters. Thus, the main challenge for TSC is to discover high-quality variable-length shapelets that discriminate between different clusters. In this paper, we propose a novel autoencoder-shapelet approach (AUTOSHAPE), which is the first study to take advantage of both autoencoders and shapelets for determining shapelets in an unsupervised manner. An autoencoder is specially designed to learn high-quality shapelets. More specifically, to guide latent representation learning, we employ the latest self-supervised loss to learn unified embeddings for variable-length shapelet candidates (time series subsequences) of different variables, and propose a diversity loss to select the discriminating embeddings in the unified space. We introduce a reconstruction loss to recover shapelets in the original time series space for clustering. Finally, we adopt the Davies-Bouldin index (DBI) to inform AUTOSHAPE of the clustering performance during learning. We present extensive experiments on AUTOSHAPE. To evaluate the clustering performance on univariate time series (UTS), we compare AUTOSHAPE with 15 representative methods using UCR archive datasets. To study performance on multivariate time series (MTS), we evaluate AUTOSHAPE on 30 UEA archive datasets against 5 competitive methods. The results show that AUTOSHAPE performs best among all the methods compared. We interpret clusters with shapelets and obtain interesting insights about clusters in two UTS case studies and one MTS case study.
Submitted 18 August, 2022; v1 submitted 6 August, 2022;
originally announced August 2022.
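AUTOSHAPE uses the Davies-Bouldin index as a clustering-quality signal. The sketch below computes the standard DBI from points and cluster labels, assuming Euclidean distance and mean distance-to-centroid as the cluster scatter; how AUTOSHAPE feeds the index back into training is not shown. Lower values indicate more compact, better-separated clusters.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def davies_bouldin(points: List[Point], labels: List[int]) -> float:
    """Davies-Bouldin index (lower is better); needs at least two clusters."""
    clusters: Dict[int, List[Point]] = {}
    for x, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(x)
    ids = sorted(clusters)
    centroid = {c: [sum(col) / len(clusters[c]) for col in zip(*clusters[c])] for c in ids}
    # S_i: mean Euclidean distance of cluster members to their centroid
    scatter = {c: sum(math.dist(x, centroid[c]) for x in clusters[c]) / len(clusters[c])
               for c in ids}
    # For each cluster, keep its worst (largest) ratio (S_i + S_j) / d(centroid_i, centroid_j)
    worst = [
        max((scatter[i] + scatter[j]) / math.dist(centroid[i], centroid[j])
            for j in ids if j != i)
        for i in ids
    ]
    return sum(worst) / len(ids)


pts = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
print(davies_bouldin(pts, [0, 0, 1, 1]))  # two tight, well-separated clusters
print(davies_bouldin(pts, [0, 1, 0, 1]))  # a poor labelling of the same points
```

Because the index needs only points and labels, it can score candidate clusterings without ground truth, which is what makes it usable as an unsupervised training signal.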
-
On the $\ell$-DLIPs of codes over finite commutative rings
Authors:
Sanjit Bhowmick,
Alexandre Fotue Tabue,
Joydeb Pal
Abstract:
Generalizing linear complementary duals, linear complementary pairs and the hull of codes, we introduce the concept of $\ell$-dimension linear intersection pairs ($\ell$-DLIPs) of codes over a finite commutative ring $R$, for some positive integer $\ell$. In this paper, we study $\ell$-DLIPs of codes over $R$ in a very general setting by a uniform method. We also provide a necessary and sufficient condition for the existence of a non-free (or free) $\ell$-DLIP of codes over a finite commutative Frobenius ring. In addition, we obtain a generating set for the intersection of two constacyclic codes over a finite chain ring, which leads to an important characterization of $\ell$-DLIPs of constacyclic codes. Finally, $\ell$-DLIPs of constacyclic codes over a finite chain ring are used to construct new entanglement-assisted quantum error correcting (EAQEC) codes.
Submitted 21 June, 2023; v1 submitted 2 April, 2022;
originally announced April 2022.
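Over a finite field, checking whether a pair of linear codes is an $\ell$-DLIP reduces to rank computations, because dim(C ∩ D) = dim C + dim D − dim(C + D). The sketch below assumes binary codes given by generator rows encoded as integer bitmasks; it does not cover the ring setting of the paper, where codes need not be free and dimension must be replaced by a suitable notion of rank. The helper names are hypothetical.

```python
def gf2_rank(rows):
    """Rank over GF(2) of row vectors encoded as integer bitmasks."""
    basis = {}  # leading bit position -> basis vector with that leading bit
    for v in rows:
        while v:
            top = v.bit_length() - 1
            if top not in basis:
                basis[top] = v
                break
            v ^= basis[top]  # cancel the leading bit and keep reducing
    return len(basis)

def intersection_dim(G1, G2):
    """dim(C1 ∩ C2) = dim C1 + dim C2 - dim(C1 + C2) for linear codes over a field."""
    return gf2_rank(G1) + gf2_rank(G2) - gf2_rank(list(G1) + list(G2))

def is_ell_dlip(G1, G2, ell):
    """Field-case check: (C1, C2) is an ell-DLIP when dim(C1 ∩ C2) == ell."""
    return intersection_dim(G1, G2) == ell

rep = [0b111]              # binary repetition code of length 3
even = [0b110, 0b011]      # binary even-weight code of length 3
print(intersection_dim(rep, even))  # 0
```

Here the intersection is trivial and 1 + 2 = n = 3, so this pair is also a linear complementary pair.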
-
Boosting Entity Mention Detection for Targetted Twitter Streams with Global Contextual Embeddings
Authors:
Satadisha Saha Bhowmick,
Eduard C. Dragut,
Weiyi Meng
Abstract:
Microblogging sites, like Twitter, have emerged as ubiquitous sources of information. Two important tasks related to the automatic extraction and analysis of information in microblogs are Entity Mention Detection (EMD) and Entity Detection (ED). State-of-the-art EMD systems aim to model the non-literary nature of microblog text by training on offline static datasets. They extract a combination of surface-level features (orthographic, lexical, and semantic) from individual messages for noisy text modeling and entity extraction. However, given the constantly evolving nature of microblog streams, detecting all entity mentions from the varying yet limited context of short messages remains a difficult problem. To this end, we propose a framework named EMD Globalizer, which is better suited for executing EMD learners on microblog streams. It departs from the way existing EMD systems process isolated microblog messages, where knowledge learned from the immediate context of a message is used to suggest entities. After an EMD system performs an initial extraction of entity candidates, the proposed framework leverages occurrence mining to find additional candidate mentions that were missed during this first detection. By aggregating the local contextual representations of these mentions, the framework draws a global embedding from the collective context of an entity candidate within a stream. The global embeddings are then used to separate entities within the candidates from false positives. All mentions of these entities in the stream are produced in the framework's final output. Our experiments show that EMD Globalizer can enhance the effectiveness of all existing EMD systems that we tested (by 25.61% on average) with small additional computational overhead.
Submitted 27 January, 2022;
originally announced January 2022.
-
Classification and count of binary linear complementary dual group codes
Authors:
Ankan Shaw,
Sanjit Bhowmick,
Satya Bagchi
Abstract:
We establish a complete classification of binary group codes with complementary duals for a finite group and explicitly determine the number of linear complementary dual (LCD) cyclic group codes by using cyclotomic cosets. The dimension and the minimum distance for LCD group codes are explored. Finally, we find a connection between LCD MDS group codes and maximal ideals.
Submitted 21 January, 2022;
originally announced January 2022.
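For binary cyclic codes (the cyclic group case) of odd length n, LCD codes can be counted from 2-cyclotomic cosets modulo n. The sketch relies on the classical result of Yang and Massey that, when gcd(n, q) = 1, a cyclic code is LCD exactly when its generator polynomial is self-reciprocal, i.e., its defining set Z satisfies Z = −Z mod n. Each self-reciprocal coset can then be included or excluded freely, while each pair {C, −C} must be included or excluded together, giving 2^(a+b) codes for a self-reciprocal cosets and b pairs. This is an illustration under those assumptions, not necessarily the counting formula the paper derives for general groups; the function names are hypothetical.

```python
from math import gcd

def cyclotomic_cosets(n, q=2):
    """Partition Z_n into q-cyclotomic cosets {s, sq, sq^2, ...} mod n."""
    if gcd(n, q) != 1:
        raise ValueError("requires gcd(n, q) == 1")
    seen, cosets = set(), []
    for s in range(n):
        if s in seen:
            continue
        coset, x = [], s
        while x not in coset:
            coset.append(x)
            x = (x * q) % n
        seen.update(coset)
        cosets.append(frozenset(coset))
    return cosets

def count_lcd_cyclic(n, q=2):
    """Number of LCD cyclic codes of length n: defining sets closed under negation."""
    cosets = cyclotomic_cosets(n, q)
    self_rec = sum(1 for c in cosets if frozenset((-x) % n for x in c) == c)
    pairs = (len(cosets) - self_rec) // 2   # cosets C with -C != C come in pairs
    return 2 ** (self_rec + pairs)

print(count_lcd_cyclic(7))  # 4
```

For n = 7 the cosets are {0}, {1, 2, 4} and {3, 5, 6}; the last two are negatives of each other, so the count is 2^(1+1) = 4.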
-
A Class of $(n, k, r, t)_i$ LRCs Via Parity Check Matrix
Authors:
Deep Mukhopadhyay,
Sanjit Bhowmick,
Kalyan Hansda,
Satya Bagchi
Abstract:
A code is called an $(n, k, r, t)$ information-symbol locally repairable code ($(n, k, r, t)_i$ LRC) if each information coordinate can be recovered from at least $t$ disjoint repair sets, each containing at most $r$ other coordinates. This paper considers a class of $(n, k, r, t)_i$ LRCs in which each repair set contains exactly one parity coordinate. We explore the systematic code in terms of the standard parity-check matrix. First, we present some structural features of the parity-check matrix by showing connections with the membership matrix and with the minimum-distance optimality of the code. Next, we give parity-check-matrix-based proofs of various bounds associated with the code. In addition, we provide several constructions of optimal $(n, k, r, t)_i$ LRCs with the help of two Cayley tables of a finite field. Finally, we generalize a result on $q$-ary $(n, k, r)$ LRCs to $q$-ary $(n, k, r, t)$ LRCs.
Submitted 24 August, 2022; v1 submitted 10 December, 2021;
originally announced December 2021.
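A toy instance makes the repair-set definition concrete. The hypothetical code below arranges 4 information bits in a 2x2 grid and adds one parity bit per row and per column, giving n = 8 and k = 4. Each information coordinate then has t = 2 disjoint repair sets of size r = 2, each containing exactly one parity coordinate, as in the class of codes studied here. It is only an illustrative (8, 4, 2, 2)_i LRC; it is not claimed to be optimal or to come from the paper's Cayley-table constructions.

```python
from itertools import product

def encode(x):
    """Hypothetical 2x2 grid code: 4 information bits, 2 row + 2 column parities."""
    x0, x1, x2, x3 = x
    return [x0, x1, x2, x3, x0 ^ x1, x2 ^ x3, x0 ^ x2, x1 ^ x3]

PARITY = {4, 5, 6, 7}  # codeword positions holding parity symbols

# t = 2 disjoint repair sets per information coordinate, each with r = 2
# other coordinates, exactly one of which is a parity symbol.
REPAIR_SETS = {
    0: [(1, 4), (2, 6)],
    1: [(0, 4), (3, 7)],
    2: [(3, 5), (0, 6)],
    3: [(2, 5), (1, 7)],
}

def repair(codeword, repair_set):
    """Recover an erased coordinate as the XOR of the symbols in one repair set."""
    value = 0
    for j in repair_set:
        value ^= codeword[j]
    return value

def check_locality():
    """Verify disjointness, one-parity-per-set, and correct repair for all messages."""
    for sets in REPAIR_SETS.values():
        a, b = (set(s) for s in sets)
        assert not (a & b), "repair sets must be disjoint"
        assert all(len(s & PARITY) == 1 for s in (a, b))
    for msg in product((0, 1), repeat=4):
        cw = encode(msg)
        for i, sets in REPAIR_SETS.items():
            assert all(repair(cw, s) == cw[i] for s in sets)
    return True

print(check_locality())  # True
```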
-
Linear complementary dual code-based Multi-secret sharing scheme
Authors:
Haradhan Ghosh,
Sanjit Bhowmick,
Pramod Kumar Maurya,
Satya Bagchi
Abstract:
Hiding a secret is needed in many situations. Secret sharing plays an important role in protecting information from being lost, stolen, or destroyed, and has found many applications in recent years. A secret sharing scheme is a cryptographic protocol in which a dealer divides the secret into several shares and gives one share to each participant. Recovering the secret requires a qualified subset of participants; the collection of such subsets is called the access structure. In this paper, we present a multi-secret sharing scheme over a local ring based on linear complementary dual codes using Blakley's method. Our secret space over a local ring is larger than those of other code-based schemes, and we obtain a perfect and almost ideal scheme.
Submitted 10 December, 2021;
originally announced December 2021.
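Blakley's method, on which the scheme is based, treats the secret as a coordinate of a point and hands out hyperplanes through that point as shares. The sketch below is the classical single-secret, 2-out-of-n version over a prime field GF(p), with lines playing the role of hyperplanes; the prime `P`, the fixed slopes `1..n`, and the function names are assumptions for illustration. It does not reproduce the paper's multi-secret construction over a local ring from LCD codes. A single share reveals nothing on its own: for every candidate secret there is exactly one y on that line, and y is chosen uniformly.

```python
import secrets

P = 2_147_483_647  # a prime modulus (assumed; any prime larger than the secret works)

def make_shares(secret, n, y, p=P):
    """Blakley-style 2-out-of-n sharing over GF(p).

    The secret is the x-coordinate of the point (secret, y), where y should be
    drawn uniformly at random by the dealer. Share i is the line
    x + i*y = c_i through that point; distinct slopes make any two lines meet
    in exactly one point.
    """
    if not 0 < n < p:
        raise ValueError("need 0 < n < p")
    return [(1, i, (secret + i * y) % p) for i in range(1, n + 1)]

def recover(share_a, share_b, p=P):
    """Intersect two share lines a*x + b*y = c using Cramer's rule mod p."""
    (a1, b1, c1), (a2, b2, c2) = share_a, share_b
    det = (a1 * b2 - a2 * b1) % p
    if det == 0:
        raise ValueError("shares are parallel lines")
    return (c1 * b2 - c2 * b1) * pow(det, -1, p) % p

y = secrets.randbelow(P)
shares = make_shares(42, 5, y)
print(recover(shares[0], shares[3]))  # 42
```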
-
Data-Driven Theory-guided Learning of Partial Differential Equations using SimultaNeous Basis Function Approximation and Parameter Estimation (SNAPE)
Authors:
Sutanu Bhowmick,
Satish Nagarajaiah
Abstract:
The measured spatiotemporal response of various physical processes is used to infer the governing partial differential equations (PDEs). We propose SimultaNeous Basis Function Approximation and Parameter Estimation (SNAPE), a technique for estimating the parameters of PDEs that is robust to high levels of noise (nearly 100%), by simultaneously fitting basis functions to the measured response and estimating the parameters of both ordinary and partial differential equations. Domain knowledge of the general multidimensional process is used as a constraint in the formulation of the optimization framework. SNAPE not only demonstrates its applicability to various complex dynamic systems spanning a wide range of scientific domains, including the Schrödinger equation, the chaotic Duffing oscillator, and the Navier-Stokes equation, but also estimates an analytical approximation of the process response. The method systematically combines the knowledge of well-established scientific theories with concepts from data science to infer the properties of the process from the observed data.
Submitted 14 September, 2021;
originally announced September 2021.
-
Towards Plug-and-Play Visual Graph Query Interfaces: Data-driven Canned Pattern Selection for Large Networks
Authors:
Zifeng Yuan,
Huey Eng Chua,
Sourav S Bhowmick,
Zekun Ye,
Wook-Shin Han,
Byron Choi
Abstract:
Canned patterns (i.e., small subgraph patterns) in visual graph query interfaces (a.k.a. GUIs) facilitate efficient query formulation by enabling a pattern-at-a-time construction mode. However, existing GUIs for querying large networks either do not expose any canned patterns or, if they do, typically select them manually based on domain knowledge. Unfortunately, manual generation of canned patterns is not only labor-intensive but may also lack the diversity needed to support efficient visual formulation of a wide range of subgraph queries. In this paper, we present a novel, generic, and extensible framework called TATTOO that takes a data-driven approach to automatically selecting canned patterns for a GUI from large networks. Specifically, it first decomposes the underlying network into truss-infested and truss-oblivious regions. Then, candidate canned patterns capturing different real-world query topologies are generated from these regions. Canned patterns based on a user-specified plug are then selected for the GUI from these candidates by maximizing coverage and diversity and by minimizing the cognitive load of the pattern set. Experimental studies with real-world datasets demonstrate the benefits of TATTOO. Importantly, this work takes a concrete step towards realizing plug-and-play visual graph query interfaces for large networks.
Submitted 21 July, 2021;
originally announced July 2021.
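TATTOO's first step relies on truss structure. As an assumption for illustration, the sketch below computes a plain k-truss (every edge lies in at least k − 2 triangles) by iterative peeling; how the paper derives truss-infested and truss-oblivious regions from trussness may differ, and the function name is hypothetical. Node identifiers are assumed to be comparable (e.g., integers).

```python
from collections import defaultdict

def k_truss(edges, k):
    """Edges of the k-truss: the maximal subgraph in which every edge lies in
    at least k - 2 triangles, computed by repeatedly peeling weak edges."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    changed = True
    while changed:
        changed = False
        for u in list(adj):
            for v in list(adj[u]):
                # support(u, v) = common neighbours = triangles containing the edge
                if u < v and len(adj[u] & adj[v]) < k - 2:
                    adj[u].discard(v)
                    adj[v].discard(u)
                    changed = True
    return {(u, v) for u in adj for v in adj[u] if u < v}

# A 4-clique {0, 1, 2, 3} with a pendant edge (3, 4).
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
print(sorted(k_truss(edges, 4)))  # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```

The pendant edge lies in no triangle, so it is peeled away, while every clique edge lies in two triangles and survives in the 4-truss.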
-
From Base Data To Knowledge Discovery -- A Life Cycle Approach -- Using Multilayer Networks
Authors:
Abhishek Santra,
Kanthi Komar,
Sanjukta Bhowmick,
Sharma Chakravarthy
Abstract:
Any large, complex data analysis to infer or discover meaningful information/knowledge involves the following steps (in addition to data collection, cleaning, and preparing the data for analysis, such as attribute elimination): i) modeling the data: an approach for modeling and deriving a data representation for analysis using that approach, ii) translating analysis objectives into computations on the generated model; this can be as simple as a single computation (e.g., community detection) or may involve a sequence of operations (e.g., pair-wise community detection over multiple networks) using expressions based on the model, iii) computing the generated expressions, where efficiency and scalability come into the picture, and iv) drilling down into the results to interpret or understand them clearly. Beyond this, it is also useful to visualize results for easier understanding; the Covid-19 visualization dashboard presented in this paper is an example of this.
This paper covers all of the above steps of the data analysis life cycle using a data representation that is gaining importance for multi-entity, multi-feature data sets: Multilayer Networks (MLNs). We use several data sets to establish the effectiveness of modeling using MLNs and analyze them using the proposed decoupling approach. For coverage, we use different types of MLNs for modeling, and community and centrality computations for analysis. The data sets used are US commercial airlines, IMDb, DBLP, and Covid-19. Our experimental analyses using the identified steps validate the modeling, the breadth of objectives that can be computed, and the overall versatility of the life cycle approach. Correctness of results is verified, where possible, using independently available ground truth. We demonstrate the drill-down afforded by this approach (due to structure and semantics preservation) for a better understanding and visualization of results.
Submitted 24 May, 2021;
originally announced May 2021.
-
Towards Enhancing Database Education: Natural Language Generation Meets Query Execution Plans
Authors:
Weiguo Wang,
Sourav S Bhowmick,
Hui Li,
Shafiq R Joty,
Siyuan Liu,
Peng Chen
Abstract:
The database systems course is offered as part of the undergraduate computer science degree program at many major universities. A key learning goal for learners taking such a course is to understand how SQL queries are processed in an RDBMS in practice. Since a query execution plan (QEP) describes the execution steps of a query, learners can acquire this understanding by perusing the QEPs generated by an RDBMS. Unfortunately, in practice, it is often daunting for a learner to comprehend these QEPs, which contain vendor-specific implementation details, hindering her learning process. In this paper, we present a novel, end-to-end, generic system called LANTERN that generates a natural language description of a QEP to facilitate understanding of the query execution steps. It takes as input an SQL query and its QEP, and generates a natural language description of the execution strategy deployed by the underlying RDBMS. Specifically, it deploys a declarative framework called POOL that enables subject matter experts to efficiently create and maintain natural language descriptions of physical operators used in QEPs. A rule-based framework called RULE-LANTERN is proposed that exploits POOL to generate natural language descriptions of QEPs. Despite the high accuracy of RULE-LANTERN, our engagement with learners reveals that, consistent with existing psychology theories, perusing such rule-based descriptions leads to boredom due to repetitive statements across different QEPs. To address this issue, we present a novel deep learning-based language generation framework called NEURAL-LANTERN that infuses language variability into the generated descriptions by exploiting a set of paraphrasing tools and word embeddings. Our experimental study with real learners shows the effectiveness of LANTERN in facilitating comprehension of QEPs.
Submitted 2 March, 2021; v1 submitted 28 February, 2021;
originally announced March 2021.
-
PANE: scalable and effective attributed network embedding
Authors:
Renchi Yang,
Jieming Shi,
Xiaokui Xiao,
Yin Yang,
Sourav S. Bhowmick,
Juncheng Liu
Abstract:
Given a graph G where each node is associated with a set of attributes, attributed network embedding (ANE) maps each node v in G to a compact vector Xv, which can be used in downstream machine learning tasks. Ideally, Xv should capture node v's affinity to each attribute, considering not only v's own attribute associations but also those of the nodes connected to it along edges in G. Obtaining high-utility embeddings that enable accurate predictions is challenging, and scaling effective ANE computation to massive graphs pushes the difficulty of the problem to a whole new level. Existing solutions largely fail on such graphs, incurring prohibitive costs, producing low-quality embeddings, or both. This paper proposes PANE, an effective and scalable approach to ANE computation for massive graphs that achieves state-of-the-art result quality on multiple benchmark datasets. PANE obtains high scalability and effectiveness through three main algorithmic designs. First, it formulates the learning objective based on a novel random walk model for attributed networks. Second, PANE includes a highly efficient solver for the above optimization problem, whose key module is a carefully designed initialization of the embeddings that drastically reduces the number of iterations required to converge. Finally, PANE utilizes multi-core CPUs through non-trivial parallelization of the above solver, which achieves scalability while retaining the high quality of the resulting embeddings. The performance of PANE depends on the number of attributes in the input network. To handle large networks with numerous attributes, we further extend PANE to PANE++. Extensive experiments comparing against 10 existing approaches on 8 real datasets demonstrate that PANE and PANE++ consistently outperform all existing methods in terms of result quality, while being orders of magnitude faster.
Submitted 30 March, 2023; v1 submitted 2 September, 2020;
originally announced September 2020.
-
A New Community Definition For MultiLayer Networks And A Novel Approach For Its Efficient Computation
Authors:
Abhishek Santra,
Kanthi Sannappa Komar,
Sanjukta Bhowmick,
Sharma Chakravarthy
Abstract:
As the use of MultiLayer Networks (or MLNs) for modeling and analysis gains popularity, it is becoming increasingly important to propose a community definition that encompasses the multiple features represented by MLNs and to develop algorithms for efficiently computing communities on MLNs. Currently, communities for MLNs are based on aggregating the networks into single graphs using different techniques (type independent, projection-based, etc.) and applying single-graph community detection algorithms, such as Louvain and Infomap, to these graphs. This process results in different types of information loss (semantics and structure). To the best of our knowledge, this paper proposes, for the first time, a definition of community for heterogeneous MLNs (or HeMLNs) that preserves both semantics and structure. Additionally, our basic definition can be extended to match the analysis objectives as needed.
In this paper, we present a structure- and semantics-preserving community definition for HeMLNs that is compatible with, and an extension of, the traditional definition for single graphs. We also present a framework for its efficient computation using a newly proposed decoupling approach. First, we define a k-community for k connected layers of a HeMLN. Then, we propose a family of algorithms for its computation using the concept of bipartite graph pairings. Further, for a broader analysis, we introduce several pairing algorithms and weight metrics for composing binary HeMLN communities using participating community characteristics. Essentially, this results in an extensible family of community computations. We provide extensive experimental results showcasing the efficiency and analysis flexibility of the proposed computation using the popular IMDb and DBLP data sets.
Submitted 20 April, 2020;
originally announced April 2020.
-
An Efficient Secure Dynamic Skyline Query Model
Authors:
Weiguo Wang,
Hui Li,
Yanguo Peng,
Sourav S Bhowmick,
Peng Chen,
Xiaofeng Chen,
Jiangtao Cui
Abstract:
It is now cost-effective to outsource large datasets and perform queries in the cloud. However, this scenario raises serious security and privacy issues, as sensitive information contained in the dataset can be leaked. The most effective way to address this is to encrypt the data before outsourcing. Nevertheless, processing queries over ciphertext efficiently remains a grand challenge. In this work, we focus on solving one representative query task, namely the dynamic skyline query, in a secure manner over the cloud. This query is difficult to perform on encrypted data because its dynamic domination criteria require both subtraction and comparison, which cannot be efficiently supported directly by a single encryption scheme. To this end, we present a novel framework called SCALE. It works by transforming traditional dynamic skyline domination into pure comparisons. The whole process can be completed in a single round of interaction between the user and the cloud. We theoretically prove that the outsourced database, query requests, and returned results are all kept secret under our model. Moreover, we present an efficient strategy for dynamic insertion and deletion of stored records. An empirical study over a series of datasets demonstrates that our framework improves the efficiency of query processing by nearly three orders of magnitude compared to the state of the art.
Submitted 22 February, 2020; v1 submitted 18 February, 2020;
originally announced February 2020.
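In plaintext, dynamic skyline domination relative to a query point q is easy to state, and the definition shows why it needs both subtraction (the distances |p_i − q_i|) and comparison. The sketch below is the standard quadratic-time plaintext definition with hypothetical function names; it performs no encryption and does not reproduce SCALE's transformation into pure comparisons.

```python
def dyn_dominates(p, r, q):
    """True if p dynamically dominates r w.r.t. query point q: p is no farther
    from q than r in every dimension and strictly closer in at least one."""
    dp = [abs(a - b) for a, b in zip(p, q)]   # subtraction step
    dr = [abs(a - b) for a, b in zip(r, q)]
    return all(x <= y for x, y in zip(dp, dr)) and any(x < y for x, y in zip(dp, dr))

def dynamic_skyline(points, q):
    """Points not dynamically dominated by any other point (plaintext, O(n^2))."""
    return [
        p for i, p in enumerate(points)
        if not any(dyn_dominates(r, p, q) for j, r in enumerate(points) if j != i)
    ]

pts = [(1, 1), (2, 2), (5, 5), (0, 3)]
print(dynamic_skyline(pts, (0, 0)))  # [(1, 1), (0, 3)]
```

Changing the query point changes the result: with q = (1, 1), the point (1, 1) sits on the query and dominates every other point.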
-
An Efficient Framework for Computing Structure- And Semantics-Preserving Community in a Heterogeneous Multilayer Network
Authors:
Abhishek Santra,
Kanthi Sannappa Komar,
Sanjukta Bhowmick,
Sharma Chakravarthy
Abstract:
Multilayer networks or MLNs (also called multiplexes or networks of networks) are being used extensively for modeling and analysis of data sets with multiple entity and feature types and associated relationships. Although the concept of community is widely used for aggregate analysis, a structure- and semantics-preserving definition of it is lacking for MLNs. Retaining the original MLN structure and entity relationships is important for detailed drill-down analysis. In addition, efficient computation is critical when a large number of analyses must be performed.
In this paper, we introduce a structure-preserving community definition for MLNs as well as a framework for its efficient computation using the decoupling approach. The proposed decoupling approach combines communities from individual layers to form a serial k-community for k connected layers in an MLN. We propose a new algorithm for pairing communities across layers and introduce several weight metrics for composing communities from two layers using participating community characteristics. In addition to the definition, our proposed approach has a number of desirable characteristics. It: i) leverages extant single-graph community detection algorithms, ii) introduces several weight metrics that are customized for the community concept, iii) provides a new algorithm for pairing communities using bipartite graphs, and iv) is experimentally validated, in terms of both community computation and efficiency, on the widely used IMDb and DBLP data sets.
Submitted 7 September, 2019;
originally announced October 2019.
-
Efficient Community Detection in Boolean Composed Multiplex Networks
Authors:
Abhishek Santra,
Sanjukta Bhowmick,
Sharma Chakravarthy
Abstract:
Networks (or graphs) are used to model the dyadic relations between entities in a complex system. In cases where multiple relations exist between the entities, the complex system can be represented as a multilayer network, where the network in each layer represents one particular relation (or feature). The analysis of multilayer networks involves combining edges from specific layers and then computing a network property.
Different subsets of the layers can be combined. For any Boolean combination operation (e.g., AND, OR), the number of possible subsets is exponential in the number of layers. Thus, recomputing the property from scratch for each subset is expensive. In this paper, we propose to efficiently analyze multilayer networks using a method that we term network decomposition.
Network decomposition is based on analyzing each network layer individually and then aggregating the analysis results. We demonstrate the effectiveness of using network decomposition for detecting communities on different combinations of network layers. Our results on multilayer networks obtained from real-world and synthetic datasets show that our proposed network decomposition method requires significantly lower computation time while producing results of high accuracy.
Submitted 7 September, 2019;
originally announced October 2019.
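The cost that network decomposition targets can be seen in the naive baseline: compose every non-empty subset of layers with a Boolean operator and analyze each composed graph separately. The sketch below only builds the composed edge sets (AND as intersection, OR as union, over a shared vertex set of undirected layers); the community detection step is omitted, and the helper names are hypothetical. With L layers there are 2^L − 1 subsets to recompute.

```python
from functools import reduce
from itertools import combinations

def compose(layers, op):
    """Boolean composition of undirected layers over a shared vertex set."""
    edge_sets = [{frozenset(e) for e in layer} for layer in layers]
    if op == "AND":
        return reduce(set.intersection, edge_sets)
    if op == "OR":
        return reduce(set.union, edge_sets)
    raise ValueError(f"unsupported operation: {op}")

def all_compositions(layers, op):
    """Compose every non-empty subset of layers: 2**L - 1 graphs for L layers."""
    return {
        idx: compose([layers[i] for i in idx], op)
        for size in range(1, len(layers) + 1)
        for idx in combinations(range(len(layers)), size)
    }

layers = [
    [(1, 2), (2, 3), (3, 1)],
    [(1, 2), (3, 4)],
    [(2, 1), (2, 3)],
]
print(len(all_compositions(layers, "AND")))  # 7
print(compose(layers, "AND"))  # {frozenset({1, 2})}
```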
-
Making a Case for MLNs for Data-Driven Analysis: Modeling, Efficiency, and Versatility
Authors:
Abhishek Santra,
Kanthi Sannappa Komar,
Sanjukta Bhowmick,
Sharma Chakravarthy
Abstract:
Datasets of real-world applications are characterized by entities of different types, which are defined by multiple features and connected via varied types of relationships. A critical challenge for these datasets is developing models and computations to support flexible analysis, i.e., the ability to compute varied types of analysis objectives in an efficient manner.
To address this problem, in this paper, we make a case for modeling such complex data sets as multilayer networks (or MLNs), and argue that MLNs provide a more informative model than the currently popular simple and attribute graphs. Through analyzing communities and hubs on homogeneous and heterogeneous MLNs, we demonstrate the flexibility of the chosen model. We also show that compared to current analysis approaches, a network decoupling-based analysis of MLNs is more efficient and also preserves the structure and result semantics.
We use three diverse data sets to showcase the effectiveness of modeling them as MLNs and analyzing them using the decoupling-based approach. We use both homogeneous and heterogeneous MLNs for modeling, and community and hub computations for analysis. The data sets are from US commercial airlines and IMDb, a large international movie data set. Our experimental analysis validates the modeling, the efficiency of computation, and the versatility of the approach. Correctness of results is verified using independently available ground truth. For the data sets used, the efficiency improvement ranges from 64% to 98%.
Submitted 21 September, 2019;
originally announced September 2019.
-
DISCO: Influence Maximization Meets Network Embedding and Deep Learning
Authors:
Hui Li,
Mengting Xu,
Sourav S Bhowmick,
Changsheng Sun,
Zhongyuan Jiang,
Jiangtao Cui
Abstract:
Since its introduction in 2003, the influence maximization (IM) problem has drawn significant research attention in the literature. The aim of IM is to select a set of k users who can influence the most individuals in the social network. The problem has been proven to be NP-hard, and a large number of approximation algorithms have been proposed to address it. The state-of-the-art algorithms estimate the expected influence of nodes based on sampled diffusion paths. Since the number of required samples has recently been proven to be lower-bounded by a threshold that presets a trade-off between accuracy and efficiency, the result quality of these traditional solutions is hard to improve further without sacrificing efficiency. In this paper, we present an orthogonal and novel paradigm for the IM problem that leverages deep learning models to estimate the expected influence. Specifically, we present a novel framework called DISCO that incorporates network embedding and deep reinforcement learning techniques to address this problem. An experimental study on real-world networks demonstrates that DISCO achieves the best performance w.r.t. efficiency and influence spread quality compared to state-of-the-art classical solutions. We also show that the learned model generalizes well.
Submitted 18 June, 2019;
originally announced June 2019.
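For contrast with DISCO, the sketch below shows the simplest sampling-based approach the abstract refers to: greedy seed selection with Monte Carlo estimates of expected spread under the independent cascade model, assuming a uniform edge probability `p`. It is a baseline illustration only, not DISCO and not the faster state-of-the-art path-sampling algorithms; the function names are hypothetical. The `runs` parameter is where the accuracy/efficiency trade-off mentioned above shows up.

```python
import random

def simulate_ic(graph, seeds, p, rng):
    """One run of the independent cascade model; returns the number of active nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        newly_active = []
        for u in frontier:
            for v in graph.get(u, ()):
                # each newly activated node gets one chance to activate each neighbour
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return len(active)

def greedy_im(graph, k, p=0.1, runs=200, seed=0):
    """Greedy seed selection using Monte Carlo estimates of expected spread."""
    rng = random.Random(seed)
    nodes = sorted(set(graph) | {v for vs in graph.values() for v in vs})
    chosen = []
    for _ in range(k):
        def spread(n):
            return sum(simulate_ic(graph, chosen + [n], p, rng) for _ in range(runs)) / runs
        best = max((n for n in nodes if n not in chosen), key=spread)
        chosen.append(best)
    return chosen

g = {0: [1, 2, 3, 4], 1: [5], 5: [6]}
print(greedy_im(g, 1, p=1.0, runs=10))  # [0]
```

With `p=1.0` every cascade is deterministic: node 0 reaches all 7 nodes, so it is picked first.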
-
Homogeneous Network Embedding for Massive Graphs via Reweighted Personalized PageRank
Authors:
Renchi Yang,
Jieming Shi,
Xiaokui Xiao,
Yin Yang,
Sourav S. Bhowmick
Abstract:
Given an input graph G and a node v in G, homogeneous network embedding (HNE) maps the graph structure in the vicinity of v to a compact, fixed-dimensional feature vector. This paper focuses on HNE for massive graphs, e.g., with billions of edges. At this scale, most existing approaches fail, as they incur either prohibitively high costs or severely compromised result utility. Our proposed solution, called Node-Reweighted PageRank (NRP), is based on the classic idea of deriving embedding vectors from pairwise personalized PageRank (PPR) values. Our contributions are twofold: first, we design a simple and efficient baseline HNE method based on PPR that is capable of handling billion-edge graphs on commodity hardware; second, and more importantly, we identify an inherent drawback of vanilla PPR and address it in our main proposal, NRP. Specifically, PPR was designed for a very different purpose, i.e., ranking nodes in G based on their relative importance from a source node's perspective. In contrast, HNE aims to build node embeddings considering the whole graph. Consequently, node embeddings derived directly from PPR are of suboptimal utility. The proposed NRP approach overcomes this deficiency through an effective and efficient node reweighting algorithm, which augments PPR values with node degree information and iteratively adjusts embedding vectors accordingly. Overall, NRP takes $O(m \log n)$ time and $O(m)$ space to compute all node embeddings for a graph with m edges and n nodes. Our extensive experiments comparing NRP against 18 existing solutions over 7 real graphs demonstrate that NRP achieves higher result utility than all of these solutions for link prediction, graph reconstruction, and node classification, while being up to orders of magnitude faster. In particular, on a billion-edge Twitter graph, NRP terminates within 4 hours using a single CPU core.
Submitted 23 June, 2020; v1 submitted 16 June, 2019;
originally announced June 2019.
-
A Hierarchical Network for Diverse Trajectory Proposals
Authors:
Sriram N. N.,
Gourav Kumar,
Abhay Singh,
M. Siva Karthik,
Saket Saurav,
Brojeshwar Bhowmick,
K. Madhava Krishna
Abstract:
Autonomous explorative robots frequently encounter scenarios where multiple future trajectories can be pursued. Often these are cases with multiple paths around an obstacle or trajectory options towards various frontiers. Humans in such situations can inherently perceive and reason about the surrounding environment to identify several possibilities of either manoeuvring around the obstacles or moving towards various frontiers. In this work, we propose a two-stage Convolutional Neural Network architecture that mimics this ability by mapping the perceived surroundings to multiple trajectories that a robot can choose to traverse. The first stage is a Trajectory Proposal Network, which suggests diverse regions in the environment that can be occupied in the future. The second stage is a Trajectory Sampling Network, which provides a fine-grained trajectory over the regions proposed by the Trajectory Proposal Network. We evaluate our framework in diverse and complicated real-life settings. For the outdoor case, we use the KITTI dataset and our own outdoor driving dataset. In the indoor setting, we use an autonomous drone to navigate various scenarios, and also a ground robot that explores the environment using the trajectories proposed by our framework. Our experiments suggest that the framework is able to develop a semantic understanding of obstacles and open regions and to identify diverse trajectories that a robot can traverse. Comparisons against a diverse set of methods show the performance gain of the proposed architecture.
Submitted 9 June, 2019;
originally announced June 2019.
-
Efficient Estimation of Heat Kernel PageRank for Local Clustering
Authors:
Renchi Yang,
Xiaokui Xiao,
Zhewei Wei,
Sourav S Bhowmick,
Jun Zhao,
Rong-Hua Li
Abstract:
Given an undirected graph G and a seed node s, the local clustering problem aims to identify a high-quality cluster containing s in time roughly proportional to the size of the cluster, regardless of the size of G. This problem finds numerous applications on large-scale graphs. Recently, heat kernel PageRank (HKPR), a measure of the proximity of nodes in graphs, has been applied to this problem and found to be more efficient than prior methods. However, existing solutions for computing HKPR are either prohibitively expensive or provide unsatisfactory error approximation on HKPR values, rendering them impractical, especially on billion-edge graphs.
In this paper, we present TEA and TEA+, two novel local graph clustering algorithms based on HKPR, to address the aforementioned limitations. Specifically, these algorithms provide non-trivial theoretical guarantees on both the relative error of HKPR values and the time complexity. The basic idea is to use deterministic graph traversal to produce a rough estimate of the exact HKPR vector, and then exploit Monte-Carlo random walks to refine the results in an optimized way. In particular, TEA+ achieves practical efficiency and effectiveness through further optimizations. Extensive experiments on real-world datasets demonstrate that, when achieving the same clustering quality, TEA+ is more than four times faster than the state-of-the-art algorithm on most benchmark datasets, and in particular is an order of magnitude faster on large graphs, including the widely studied Twitter and Friendster datasets.
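For reference, the HKPR vector of a seed s is the Poisson-weighted sum of random-walk distributions, $\rho_s = \sum_{k \ge 0} e^{-t} \frac{t^k}{k!}\, e_s P^k$ with $P = D^{-1}A$. The sketch below evaluates a truncated version of this series exactly; it is a naive baseline for illustration, not TEA or TEA+ (which combine deterministic traversal with Monte-Carlo walks). The heat constant `t=5.0` and the cutoff `max_k` are assumed values.

```python
import math


def hkpr_truncated(adj, seed, t=5.0, max_k=50):
    """Truncated-series estimate of the heat kernel PageRank vector of `seed`.

    adj: undirected graph as dict node -> list of neighbours (no isolated nodes).
    t: heat constant (illustrative default); max_k: number of series terms kept.
    """
    walk = {v: 0.0 for v in adj}  # current distribution e_s P^k
    walk[seed] = 1.0
    rho = {v: 0.0 for v in adj}
    weight = math.exp(-t)  # Poisson weight e^{-t} t^k / k! for k = 0
    for k in range(max_k + 1):
        for v, p in walk.items():
            rho[v] += weight * p
        # one random-walk step with transition matrix P = D^{-1} A
        nxt = {v: 0.0 for v in adj}
        for u, p in walk.items():
            if p:
                share = p / len(adj[u])
                for w in adj[u]:
                    nxt[w] += share
        walk = nxt
        weight *= t / (k + 1)  # advance to the weight of term k + 1
    return rho


triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
rho = hkpr_truncated(triangle, 0)
```

Each term touches every edge, so this costs O(max_k · m) regardless of cluster size, which is exactly what local methods avoid.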
Submitted 3 April, 2019;
originally announced April 2019.
-
Structure-Preserving Community In A Multilayer Network: Definition, Detection, And Analysis
Authors:
Abhishek Santra,
Kanthi Sannappa Komar,
Sanjukta Bhowmick,
Sharma Chakravarthy
Abstract:
Multilayer networks or MLNs (also called multiplexes or networks of networks) are used extensively for modeling and analyzing data sets with multiple entity and feature types as well as their relationships. As the concepts of communities and hubs are used in these analyses, a structure-preserving definition of them on MLNs (one that retains the original MLN structure and node/edge labels and types) and their efficient detection are critical. There is no structure-preserving definition of a community for an MLN, as most current analyses aggregate an MLN into a single graph. Although there is consensus on the definition of a community for single graphs (with detection packages available) and, to a lesser extent, for homogeneous MLNs, such a definition is lacking for heterogeneous MLNs. In this paper, we not only provide a structure-preserving definition for the first time, but also its efficient computation using a decoupling approach, and discuss its characteristics and significance for analysis. The proposed decoupling approach combines communities from individual layers to form a serial k-community for k connected layers in an MLN. We propose several weight metrics, based on the analysis semantics, for composing layer-wise communities using the bipartite graph matching approach. Our proposed approach has a number of advantages. It: i) leverages extant single-graph community detection algorithms, ii) is based on the widely used maximal flow bipartite graph matching for composing k layers, iii) introduces several weight metrics that are customized for the community concept, and iv) experimentally validates the definition, mapping, and efficiency from a flexible analysis perspective on the widely used IMDb data set.
Keywords: Heterogeneous Multilayer Networks; Bipartite Graphs; Community Definition and Detection; Decoupling-Based Composition
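The composition step pairs communities of adjacent layers through bipartite matching. As a rough sketch only: below, two communities are adjacent when they share at least one node, and a maximum-cardinality matching is found with simple augmenting paths. The paper's weight metrics are deliberately not modelled, and the function name and the "shared node" adjacency rule are assumptions.

```python
def compose_two_layers(comms1, comms2):
    """Pair communities of two layers by maximum-cardinality bipartite matching.

    comms1, comms2: lists of sets of node ids (one set per community).
    Two communities are adjacent if they share at least one node; the
    paper's weight metrics are NOT modelled here.
    Returns (i, j, shared_nodes) triples for the matched community pairs.
    """
    adj = [[j for j, c2 in enumerate(comms2) if c1 & c2] for c1 in comms1]
    match_of = {}  # index in comms2 -> index in comms1

    def try_augment(i, seen):
        for j in adj[i]:
            if j in seen:
                continue
            seen.add(j)
            # j is free, or its current partner can be re-matched elsewhere
            if j not in match_of or try_augment(match_of[j], seen):
                match_of[j] = i
                return True
        return False

    for i in range(len(comms1)):
        try_augment(i, set())
    pairs = [(i, j, comms1[i] & comms2[j]) for j, i in match_of.items()]
    return sorted(pairs, key=lambda p: (p[0], p[1]))


result = compose_two_layers([{1, 2, 3}, {4, 5}], [{3, 4}, {1, 2}])
```

A weighted variant would replace the cardinality objective with a maximum-weight matching over whichever weight metric the analysis calls for.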
Submitted 6 March, 2019;
originally announced March 2019.
-
Do non-free LCD codes over finite commutative Frobenius rings exist?
Authors:
Sanjit Bhowmick,
Alexandre Fotue-Tabue,
Edgar Martínez-Moro,
Ramakrishna Bandi,
Satya Bagchi
Abstract:
In this paper, we clarify some aspects of LCD codes in the literature. We first prove that non-free LCD codes do not exist over finite commutative Frobenius local rings. We then obtain a necessary and sufficient condition for the existence of LCD codes over finite commutative Frobenius rings. We later show that a free constacyclic code over a finite chain ring is LCD if and only if it is reversible, and also provide a necessary and sufficient condition for a constacyclic code over a finite chain ring to be reversible. We illustrate the minimum Lee distance of LCD codes over some finite commutative chain rings and demonstrate the results with examples. We also obtain some new optimal $\mathbb{Z}_4$ codes of different lengths which are cyclic LCD codes over $\mathbb{Z}_4$.
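A code C is LCD (linear complementary dual) when $C \cap C^{\perp} = \{0\}$. The brute-force sketch below checks this definition over $\mathbb{Z}_4$ under the Euclidean inner product for very short codes given by generator rows. It is an illustration of the definition, not the paper's method; the function names are hypothetical, and enumeration is only feasible for small n.

```python
from itertools import product


def span_mod(gens, n, q=4):
    """All Z_q-linear combinations of the generator rows (brute force)."""
    code = set()
    for coeffs in product(range(q), repeat=len(gens)):
        word = tuple(sum(c * g[k] for c, g in zip(coeffs, gens)) % q
                     for k in range(n))
        code.add(word)
    return code


def is_lcd(gens, n, q=4):
    """Check that C and C_perp intersect only in 0 (Euclidean inner product mod q)."""
    code = span_mod(gens, n, q)
    # x lies in C_perp iff it is orthogonal to every generator row
    dual = {x for x in product(range(q), repeat=n)
            if all(sum(a * b for a, b in zip(x, g)) % q == 0 for g in gens)}
    return code & dual == {tuple([0] * n)}
```

For instance, the free code generated by (1, 1, 1) is LCD, while the non-free code generated by (2, 0, 0) is not, since (2, 0, 0) is self-orthogonal mod 4; this is consistent with (though of course no proof of) the non-existence result for local rings such as $\mathbb{Z}_4$.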
Submitted 30 January, 2019;
originally announced January 2019.
-
Improving Landmark Recognition using Saliency detection and Feature classification
Authors:
Akash Kumar,
Sagnik Bhowmick,
N. Jayanthi,
S. Indu
Abstract:
Image landmark recognition has been one of the most sought-after classification challenges in the field of vision and perception. After many years of generic classification of buildings and monuments from images, researchers are now focusing on fine-grained problems: recognizing the category of each building or monument. We propose an ensemble network for the classification of Indian landmark images. Our method gives robust classification by ensembling the predictions of a Graph-Based Visual Saliency (GBVS) network with supervised feature-based classification algorithms such as kNN and Random Forest. The final architecture adaptively learns from all of these networks. The proposed network produces a reliable score to eliminate false category cases. We evaluate our model on a new dataset, which involves challenges such as landmark clutter, variable scaling, and partial occlusion.
Submitted 30 November, 2018;
originally announced November 2018.
-
On Rich Clubs of Path-Based Centralities in Networks
Authors:
Soumya Sarkar,
Animesh Mukherjee,
Sanjukta Bhowmick
Abstract:
Many scale-free networks exhibit a rich club structure, where high-degree vertices form tightly interconnected subgraphs. In this paper, we explore the emergence of rich clubs in the context of shortest-path-based centrality metrics. We term these subgraphs of connected high-closeness or high-betweenness vertices rich centrality clubs (RCC).
Our experiments on real-world and synthetic networks highlight the inter-relations between RCCs, expander graphs, and the core-periphery structure of the network. We show empirically and theoretically that RCCs exist if the core-periphery structure of the network is such that each shell is an expander graph and shell density decreases from inner to outer shells.
The main contributions of our paper are: (i) we demonstrate that the formation of RCCs is related to the core-periphery structure, particularly the expander-like properties of each shell, (ii) we show that the RCC property can be used to find effective seed nodes for spreading information and for improving the resilience of the network under perturbation, and (iii) we present a modification algorithm that can insert RCCs into networks without affecting their other structural properties. Taken together, these contributions present one of the first comprehensive studies of the properties and applications of rich clubs for path-based centralities.
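One way to probe for a rich centrality club is to rank vertices by closeness centrality and measure the edge density $2E/(N(N-1))$ of the subgraph induced by the top-ranked ones, in the spirit of the classical rich-club coefficient. The sketch below does this with BFS on a connected, unweighted, undirected graph; it covers closeness only (not betweenness), and the exact RCC criterion used in the paper may differ.

```python
from collections import deque


def closeness(adj, s):
    """Closeness centrality of s in a connected graph: (n - 1) / sum of BFS distances."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def rich_club_density(adj, r):
    """Edge density 2E / (N(N - 1)) among the r vertices with highest closeness."""
    ranked = sorted(adj, key=lambda v: closeness(adj, v), reverse=True)
    top = set(ranked[:r])
    edges = sum(1 for u in top for w in adj[u] if w in top) // 2
    return 2 * edges / (r * (r - 1)) if r > 1 else 0.0


# a triangle "core" {0, 1, 2}, each core vertex with one pendant leaf
adj = {0: [1, 2, 3], 1: [0, 2, 4], 2: [0, 1, 5], 3: [0], 4: [1], 5: [2]}
```

Here the three core vertices have the highest closeness (5/7 each) and form a clique, so `rich_club_density(adj, 3)` is 1.0.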
Submitted 8 August, 2018;
originally announced August 2018.
-
Self-dual cyclic codes over $M_2(\mathbb{Z}_4)$
Authors:
Sanjit Bhowmick,
Satya Bagchi,
Ramakrishna Bandi
Abstract:
In this paper, we study codes over the matrix ring over $\mathbb{Z}_4$; this is perhaps the first time the ring $M_2(\mathbb{Z}_4)$ has been considered as a code alphabet. This ring is isomorphic to $\mathbb{Z}_4[w]+U\mathbb{Z}_4[w]$, where $w$ is a root of the irreducible polynomial $x^2+x+1 \in \mathbb{Z}_2[x]$ and $U \equiv \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$. We first discuss the structure of the ring $M_2(\mathbb{Z}_4)$ and then focus on the algebraic structure of cyclic codes and self-dual cyclic codes over $M_2(\mathbb{Z}_4)$. We obtain the generators of the cyclic codes and their dual codes. A few examples are given at the end of the paper.
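As a small sanity check of the notation, the sketch below works in $M_2(\mathbb{Z}_4)$ with arithmetic mod 4. The abstract does not say which matrix plays the role of $w$; here we assume the companion matrix of $x^2+x+1$, and verify that it satisfies $W^2+W+I \equiv 0$ and that $U^2 = 2U$. This only exercises the relations and does not reproduce the paper's isomorphism.

```python
def matmul4(a, b):
    """Product of 2x2 matrices with entries reduced mod 4."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) % 4 for j in range(2)]
            for i in range(2)]


def matadd4(*ms):
    """Entrywise sum of 2x2 matrices, reduced mod 4."""
    return [[sum(m[i][j] for m in ms) % 4 for j in range(2)] for i in range(2)]


I = [[1, 0], [0, 1]]
W = [[0, 3], [1, 3]]  # companion matrix of x^2 + x + 1 over Z_4 (our assumption)
U = [[1, 1], [1, 1]]

print(matadd4(matmul4(W, W), W, I))  # [[0, 0], [0, 0]]: W is a root of x^2 + x + 1
print(matmul4(U, U) == matadd4(U, U))  # True: U^2 = 2U over Z_4
```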
Submitted 13 July, 2018;
originally announced July 2018.
-
Using Core-Periphery Structure to Predict High Centrality Nodes in Time-Varying Networks
Authors:
Soumya Sarkar,
Sandipan Sikdar,
Animesh Mukherjee,
Sanjukta Bhowmick
Abstract:
Vertices with high betweenness and closeness centrality represent influential entities in a network. An important problem for time-varying networks is to know a priori, using minimal computation, whether the influential vertices of the current time step will retain their high centrality in future time steps as the network evolves. In this paper, based on empirical evidence from several large real-world time-varying networks, we discover a class of networks where the highly central vertices are part of the innermost core of the network and this property is maintained over time. As a key contribution of this work, we propose novel heuristics to identify these networks in an optimal fashion, and we also develop a two-step algorithm for predicting high centrality vertices. Consequently, we show for the first time that for such networks, expensive shortest-path computations in each time step as the network changes can be completely avoided; instead, we can use time series models (e.g., ARIMA, as used here) to predict the overlap between the high centrality vertices in the current time step and those in future time steps. Moreover, once the new network is available, we can find the high centrality vertices in the top core simply based on their high degree.
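The last step, picking high-centrality candidates from the top core by degree, can be sketched with a standard peeling-based k-core decomposition. The function names and the simple O(n^2) peeling loop are our own; the ARIMA-based overlap prediction is not shown.

```python
def core_numbers(adj):
    """k-core number of every vertex, by repeatedly peeling a minimum-degree vertex."""
    degree = {v: len(adj[v]) for v in adj}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=degree.get)
        k = max(k, degree[v])  # core number never decreases along the peeling order
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core


def top_core_by_degree(adj, r):
    """Return the r highest-degree vertices of the innermost core."""
    core = core_numbers(adj)
    kmax = max(core.values())
    inner = [v for v in adj if core[v] == kmax]
    return sorted(inner, key=lambda v: len(adj[v]), reverse=True)[:r]


# a triangle core {0, 1, 2}, each core vertex with one pendant leaf
adj = {0: [1, 2, 3], 1: [0, 2, 4], 2: [0, 1, 5], 3: [0], 4: [1], 5: [2]}
```

A bucket-queue implementation (Batagelj and Zaversnik) brings the decomposition to O(m), which matters for the large networks the abstract targets.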
Submitted 20 June, 2018;
originally announced June 2018.
-
NEURON: Query Optimization Meets Natural Language Processing For Augmenting Database Education
Authors:
Siyuan Liu,
Sourav S Bhowmick,
Wanlu Zhang,
Shu Wang,
Wanyi Huang,
Shafiq Joty
Abstract:
Relational Database Management Systems (RDBMS) is a major undergraduate course taught in many universities worldwide as part of their computer science programs. A core component of such a course is the design and implementation of the query optimizer in an RDBMS. The goal of the query optimizer is to automatically identify the most efficient strategies for executing the declarative SQL queries submitted by users. The query optimization process produces a query execution plan (QEP), which represents an execution strategy for the query. Due to the complexity of the underlying query optimizer, comprehending a QEP demands that a student be knowledgeable about implementation-specific issues related to the RDBMS. In practice, this is an unrealistic assumption, as most students are learning database technology for the first time. Hence, it is often difficult for them to comprehend the query execution strategy undertaken by a DBMS by perusing the QEP, hindering their learning process. In this demonstration, we present a novel system called NEURON that facilitates natural language interaction with QEPs to enhance their understanding. NEURON accepts a SQL query (which may include joins, aggregation, and nesting, among other things) as input, executes it, and generates a simplified natural language-based description (in both text and voice form) of the execution strategy deployed by the underlying RDBMS. Furthermore, it facilitates understanding of various features related to the QEP through a natural language-based question answering framework. We advocate that such a tool, the world's first of its kind, can greatly enhance students' learning of the query optimization topic.
Submitted 20 August, 2018; v1 submitted 15 May, 2018;
originally announced May 2018.
-
Visual Based Navigation of Mobile Robots
Authors:
Shailja,
Soumabh Bhowmick,
Jayanta Mukhopadhyay
Abstract:
We have developed an algorithm to generate a complete map of the traversable region for a personal assistant robot using monocular vision only. Using multiple images taken by a simple webcam, we have developed obstacle detection and avoidance algorithms. Simple Linear Iterative Clustering (SLIC) is used for segmentation to reduce memory and computation costs. The map for indoor navigation is created with a simple, robust mapping technique based on inverse perspective mapping and occupancy grids that supports very fast updates.
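Occupancy grids are commonly updated with a Bayesian log-odds rule: each cell stores $\log\frac{p}{1-p}$, and every observation adds a fixed increment for "occupied" or "free". The sketch below shows that standard update; the abstract does not describe the authors' exact update, so the sensor-model probabilities (0.7 / 0.3) and the `(row, col, occupied)` observation format are assumptions.

```python
import math

L_OCC = math.log(0.7 / 0.3)   # log-odds increment for a cell seen as obstacle (assumed)
L_FREE = math.log(0.3 / 0.7)  # log-odds increment for a cell seen as free (assumed)


def update_grid(logodds, observations):
    """Bayesian log-odds update of an occupancy grid.

    logodds: 2-D list of floats (0.0 means unknown, i.e. p = 0.5).
    observations: iterable of (row, col, occupied) tuples, e.g. ground-plane
    cells obtained from image segments via inverse perspective mapping.
    """
    for r, c, occupied in observations:
        logodds[r][c] += L_OCC if occupied else L_FREE
    return logodds


def probability(l):
    """Convert a log-odds value back to an occupancy probability."""
    return 1.0 - 1.0 / (1.0 + math.exp(l))


grid = [[0.0] * 3 for _ in range(2)]
update_grid(grid, [(0, 0, True), (0, 0, True), (1, 2, False)])
```

Because each update is a single addition per observed cell, this representation supports the very fast map updates the abstract mentions.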
Submitted 14 December, 2017;
originally announced December 2017.
-
Capacitated Covering Problems in Geometric Spaces
Authors:
Sayan Bandyapadhyay,
Santanu Bhowmick,
Tanmay Inamdar,
Kasturi Varadarajan
Abstract:
In this article, we consider the following capacitated covering problem. We are given a set $P$ of $n$ points and a set $\mathcal{B}$ of balls from some metric space, and a positive integer $U$ that represents the capacity of each of the balls in $\mathcal{B}$. We would like to compute a subset $\mathcal{B}' \subseteq \mathcal{B}$ of balls and assign each point in $P$ to some ball in $\mathcal{B}'$ that contains it, such that the number of points assigned to any ball is at most $U$. The objective function that we would like to minimize is the cardinality of $\mathcal{B}'$.
We consider this problem in arbitrary metric spaces as well as Euclidean spaces of constant dimension. In the metric setting, even the uncapacitated version of the problem is hard to approximate to within a logarithmic factor. In the Euclidean setting, the best known approximation guarantee in dimensions $3$ and higher is logarithmic in the number of points. Thus we focus on obtaining "bi-criteria" approximations. In particular, we are allowed to expand the balls in our solution by some factor, but optimal solutions do not have that flexibility. Our main result is that allowing constant factor expansion of the input balls suffices to obtain constant approximations for these problems. In fact, in the Euclidean setting, only $(1+ε)$ factor expansion is sufficient for any $ε> 0$, with the approximation factor being a polynomial in $1/ε$. We obtain these results using a unified scheme for rounding the natural LP relaxation; this scheme may be useful for other capacitated covering problems. We also complement these bi-criteria approximations by obtaining hardness of approximation results that shed light on our understanding of these problems.
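For concreteness, one standard way to write a natural LP relaxation of this problem is shown below, with $y_B$ indicating that ball $B$ is opened and $x_{pB}$ the fraction of point $p$ assigned to ball $B \ni p$. This is our own rendering of the usual formulation; the paper's LP may differ in details.

$$
\begin{aligned}
\min \quad & \sum_{B \in \mathcal{B}} y_B \\
\text{s.t.} \quad & \sum_{B \in \mathcal{B}:\, p \in B} x_{pB} = 1 && \forall p \in P \\
& x_{pB} \le y_B && \forall p \in P,\ \forall B \in \mathcal{B} \text{ with } p \in B \\
& \sum_{p \in P \cap B} x_{pB} \le U\, y_B && \forall B \in \mathcal{B} \\
& x_{pB} \ge 0, \quad 0 \le y_B \le 1.
\end{aligned}
$$

Restricting $y_B$ and $x_{pB}$ to $\{0,1\}$ recovers the exact problem; the constraint $x_{pB} \le y_B$ is the one that keeps the relaxation from spreading a point thinly over many barely-opened balls.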
Submitted 12 December, 2017; v1 submitted 17 July, 2017;
originally announced July 2017.
-
Time is What Prevents Everything from Happening at Once: Propagation Time-conscious Influence Maximization
Authors:
Hui Li,
Sourav S Bhowmick,
Jiangtao Cui,
Jianfeng Ma
Abstract:
The influence maximization (IM) problem as defined in the seminal paper by Kempe et al. has received widespread attention from various research communities, leading to the design of a wide variety of solutions. Unfortunately, this classical IM problem ignores the fact that the time taken for influence propagation to reach the largest scope can be significant in real-world social networks, during which the underlying network itself may have evolved. This phenomenon may have a considerable adverse impact on the quality of selected seeds, and as a result all existing techniques that use this classical definition as their building block generate seeds with suboptimal influence spread. In this paper, we revisit the classical IM problem and propose a more realistic version called PROTEUS-IM (Propagation Time conscious Influence Maximization) to replace it by addressing the aforementioned limitation. Specifically, as influence propagation may take time, we assume that the underlying social network may evolve during influence propagation. Consequently, PROTEUS-IM aims to select seeds in the current network to maximize influence spread in the future instance of the network at the end of the influence propagation process, without assuming complete topological knowledge of the future network. We propose a greedy algorithm and a Reverse Reachable (RR) set-based algorithm, called PROTEUS-GENIE and PROTEUS-SEER, respectively, to address this problem. Our algorithms utilize the state-of-the-art Forest Fire Model for modeling network evolution during influence propagation to find superior quality seeds. Experimental study on real and synthetic social networks shows that our proposed algorithms consistently outperform state-of-the-art classical IM algorithms with respect to seed set quality.
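As background for the classical problem that PROTEUS-IM revisits, the sketch below estimates spread under the Independent Cascade model by Monte-Carlo simulation and picks seeds greedily by estimated marginal gain. It assumes a static network and a uniform activation probability `p`; it does not model network evolution (the Forest Fire Model) and is not PROTEUS-GENIE or PROTEUS-SEER.

```python
import random


def ic_spread(adj, seeds, p=0.1, runs=200, rng=None):
    """Monte-Carlo estimate of expected spread under the Independent Cascade model.

    adj: dict node -> list of out-neighbours; p: uniform edge activation probability.
    A fixed-seed generator is used by default so repeated calls are comparable.
    """
    rng = rng or random.Random(0)
    total = 0
    for _ in range(runs):
        active = set(seeds)
        frontier = list(seeds)
        while frontier:
            new = []
            for u in frontier:
                for w in adj[u]:
                    # each newly active node gets one chance to activate each neighbour
                    if w not in active and rng.random() < p:
                        active.add(w)
                        new.append(w)
            frontier = new
        total += len(active)
    return total / runs


def greedy_im(adj, k, **kw):
    """Pick k seeds greedily, each maximizing the estimated spread of the seed set."""
    seeds = []
    for _ in range(k):
        best = max((v for v in adj if v not in seeds),
                   key=lambda v: ic_spread(adj, seeds + [v], **kw))
        seeds.append(best)
    return seeds


star = {0: [1, 2, 3, 4, 5], 1: [], 2: [], 3: [], 4: [], 5: []}
```

The greedy loop needs many spread estimates per seed, which is why RR-set-based methods are preferred at scale.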
Submitted 27 September, 2017; v1 submitted 31 May, 2017;
originally announced May 2017.
-
Scalable Holistic Analysis of Multi-Source, Data-Intensive Problems Using Multilayered Networks
Authors:
Abhishek Santra,
Sanjukta Bhowmick,
Sharma Chakravarthy
Abstract:
Holistic analysis of many real-world problems is based on data collected from multiple sources contributing to some aspect of that problem. The word fusion has also been used in the literature for such problems involving disparate data types. Holistically understanding traffic patterns, causes of accidents, bombings, terrorist planning, and many natural phenomena such as storms and earthquakes falls into this category. Some may have real-time requirements and some may need to be analyzed after the fact (post-mortem or forensic analysis). What is common to all these problems is the amount and types of data associated with the event. Data may also be incomplete, and the trustworthiness of sources may vary. Currently, manual and ad-hoc approaches are used to aggregate data in different ways for analyzing and understanding these problems.
In this paper, we approach this problem in a novel way using multilayered networks. We identify features of a central event and propose a network layer for each feature. This approach allows us to study the effect of each feature independently and its impact on the event. We also establish that the proposed approach allows us to compose these features in arbitrary ways (without loss of information) to analyze their combined effect. Additionally, formulation of relationships (e.g., distance measure for a single feature instead of several at the same time) is simpler. Further, computations can be done once on each layer in this approach and reused for mixing and matching the features for aggregate impacts and "what if" scenarios to understand the problem holistically. This has been demonstrated by recreating the communities for the AND-Composed network by using the communities of the individual layers.
We believe that the techniques proposed here make an important contribution to the nascent yet fast-growing area of data fusion.
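As a minimal illustration of composing feature layers, the sketch below AND-composes (and, for contrast, OR-composes) homogeneous layers over a shared vertex set by intersecting or uniting their edge sets. Representing each layer as a set of undirected edges is our assumption; the paper's exact composition operators and its community-recreation step are not reproduced here.

```python
def and_compose(*layers):
    """AND-compose layers on one vertex set: keep an edge only if every layer has it.

    Each layer is a set of undirected edges, each edge a frozenset {u, v}.
    """
    composed = set(layers[0])
    for layer in layers[1:]:
        composed &= layer
    return composed


def or_compose(*layers):
    """OR-compose layers: keep an edge if any layer has it."""
    return set().union(*layers)


def edges(*pairs):
    """Build an undirected edge set from (u, v) pairs."""
    return {frozenset(p) for p in pairs}


layer1 = edges((1, 2), (2, 3), (3, 4))
layer2 = edges((2, 1), (3, 4), (4, 5))
```

Because each layer is stored separately, any combination of features can be composed on demand without recomputing the per-layer data.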
Submitted 4 November, 2016;
originally announced November 2016.
-
Understanding Stability of Noisy Networks through Centrality Measures and Local Connections
Authors:
Vladimir Ufimtsev,
Soumya Sarkar,
Animesh Mukherjee,
Sanjukta Bhowmick
Abstract:
Networks created from real-world data contain some inaccuracies or noise, manifested as small changes in the network structure. An important question is whether these small changes can significantly affect the analysis results. In this paper, we study the effect of noise on the ranks of high centrality vertices. We compare, using the Jaccard Index (JI), how many of the top-k high centrality nodes from the original network are also among the top-k ranked nodes from the noisy network. We deem a network stable if the JI value is high. We observe two features that affect stability. First, stability depends on the number of top-ranked vertices considered. When the vertices are ordered according to their centrality values, they group into clusters. Perturbations to the network can change the relative ranking within a cluster, but vertices rarely move from one cluster to another. Second, stability depends on the local connections of the high ranking vertices. The network is highly stable if the high ranking vertices are connected to each other. Our findings show that the stability of a network is affected by the local properties of high centrality vertices, rather than the global properties of the entire network. Based on these local properties, we can identify the stability of a network without explicitly applying a noise model.
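The stability measure described above compares two top-k sets with the Jaccard Index, $JI = |A \cap B| / |A \cup B|$. A minimal sketch, assuming centrality scores are given as dictionaries and ties are broken by vertex id (a tie rule we chose, not one stated in the abstract):

```python
def topk_jaccard(scores_orig, scores_noisy, k):
    """Jaccard index between the top-k vertices of two centrality rankings.

    scores_*: dict vertex -> centrality value; ties are broken by vertex id.
    """
    def top(scores):
        return set(sorted(scores, key=lambda v: (-scores[v], v))[:k])

    a, b = top(scores_orig), top(scores_noisy)
    return len(a & b) / len(a | b)


orig = {"a": 5, "b": 4, "c": 3, "d": 1}
noisy = {"a": 5, "b": 2, "c": 3, "d": 4}
```

Here the top-2 sets are {a, b} and {a, d}, giving JI = 1/3, while the top-3 sets share two of four vertices, giving 0.5; this dependence on k is the first effect the abstract reports.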
Submitted 17 September, 2016;
originally announced September 2016.
-
Sensitivity and Reliability in Incomplete Networks: Centrality Metrics to Community Scoring Functions
Authors:
Soumya Sarkar,
Sanjukta Bhowmick,
Suhansanu Kumar,
Animesh Mukherjee
Abstract:
Network analysis is an important tool in understanding the behavior of complex systems of interacting entities. However, due to the limitations of data gathering technologies, some interactions might be missing from the network model. This is a ubiquitous problem in all domains that use network analysis, from social networks to hyper-linked web networks to biological networks. Consequently, an important question in analyzing networks is to understand how increasing the noise level (i.e., the percentage of missing edges) affects different network parameters.
In this paper, we evaluate the effect of noise on community scoring and centrality-based parameters with respect to two different aspects of network analysis: (i) sensitivity, that is, how the parameter value changes as edges are removed, and (ii) reliability in the context of message spreading, that is, how the time taken to broadcast a message changes as edges are removed.
Our experiments on synthetic and real-world networks with three different noise models demonstrate that, for both aspects, over all networks and all noise models, permanence qualifies as the most effective metric. For the sensitivity experiments, closeness centrality is a close second. For the message spreading experiments, initiator selection based on closeness and betweenness centrality closely competes with permanence. This is because permanence has a dual characteristic: the cumulative permanence over all vertices is sensitive to noise, but the ids of the top-ranked vertices, which are used to find seeds during message spreading, remain relatively stable under noise.
Submitted 24 August, 2016; v1 submitted 18 August, 2016;
originally announced August 2016.
-
Permanence and Community Structure in Complex Networks
Authors:
Tanmoy Chakraborty,
Sriram Srinivasan,
Niloy Ganguly,
Animesh Mukherjee,
Sanjukta Bhowmick
Abstract:
The goal of community detection algorithms is to identify densely connected units within large networks. An implicit assumption is that all the constituent nodes belong equally to their associated community. However, some nodes are more important in the community than others. To date, efforts have primarily focused on identifying communities as a whole, rather than on understanding to what extent an individual node belongs to its community. Therefore, most metrics for evaluating communities, for example modularity, are global. These metrics produce a score for each community, not for each individual node. In this paper, we argue that the belongingness of nodes in a community is not uniform, and we introduce permanence, a vertex-based metric that quantifies this belongingness.
The central idea of permanence is based on the observation that the strength of membership of a vertex to a community depends upon two factors: (i) the extent of connections of the vertex within its community versus outside its community, and (ii) how tightly the vertex is connected internally. We discuss how permanence can help us understand and utilize the structure and evolution of communities by demonstrating that it can be used to (i) measure the persistence of a vertex in a community, (ii) design strategies to strengthen the community structure, (iii) explore the core-periphery structure within a community, and (iv) select suitable initiators for message spreading.
We demonstrate that maximizing permanence produces meaningful communities that match the ground-truth community structure of the networks more accurately than eight other popular community detection algorithms. Finally, we show that the communities obtained by this method are (i) less affected by changes in vertex ordering, and (ii) more resilient to the resolution limit, degeneracy of solutions, and asymptotic growth of values.
Submitted 5 June, 2016;
originally announced June 2016.
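The Permanence abstract describes the metric only qualitatively: membership strength depends on internal versus external connections and on how tightly a vertex is connected inside its community. The sketch below makes that concrete under an assumed formulation that the abstract itself does not state: Perm(v) = (I(v) / E_max(v)) * (1 / D(v)) - (1 - C_in(v)), where I(v) is the number of neighbours in v's own community, E_max(v) is the largest number of neighbours v has in any single external community (taken as 1 if there are none), D(v) is the degree, and C_in(v) is the clustering coefficient among v's internal neighbours. The function name `permanence`, the dict-of-sets graph representation, and the handling of isolated vertices and of vertices with fewer than two internal neighbours are all illustrative assumptions, not the authors' implementation.

```python
from itertools import combinations


def permanence(adj, community_of, v):
    """Permanence of vertex v under an ASSUMED formulation:

        Perm(v) = (I(v) / E_max(v)) * (1 / D(v)) - (1 - C_in(v))

    adj: dict mapping each vertex to the set of its neighbours (undirected).
    community_of: dict mapping each vertex to a single community label.
    """
    neighbours = adj[v]
    degree = len(neighbours)  # D(v)
    if degree == 0:
        return 0.0  # assumption: isolated vertices get zero permanence
    own = community_of[v]
    internal = {u for u in neighbours if community_of[u] == own}

    # E_max(v): most neighbours v has in any single external community.
    external_counts = {}
    for u in neighbours:
        label = community_of[u]
        if label != own:
            external_counts[label] = external_counts.get(label, 0) + 1
    e_max = max(external_counts.values(), default=0) or 1  # no external links -> 1

    # C_in(v): fraction of internal-neighbour pairs that are themselves linked.
    k = len(internal)  # I(v)
    if k < 2:
        c_in = 0.0  # assumption: treated as 0 with fewer than two internal neighbours
    else:
        linked = sum(1 for a, b in combinations(internal, 2) if b in adj[a])
        c_in = linked / (k * (k - 1) / 2)

    return (k / e_max) * (1 / degree) - (1 - c_in)


# Toy graph: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = {v: set() for v in range(6)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

community_of = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
moved = {**community_of, 2: "B"}  # worse partition: bridge vertex 2 moved to B

for v in range(6):
    print(v, round(permanence(adj, community_of, v), 3))
print("vertex 2 after move:", round(permanence(adj, moved, 2), 3))
```

In this toy case, vertex 0 is fully embedded in its triangle and scores 1, the bridge vertex 2 scores about 0.667 because of its single external edge, and moving vertex 2 into the other triangle drops its score to about -0.833. This per-vertex signal is what a global, per-community score does not expose.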
-
GenPerm: A Unified Method for Detecting Non-overlapping and Overlapping Communities
Authors:
Tanmoy Chakraborty,
Suhansanu Kumar,
Niloy Ganguly,
Animesh Mukherjee,
Sanjukta Bhowmick
Abstract:
Detecting non-overlapping and overlapping communities is essentially the same problem. However, current algorithms focus on finding either overlapping or non-overlapping communities. We present a generalized framework that can identify both non-overlapping and overlapping communities without any prior input about the network or its community distribution. To do so, we introduce a vertex-based metric, GenPerm, that quantifies how much a vertex belongs to each of its constituent communities. Our community detection algorithm is based on maximizing GenPerm over all the vertices in the network. We demonstrate, through experiments on synthetic and real-world networks, that GenPerm is more effective than other metrics in evaluating community structure. Further, we show that due to its vertex-centric property, GenPerm can be used to draw several inferences beyond community detection, such as core-periphery analysis and message spreading. Our algorithm for maximizing GenPerm outperforms six state-of-the-art algorithms in accurately predicting the ground-truth labels. Finally, we discuss the resolution-limit problem in overlapping communities and demonstrate that maximizing GenPerm can mitigate it.
Submitted 12 April, 2016;
originally announced April 2016.