-
Local Distance-Preserving Node Embeddings and Their Performance on Random Graphs
Authors:
My Le,
Luana Ruiz,
Souvik Dhara
Abstract:
Learning node representations is a fundamental problem in graph machine learning. While existing embedding methods effectively preserve local similarity measures, they often fail to capture global functions like graph distances. Inspired by Bourgain's seminal work on Hilbert space embeddings of metric spaces (1985), we study the performance of local distance-preserving node embeddings. Known as la…
▽ More
Learning node representations is a fundamental problem in graph machine learning. While existing embedding methods effectively preserve local similarity measures, they often fail to capture global functions like graph distances. Inspired by Bourgain's seminal work on Hilbert space embeddings of metric spaces (1985), we study the performance of local distance-preserving node embeddings. Known as landmark-based algorithms, these embeddings approximate pairwise distances by computing shortest paths from a small subset of reference nodes (i.e., landmarks). Our main theoretical contribution shows that random graphs, such as Erdős-Rényi random graphs, require lower dimensions in landmark-based embeddings compared to worst-case graphs. Empirically, we demonstrate that the GNN-based approximations for the distances to landmarks generalize well to larger networks, offering a scalable alternative for graph representation learning.
△ Less
Submitted 10 April, 2025;
originally announced April 2025.
-
Artificially Generated Visual Scanpath Improves Multi-label Thoracic Disease Classification in Chest X-Ray Images
Authors:
Ashish Verma,
Aupendu Kar,
Krishnendu Ghosh,
Sobhan Kanti Dhara,
Debashis Sen,
Prabir Kumar Biswas
Abstract:
Expert radiologists visually scan Chest X-Ray (CXR) images, sequentially fixating on anatomical structures to perform disease diagnosis. An automatic multi-label classifier of diseases in CXR images can benefit by incorporating aspects of the radiologists' approach. Recorded visual scanpaths of radiologists on CXR images can be used for the said purpose. But, such scanpaths are not available for m…
▽ More
Expert radiologists visually scan Chest X-Ray (CXR) images, sequentially fixating on anatomical structures to perform disease diagnosis. An automatic multi-label classifier of diseases in CXR images can benefit by incorporating aspects of the radiologists' approach. Recorded visual scanpaths of radiologists on CXR images can be used for the said purpose. But, such scanpaths are not available for most CXR images, which creates a gap even for modern deep learning based classifiers. This paper proposes to mitigate this gap by generating effective artificial visual scanpaths using a visual scanpath prediction model for CXR images. Further, a multi-class multi-label classifier framework is proposed that uses a generated scanpath and visual image features to classify diseases in CXR images. While the scanpath predictor is based on a recurrent neural network, the multi-label classifier involves a novel iterative sequential model with an attention module. We show that our scanpath predictor generates human-like visual scanpaths. We also demonstrate that the use of artificial visual scanpaths improves multi-class multi-label disease classification results on CXR images. The above observations are made from experiments involving around 0.2 million CXR images from 2 widely-used datasets considering the multi-label classification of 14 pathological findings. Code link: https://github.com/ashishverma03/SDC
△ Less
Submitted 1 March, 2025;
originally announced March 2025.
-
Self-supervision via Controlled Transformation and Unpaired Self-conditioning for Low-light Image Enhancement
Authors:
Aupendu Kar,
Sobhan K. Dhara,
Debashis Sen,
Prabir K. Biswas
Abstract:
Real-world low-light images captured by imaging devices suffer from poor visibility and require a domain-specific enhancement to produce artifact-free outputs that reveal details. In this paper, we propose an unpaired low-light image enhancement network leveraging novel controlled transformation-based self-supervision and unpaired self-conditioning strategies. The model determines the required deg…
▽ More
Real-world low-light images captured by imaging devices suffer from poor visibility and require a domain-specific enhancement to produce artifact-free outputs that reveal details. In this paper, we propose an unpaired low-light image enhancement network leveraging novel controlled transformation-based self-supervision and unpaired self-conditioning strategies. The model determines the required degrees of enhancement at the input image pixels, which are learned from the unpaired low-lit and well-lit images without any direct supervision. The self-supervision is based on a controlled transformation of the input image and subsequent maintenance of its enhancement in spite of the transformation. The self-conditioning performs training of the model on unpaired images such that it does not enhance an already-enhanced image or a well-lit input image. The inherent noise in the input low-light images is handled by employing low gradient magnitude suppression in a detail-preserving manner. In addition, our noise handling is self-conditioned by preventing the denoising of noise-free well-lit images. The training based on low-light image enhancement-specific attributes allows our model to avoid paired supervision without compromising significantly in performance. While our proposed self-supervision aids consistent enhancement, our novel self-conditioning facilitates adequate enhancement. Extensive experiments on multiple standard datasets demonstrate that our model, in general, outperforms the state-of-the-art both quantitatively and subjectively. Ablation studies show the effectiveness of our self-supervision and self-conditioning strategies, and the related loss functions.
△ Less
Submitted 1 March, 2025;
originally announced March 2025.
-
Analyzing Different Expert-Opined Strategies to Enhance the Effect on the Goal of a Multi-Attribute Decision-Making System Using a Concept of Effort Propagation and Application in Enhancement of High School Students' Performance
Authors:
Suvojit Dhara,
Adrijit Goswami
Abstract:
In many real-world multi-attribute decision-making (MADM) problems, mining the inter-relationships and possible hierarchical structures among the factors are considered to be one of the primary tasks. But, besides that, one major task is to determine an optimal strategy to work on the factors to enhance the effect on the goal attribute. This paper proposes two such strategies, namely parallel and…
▽ More
In many real-world multi-attribute decision-making (MADM) problems, mining the inter-relationships and possible hierarchical structures among the factors are considered to be one of the primary tasks. But, besides that, one major task is to determine an optimal strategy to work on the factors to enhance the effect on the goal attribute. This paper proposes two such strategies, namely parallel and hierarchical effort assignment, and propagation strategies. The concept of effort propagation through a strategy is formally defined and described in the paper. Both the parallel and hierarchical strategies are divided into sub-strategies based on whether the assignment of efforts to the factors is uniform or depends upon some appropriate heuristics related to the factors in the system. The adapted and discussed heuristics are the relative significance and effort propagability of the factors. The strategies are analyzed for a real-life case study regarding Indian high school administrative factors that play an important role in enhancing students' performance. Total effort propagation of around 7%-15% to the goal is seen across the proposed strategies given a total of 1 unit of effort to the directly accessible factors of the system. A comparative analysis is adapted to determine the optimal strategy among the proposed ones to enhance student performance most effectively. The highest effort propagation achieved in the work is approximately 14.4348%. The analysis in the paper establishes the necessity of research towards the direction of effort propagation analysis in case of decision-making problems.
△ Less
Submitted 5 July, 2023;
originally announced July 2023.
-
Iterative Hierarchy and Ranking Process (IHRP): A Novel Effective Hierarchy Method for Densely Connected Systems and Case Study in Student Performance Assessment
Authors:
Suvojit Dhara,
Adrijit Goswami
Abstract:
In real-life decision-making problems, determining the influences of the factors on the decision attribute is one of the primary tasks. To affect the decision attribute most, finding a proper hierarchy among the factors and determining their importance values in the system becomes quite important. Interpretive structural modeling (ISM) is a widely used hierarchy-building method that mines factor i…
▽ More
In real-life decision-making problems, determining the influences of the factors on the decision attribute is one of the primary tasks. To affect the decision attribute most, finding a proper hierarchy among the factors and determining their importance values in the system becomes quite important. Interpretive structural modeling (ISM) is a widely used hierarchy-building method that mines factor inter-influences based on expert opinions. This paper discusses one of the main drawbacks of the conventional ISM method in systems where the factors are densely interrelated. We refer to such systems as "dense systems". We propose a novel iterative hierarchy-building technique, called 'Iterative Hierarchy and Ranking Process'(IHRP) which performs effectively in such dense systems. To take the vagueness of the expert opinions into account, intuitionistic fuzzy linguistics has been used in the research work. In this paper, we propose a two-stage calculation of the relative importance of the factors in the system based on their hierarchical positions and rank the factors accordingly. We have performed a case study on student performance assessment by taking up novel Indian high-school administrative factors' data collected by surveying the experts in this field. A comparative study has been conducted in terms of the correlation of the factor ranking achieved by the proposed method and conventional ISM method with that of standard outranking methods like TOPSIS, and VIKOR. Our proposed IHRP framework achieves an 85-95% correlation compared to a 50-60% correlation for the conventional ISM method. This proves the effectiveness of the proposed method in determining a better hierarchy than the conventional method, especially in dense systems.
△ Less
Submitted 3 February, 2024; v1 submitted 17 June, 2023;
originally announced June 2023.
-
The Power of Two Matrices in Spectral Algorithms for Community Recovery
Authors:
Souvik Dhara,
Julia Gaudio,
Elchanan Mossel,
Colin Sandon
Abstract:
Spectral algorithms are some of the main tools in optimization and inference problems on graphs. Typically, the graph is encoded as a matrix and eigenvectors and eigenvalues of the matrix are then used to solve the given graph problem. Spectral algorithms have been successfully used for graph partitioning, hidden clique recovery and graph coloring. In this paper, we study the power of spectral alg…
▽ More
Spectral algorithms are some of the main tools in optimization and inference problems on graphs. Typically, the graph is encoded as a matrix and eigenvectors and eigenvalues of the matrix are then used to solve the given graph problem. Spectral algorithms have been successfully used for graph partitioning, hidden clique recovery and graph coloring. In this paper, we study the power of spectral algorithms using two matrices in a graph partitioning problem. We use two different matrices resulting from two different encodings of the same graph and then combine the spectral information coming from these two matrices.
We analyze a two-matrix spectral algorithm for the problem of identifying latent community structure in large random graphs. In particular, we consider the problem of recovering community assignments exactly in the censored stochastic block model, where each edge status is revealed independently with some probability. We show that spectral algorithms based on two matrices are optimal and succeed in recovering communities up to the information theoretic threshold. Further, we show that for most choices of the parameters, any spectral algorithm based on one matrix is suboptimal. The latter observation is in contrast to our prior works (2022a, 2022b) which showed that for the symmetric Stochastic Block Model and the Planted Dense Subgraph problem, a spectral algorithm based on one matrix achieves the information theoretic threshold. We additionally provide more general geometric conditions for the (sub)-optimality of spectral algorithms.
△ Less
Submitted 24 October, 2024; v1 submitted 11 October, 2022;
originally announced October 2022.
-
Spectral Algorithms Optimally Recover Planted Sub-structures
Authors:
Souvik Dhara,
Julia Gaudio,
Elchanan Mossel,
Colin Sandon
Abstract:
Spectral algorithms are an important building block in machine learning and graph algorithms. We are interested in studying when such algorithms can be applied directly to provide optimal solutions to inference tasks. Previous works by Abbe, Fan, Wang and Zhong (2020) and by Dhara, Gaudio, Mossel and Sandon (2022) showed the optimality for community detection in the Stochastic Block Model (SBM), a…
▽ More
Spectral algorithms are an important building block in machine learning and graph algorithms. We are interested in studying when such algorithms can be applied directly to provide optimal solutions to inference tasks. Previous works by Abbe, Fan, Wang and Zhong (2020) and by Dhara, Gaudio, Mossel and Sandon (2022) showed the optimality for community detection in the Stochastic Block Model (SBM), as well as in a censored variant of the SBM. Here we show that this optimality is somewhat universal as it carries over to other planted substructures such as the planted dense subgraph problem and submatrix localization problem, as well as to a censored version of the planted dense subgraph problem.
△ Less
Submitted 11 October, 2022; v1 submitted 22 March, 2022;
originally announced March 2022.
-
Community detection using low-dimensional network embedding algorithms
Authors:
Aman Barot,
Shankar Bhamidi,
Souvik Dhara
Abstract:
With the increasing relevance of large networks in important areas such as the study of contact networks for spread of disease, or social networks for their impact on geopolitics, it has become necessary to study machine learning tools that are scalable to very large networks, often containing millions of nodes. One major class of such scalable algorithms is known as network representation learnin…
▽ More
With the increasing relevance of large networks in important areas such as the study of contact networks for spread of disease, or social networks for their impact on geopolitics, it has become necessary to study machine learning tools that are scalable to very large networks, often containing millions of nodes. One major class of such scalable algorithms is known as network representation learning or network embedding. These algorithms try to learn representations of network functionals (e.g.~nodes) by first running multiple random walks and then using the number of co-occurrences of each pair of nodes in observed random walk segments to obtain a low-dimensional representation of nodes on some Euclidean space. The aim of this paper is to rigorously understand the performance of two major algorithms, DeepWalk and node2vec, in recovering communities for canonical network models with ground truth communities. Depending on the sparsity of the graph, we find the length of the random walk segments required such that the corresponding observed co-occurrence window is able to perform almost exact recovery of the underlying community assignments. We prove that, given some fixed co-occurrence window, node2vec using random walks with a low non-backtracking probability can succeed for much sparser networks compared to DeepWalk using simple random walks. Moreover, if the sparsity parameter is low, we provide evidence that these algorithms might not succeed in almost exact recovery. The analysis requires developing general tools for path counting on random networks having an underlying low-rank structure, which are of independent interest.
△ Less
Submitted 4 November, 2021;
originally announced November 2021.
-
Progressive Update Guided Interdependent Networks for Single Image Dehazing
Authors:
Aupendu Kar,
Sobhan Kanti Dhara,
Debashis Sen,
Prabir Kumar Biswas
Abstract:
Images with haze of different varieties often pose a significant challenge to dehazing. Therefore, guidance by estimates of haze parameters related to the variety would be beneficial, and their progressive update jointly with haze reduction will allow effective dehazing. To this end, we propose a multi-network dehazing framework containing novel interdependent dehazing and haze parameter updater n…
▽ More
Images with haze of different varieties often pose a significant challenge to dehazing. Therefore, guidance by estimates of haze parameters related to the variety would be beneficial, and their progressive update jointly with haze reduction will allow effective dehazing. To this end, we propose a multi-network dehazing framework containing novel interdependent dehazing and haze parameter updater networks that operate in a progressive manner. The haze parameters, transmission map and atmospheric light, are first estimated using dedicated convolutional networks that allow color-cast handling. The estimated parameters are then used to guide our dehazing module, where the estimates are progressively updated by novel convolutional networks. The updating takes place jointly with progressive dehazing using a network that invokes inter-step dependencies. The joint progressive updating and dehazing gradually modify the haze parameter values toward achieving effective dehazing. Through different studies, our dehazing framework is shown to be more effective than image-to-image mapping and predefined haze formation model based dehazing. The framework is also found capable of handling a wide variety of hazy conditions wtih different types and amounts of haze and color casts. Our dehazing framework is qualitatively and quantitatively found to outperform the state-of-the-art on synthetic and real-world hazy images of multiple datasets with varied haze conditions.
△ Less
Submitted 7 June, 2023; v1 submitted 4 August, 2020;
originally announced August 2020.
-
Across-scale Process Similarity based Interpolation for Image Super-Resolution
Authors:
Sobhan Kanti Dhara,
Debashis Sen
Abstract:
A pivotal step in image super-resolution techniques is interpolation, which aims at generating high resolution images without introducing artifacts such as blurring and ringing. In this paper, we propose a technique that performs interpolation through an infusion of high frequency signal components computed by exploiting `process similarity'. By `process similarity', we refer to the resemblance be…
▽ More
A pivotal step in image super-resolution techniques is interpolation, which aims at generating high resolution images without introducing artifacts such as blurring and ringing. In this paper, we propose a technique that performs interpolation through an infusion of high frequency signal components computed by exploiting `process similarity'. By `process similarity', we refer to the resemblance between a decomposition of the image at a resolution to the decomposition of the image at another resolution. In our approach, the decompositions generating image details and approximations are obtained through the discrete wavelet (DWT) and stationary wavelet (SWT) transforms. The complementary nature of DWT and SWT is leveraged to get the structural relation between the input image and its low resolution approximation. The structural relation is represented by optimal model parameters obtained through particle swarm optimization (PSO). Owing to process similarity, these parameters are used to generate the high resolution output image from the input image. The proposed approach is compared with six existing techniques qualitatively and in terms of PSNR, SSIM, and FSIM measures, along with computation time (CPU time). It is found that our approach is the fastest in terms of CPU time and produces comparable results.
△ Less
Submitted 20 March, 2020;
originally announced March 2020.
-
Optimal Service Elasticity in Large-Scale Distributed Systems
Authors:
Debankur Mukherjee,
Souvik Dhara,
Sem Borst,
Johan S. H. van Leeuwaarden
Abstract:
A fundamental challenge in large-scale cloud networks and data centers is to achieve highly efficient server utilization and limit energy consumption, while providing excellent user-perceived performance in the presence of uncertain and time-varying demand patterns. Auto-scaling provides a popular paradigm for automatically adjusting service capacity in response to demand while meeting performance…
▽ More
A fundamental challenge in large-scale cloud networks and data centers is to achieve highly efficient server utilization and limit energy consumption, while providing excellent user-perceived performance in the presence of uncertain and time-varying demand patterns. Auto-scaling provides a popular paradigm for automatically adjusting service capacity in response to demand while meeting performance targets, and queue-driven auto-scaling techniques have been widely investigated in the literature. In typical data center architectures and cloud environments however, no centralized queue is maintained, and load balancing algorithms immediately distribute incoming tasks among parallel queues. In these distributed settings with vast numbers of servers, centralized queue-driven auto-scaling techniques involve a substantial communication overhead and major implementation burden, or may not even be viable at all.
Motivated by the above issues, we propose a joint auto-scaling and load balancing scheme which does not require any global queue length information or explicit knowledge of system parameters, and yet provides provably near-optimal service elasticity. We establish the fluid-level dynamics for the proposed scheme in a regime where the total traffic volume and nominal service capacity grow large in proportion. The fluid-limit results show that the proposed scheme achieves asymptotic optimality in terms of user-perceived delay performance as well as energy consumption. Specifically, we prove that both the waiting time of tasks and the relative energy portion consumed by idle servers vanish in the limit. At the same time, the proposed scheme operates in a distributed fashion and involves only constant communication overhead per task, thus ensuring scalability in massive data center operations.
△ Less
Submitted 24 March, 2017;
originally announced March 2017.
-
Phase transitions of extremal cuts for the configuration model
Authors:
Souvik Dhara,
Debankur Mukherjee,
Subhabrata Sen
Abstract:
The $k$-section width and the Max-Cut for the configuration model are shown to exhibit phase transitions according to the values of certain parameters of the asymptotic degree distribution. These transitions mirror those observed on Erdős-Rényi random graphs, established by Luczak and McDiarmid (2001), and Coppersmith et al. (2004), respectively.
The $k$-section width and the Max-Cut for the configuration model are shown to exhibit phase transitions according to the values of certain parameters of the asymptotic degree distribution. These transitions mirror those observed on Erdős-Rényi random graphs, established by Luczak and McDiarmid (2001), and Coppersmith et al. (2004), respectively.
△ Less
Submitted 16 October, 2017; v1 submitted 9 November, 2016;
originally announced November 2016.