-
Analytic continuations and numerical evaluation of the Appell $F_1$, $F_3$, Lauricella $F_D^{(3)}$ and Lauricella-Saran $F_S^{(3)}$ and their Application to Feynman Integrals
Authors:
Souvik Bera,
Tanay Pathak
Abstract:
We present our investigation of two-variable hypergeometric series, namely the Appell $F_{1}$ and $F_{3}$ series, and obtain a comprehensive list of their analytic continuations, enough to cover the whole real $(x,y)$ plane except on their singular loci. We also derive analytic continuations of their 3-variable generalizations, the Lauricella $F_{D}^{(3)}$ series and the Lauricella-Saran $F_{S}^{(3)}$ series, by leveraging the analytic continuations of $F_{1}$ and $F_{3}$. This ensures that the whole real $(x,y,z)$ space is covered, except on the singular loci of these functions. While these studies are motivated by the frequent occurrence of these multivariable hypergeometric functions in Feynman integral evaluation, the results can also be used wherever these functions appear in other branches of mathematical physics. To facilitate their practical use, we provide four Mathematica packages: AppellF1.wl, AppellF3.wl, LauricellaFD.wl, and LauricellaSaranFS.wl. These packages are applicable for generic as well as non-generic values of the parameters, keeping in mind their utility in the evaluation of Feynman integrals. We explicitly present various physical applications of these packages in the context of Feynman integral evaluation and compare the results with those of other packages such as FIESTA. Upon applying the appropriate conventions for numerical evaluation, we find that the results obtained from our packages are consistent. Various Mathematica notebooks demonstrating different numerical results are also provided along with this paper.
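As a reference point for what the analytic continuations extend, the sketch below sums the defining double series $F_1=\sum_{m,n}\frac{(a)_{m+n}(b_1)_m(b_2)_n}{(c)_{m+n}\,m!\,n!}x^m y^n$ and $F_3=\sum_{m,n}\frac{(a_1)_m(a_2)_n(b_1)_m(b_2)_n}{(c)_{m+n}\,m!\,n!}x^m y^n$ directly. It is a minimal illustration, not the packages' method, and is only valid inside $|x|<1$, $|y|<1$, where both series converge; outside that region the analytic continuations are needed. The names `poch`, `appell_f1`, `appell_f3` and the truncation parameter `order` are our own.

```python
from math import factorial, log


def poch(q: float, k: int) -> float:
    """Pochhammer symbol (rising factorial) (q)_k = q (q + 1) ... (q + k - 1)."""
    result = 1.0
    for i in range(k):
        result *= q + i
    return result


def appell_f1(a, b1, b2, c, x, y, order=80):
    """Truncated F1 double series, summed over m + n < order (needs |x|, |y| < 1).

    Moderate parameters only: very large orders can overflow the float coefficients.
    """
    total = 0.0
    for m in range(order):
        for n in range(order - m):
            coeff = poch(a, m + n) * poch(b1, m) * poch(b2, n)
            coeff /= poch(c, m + n) * factorial(m) * factorial(n)
            total += coeff * x**m * y**n
    return total


def appell_f3(a1, a2, b1, b2, c, x, y, order=80):
    """Truncated F3 double series, summed over m + n < order (needs |x|, |y| < 1)."""
    total = 0.0
    for m in range(order):
        for n in range(order - m):
            coeff = poch(a1, m) * poch(a2, n) * poch(b1, m) * poch(b2, n)
            coeff /= poch(c, m + n) * factorial(m) * factorial(n)
            total += coeff * x**m * y**n
    return total


# At y = 0 both reduce to 2F1(1, 1; 2; x) = -log(1 - x) / x for these parameters.
print(appell_f1(1, 1, 0.3, 2, 0.5, 0.0), appell_f3(1, 0.4, 1, 0.7, 2, 0.5, 0.0), -log(0.5) / 0.5)
```

Setting $y=0$ collapses either series to a Gauss ${}_2F_1$, which gives a cheap sanity check of any implementation.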
Submitted 1 January, 2025; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Yes, this is what I was looking for! Towards Multi-modal Medical Consultation Concern Summary Generation
Authors:
Abhisek Tiwari,
Shreyangshu Bera,
Sriparna Saha,
Pushpak Bhattacharyya,
Samrat Ghosh
Abstract:
Over the past few years, the use of the Internet for healthcare-related tasks has grown by leaps and bounds, posing a challenge in effectively managing and processing information to ensure its efficient utilization. During moments of emotional turmoil and psychological challenges, we frequently turn to the Internet as our initial source of support, choosing this over discussing our feelings with others due to the associated social stigma. In this paper, we propose a new task of multi-modal medical concern summary (MMCS) generation, which provides a short and precise summary of patients' major concerns brought up during the consultation. Nonverbal cues, such as patients' gestures and facial expressions, aid in accurately identifying patients' concerns. Doctors also consider patients' personal information, such as age and gender, in order to describe the medical condition appropriately. Motivated by the potential efficacy of patients' personal context and visual gestures, we propose a transformer-based multi-task, multi-modal intent-recognition and medical concern summary generation (IR-MMCSG) system. Furthermore, we propose a multitasking framework for intent recognition and medical concern summary generation for doctor-patient consultations. We construct the first multi-modal medical concern summary generation (MM-MediConSummation) corpus, which includes patient-doctor consultations annotated with medical concern summaries, intents, patient personal information, doctor's recommendations, and keywords. Our experiments and analysis demonstrate (a) the significant role of patients' expressions/gestures and their personal information in intent identification and medical concern summary generation, and (b) the strong correlation between intent recognition and patients' medical concern summary generation.
The dataset and source code are available at https://github.com/NLP-RL/MMCSG.
Submitted 10 January, 2024;
originally announced January 2024.
-
Histopathological Image Analysis with Style-Augmented Feature Domain Mixing for Improved Generalization
Authors:
Vaibhav Khamankar,
Sutanu Bera,
Saumik Bhattacharya,
Debashis Sen,
Prabir Kumar Biswas
Abstract:
Histopathological images are essential for medical diagnosis and treatment planning, but interpreting them accurately using machine learning can be challenging due to variations in tissue preparation, staining and imaging protocols. Domain generalization aims to address such limitations by enabling learning models to generalize to new datasets or populations. Style transfer-based data augmentation is an emerging technique that can be used to improve the generalizability of machine learning models for histopathological images. However, existing style transfer-based methods can be computationally expensive, and they rely on artistic styles, which can negatively impact model accuracy. In this study, we propose a feature domain style mixing technique that uses adaptive instance normalization to generate style-augmented versions of images. We compare our proposed method with existing style transfer-based data augmentation methods and find that it performs similarly or better, despite requiring less computation and time. Our results demonstrate the potential of feature domain statistics mixing in the generalization of learning models for histopathological image analysis.
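Adaptive instance normalization (AdaIN) rescales a feature channel so that its mean and standard deviation match those of another sample. The sketch below applies this to one flattened channel and interpolates the two samples' statistics with a weight `lam`; the function name, the weight and this exact mixing rule are assumptions for illustration, not the paper's formulation.

```python
from statistics import fmean, pstdev


def mix_style(content, style, lam, eps=1e-5):
    """Hypothetical feature-domain style mixing on one flattened channel.

    lam = 0 gives plain AdaIN toward `style`; lam = 1 (approximately) keeps `content`.
    """
    mu_c, sigma_c = fmean(content), pstdev(content)
    mu_s, sigma_s = fmean(style), pstdev(style)
    # Interpolated target statistics between the two samples.
    mu_t = lam * mu_c + (1 - lam) * mu_s
    sigma_t = lam * sigma_c + (1 - lam) * sigma_s
    # Normalize the content channel, then re-scale and shift to the target statistics.
    return [sigma_t * (v - mu_c) / (sigma_c + eps) + mu_t for v in content]


augmented = mix_style([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], lam=0.5)
print(augmented)
```

Only first- and second-order channel statistics change, so the spatial structure of the content features is preserved, which is the property that makes statistics mixing cheaper than full image-space style transfer.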
Submitted 31 October, 2023;
originally announced October 2023.
-
$\texttt{ChisholmD.wl}$- Automated rational approximant for bi-variate series
Authors:
Souvik Bera,
Tanay Pathak
Abstract:
The Chisholm rational approximant is a natural generalization to two variables of the well-known single-variable Padé approximant, and has the advantage of reducing to the latter when one of the variables is set equal to 0. We present, to our knowledge, the first automated Mathematica package to evaluate diagonal Chisholm approximants of two-variable series. For the moment, the package can only be used to evaluate diagonal approximants, i.e. those in which the maximum powers of both variables, in both the numerator and the denominator, are equal to some integer $M$. We further modify the original method so as to allow us to evaluate the approximants around a general point $(x,y)$, not necessarily $(0,0)$. Using the approximants around a general point $(x,y)$ allows us to get a better estimate of the result when the point of evaluation is far from $(0,0)$. Several examples of elementary functions have been studied, which show that the approximants can be useful for analytic continuation and convergence acceleration. We continue our study using various examples of two-variable hypergeometric series, $\mathrm{Li}_{2,2}(x,y)$, etc., that arise in particle physics and in the study of critical phenomena in condensed matter physics. The use of the package is demonstrated in detail, and the Mathematica package is provided as an ancillary file.
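Because the Chisholm approximant reduces to the Padé approximant when one variable is zero, the single-variable case is a useful reference. The sketch below builds the $[L/M]$ Padé approximant $P_L(x)/Q_M(x)$, normalized so that $Q_M(0)=1$, from Taylor coefficients by requiring that the coefficients of $x^{L+1},\dots,x^{L+M}$ in $Q_M(x)f(x)-P_L(x)$ vanish, solved exactly in rational arithmetic. It is an illustration under these assumptions, not the algorithm used by ChisholmD.wl; the names `pade` and `evaluate` are hypothetical.

```python
from fractions import Fraction


def pade(coeffs, L, M):
    """[L/M] Pade approximant from Taylor coefficients c_0..c_{L+M}.

    Returns (p, q): numerator and denominator coefficient lists, with q[0] = 1.
    """
    c = [Fraction(v) for v in coeffs]
    get = lambda k: c[k] if 0 <= k < len(c) else Fraction(0)
    # Unknowns q_1..q_M: sum_{j=1}^{M} q_j c_{k-j} = -c_k for k = L+1..L+M (augmented matrix).
    A = [[get(k - j) for j in range(1, M + 1)] + [-get(k)] for k in range(L + 1, L + M + 1)]
    # Gauss-Jordan elimination with exact fractions.
    for col in range(M):
        pivot = next((r for r in range(col, M) if A[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular Pade system for this [L/M]")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(M):
            if r != col and A[r][col] != 0:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    q = [Fraction(1)] + [A[i][M] / A[i][i] for i in range(M)]
    # Numerator: p_k = sum_{j=0}^{min(k, M)} q_j c_{k-j} for k = 0..L.
    p = [sum(q[j] * get(k - j) for j in range(min(k, M) + 1)) for k in range(L + 1)]
    return p, q


def evaluate(p, q, x):
    """Value of the rational function P(x) / Q(x)."""
    num = sum(coef * x**i for i, coef in enumerate(p))
    den = sum(coef * x**i for i, coef in enumerate(q))
    return num / den


# Taylor coefficients of exp(x) up to x^4.
exp_coeffs = [1, 1, Fraction(1, 2), Fraction(1, 6), Fraction(1, 24)]
p, q = pade(exp_coeffs, 2, 2)
print(p, q, evaluate(p, q, Fraction(1)))
```

For $e^x$ with $L=M=2$ this reproduces the textbook approximant $(1+x/2+x^2/12)/(1-x/2+x^2/12)$, whose value at $x=1$ is $19/7$.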
Submitted 14 September, 2023;
originally announced September 2023.
-
DeMEtRIS: Counting (near)-Cliques by Crawling
Authors:
Suman K. Bera,
Jayesh Choudhari,
Shahrzad Haddadan,
Sara Ahmadian
Abstract:
We study the problem of approximately counting cliques and near cliques in a graph, where access to the graph is only available through crawling its vertices, thus typically seeing only a small portion of it. This model, known as the random walk model or the neighborhood query model, has been introduced recently and captures real-life scenarios in which the entire graph is too massive to be stored as a whole or be scanned entirely, and sampling vertices independently is non-trivial. We introduce DeMEtRIS: Dense Motif Estimation through Random Incident Sampling. This method provides a scalable algorithm for clique and near-clique counting in the random walk model. We prove the correctness of our algorithm through rigorous mathematical analysis and support it with extensive experiments. Both our theoretical results and our experiments show that DeMEtRIS obtains a high-precision estimate by crawling only a sub-linear portion of the vertices, a significant improvement over previously known results.
Submitted 7 December, 2022;
originally announced December 2022.
-
Spectral Triadic Decompositions of Real-World Networks
Authors:
Sabyasachi Basu,
Suman Kalyan Bera,
C. Seshadhri
Abstract:
A fundamental problem in mathematics and network analysis is to find conditions under which a graph can be partitioned into smaller pieces. The most important tool for this partitioning is the Fiedler vector or discrete Cheeger inequality. These results relate the graph spectrum (eigenvalues of the normalized adjacency matrix) to the ability to break a graph into two pieces, with few edge deletions. An entire subfield of mathematics, called spectral graph theory, has emerged from these results. Yet these results do not say anything about the rich community structure exhibited by real-world networks, which typically have a significant fraction of edges contained in numerous densely clustered blocks. Inspired by the properties of real-world networks, we discover a new spectral condition that relates eigenvalue powers to a network decomposition into densely clustered blocks. We call this the \emph{spectral triadic decomposition}. Our relationship exactly predicts the existence of community structure, as commonly seen in real networked data. Our proof provides an efficient algorithm to produce the spectral triadic decomposition. We observe on numerous social, coauthorship, and citation network datasets that these decompositions have significant correlation with semantically meaningful communities.
Submitted 8 May, 2024; v1 submitted 11 November, 2022;
originally announced November 2022.
-
Self Supervised Low Dose Computed Tomography Image Denoising Using Invertible Network Exploiting Inter Slice Congruence
Authors:
Sutanu Bera,
Prabir Kumar Biswas
Abstract:
The resurgence of deep neural networks has created an alternative pathway for low-dose computed tomography denoising by learning a nonlinear transformation function between low-dose CT (LDCT) and normal-dose CT (NDCT) image pairs. However, such paired LDCT and NDCT images are rarely available in the clinical environment, making deep neural network deployment infeasible. This study proposes a novel method for self-supervised low-dose CT denoising that alleviates the need for paired LDCT and NDCT images. Specifically, we train an invertible neural network to minimize the pixel-based mean squared distance between a noisy slice and the average of its two immediately adjacent noisy slices. We show that this objective is similar to training a neural network to minimize the distance between clean NDCT and noisy LDCT image pairs. In addition, during the reverse mapping of the invertible network, the output image is mapped back to the original input image, similar to a cycle-consistency loss. Finally, the trained invertible network's forward mapping is used for denoising LDCT images. Extensive experiments on two publicly available datasets show that our method performs favourably against other existing unsupervised methods.
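The inter-slice training target can be written down directly. In the sketch below, slices are nested lists of pixel values and `model` is a placeholder callable standing in for the invertible network, which is not shown; the function names are hypothetical. It computes the mean squared distance between the model output on slice $i$ and the average of slices $i-1$ and $i+1$.

```python
def neighbour_targets(volume):
    """Target for each interior slice i: the pixel-wise mean of slices i-1 and i+1."""
    targets = []
    for i in range(1, len(volume) - 1):
        prev_s, next_s = volume[i - 1], volume[i + 1]
        targets.append([[(a + b) / 2 for a, b in zip(r1, r2)]
                        for r1, r2 in zip(prev_s, next_s)])
    return targets


def inter_slice_mse(volume, model):
    """Mean squared pixel distance between model(slice i) and its neighbour average."""
    if len(volume) < 3:
        raise ValueError("need at least three slices")
    total, count = 0.0, 0
    for i, target in zip(range(1, len(volume) - 1), neighbour_targets(volume)):
        pred = model(volume[i])
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                total += (p - t) ** 2
                count += 1
    return total / count


# Toy 3-slice volume of 1x2 images, with the identity map as a stand-in model.
volume = [[[0.0, 0.0]], [[3.0, 3.0]], [[2.0, 2.0]]]
print(inter_slice_mse(volume, lambda s: s))
```

The intuition is that adjacent slices share anatomy but carry roughly independent noise, so the neighbour average acts as a noisy stand-in for the clean slice.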
Submitted 3 November, 2022;
originally announced November 2022.
-
Design Perspectives of Multitask Deep Learning Models and Applications
Authors:
Yeshwant Singh,
Anupam Biswas,
Angshuman Bora,
Debashish Malakar,
Subham Chakraborty,
Suman Bera
Abstract:
In recent years, multi-task learning has proven highly successful in various applications. Although single-model training has delivered strong results over the years, it ignores valuable information that might help us estimate a metric better. In learning-related tasks, multi-task learning has been able to make models generalize even better. We try to enhance the feature mapping of multi-tasking models by sharing features among related tasks and by inductive transfer learning. We are also interested in learning the relationships among tasks in order to gain more from multi-task learning. In this chapter, our objective is to visualize the existing multi-tasking models, compare their performance and the methods used to evaluate it, discuss the problems faced during the design and implementation of these models in various domains, and describe the advantages and milestones they have achieved.
Submitted 27 September, 2022;
originally announced September 2022.
-
A New Dynamic Algorithm for Densest Subhypergraphs
Authors:
Suman K. Bera,
Sayan Bhattacharya,
Jayesh Choudhari,
Prantar Ghosh
Abstract:
Computing a dense subgraph is a fundamental problem in graph mining, with a diverse set of applications ranging from electronic commerce to community detection in social networks. In many of these applications, the underlying context is better modelled as a weighted hypergraph that keeps evolving with time.
This motivates the problem of maintaining the densest subhypergraph of a weighted hypergraph in a {\em dynamic setting}, where the input keeps changing via a sequence of updates (hyperedge insertions/deletions). Previously, the only known algorithm for this problem was due to Hu et al. [HWC17]. This algorithm worked only on unweighted hypergraphs, and had an approximation ratio of $(1+ε)r^2$ and an update time of $O(\text{poly} (r, \log n))$, where $r$ denotes the maximum rank of the input across all the updates.
We obtain a new algorithm for this problem, which works even when the input hypergraph is weighted. Our algorithm has a significantly improved (near-optimal) approximation ratio of $(1+ε)$ that is independent of $r$, and a similar update time of $O(\text{poly} (r, \log n))$. It is the first $(1+ε)$-approximation algorithm even for the special case of weighted simple graphs.
To complement our theoretical analysis, we perform experiments with our dynamic algorithm on large-scale, real-world data-sets. Our algorithm significantly outperforms the state of the art [HWC17] both in terms of accuracy and efficiency.
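For intuition about the objective being maintained, the sketch below finds the exact densest subhypergraph of a tiny static weighted hypergraph by exhaustive search. It assumes the usual density $\rho(S)=w(E[S])/|S|$, where $E[S]$ is the set of hyperedges contained entirely in $S$. The search is exponential in the number of vertices and serves only as a reference; it is not the dynamic $(1+ε)$-approximation algorithm, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def density(subset, hyperedges):
    """rho(S): total weight of hyperedges fully inside S, divided by |S|."""
    s = set(subset)
    weight = sum(w for edge, w in hyperedges if set(edge) <= s)
    return Fraction(weight) / len(s)


def densest_subhypergraph(vertices, hyperedges):
    """Exact maximizer of rho(S) over all non-empty S (tiny inputs only)."""
    best, best_set = Fraction(-1), None
    for k in range(1, len(vertices) + 1):
        for subset in combinations(vertices, k):
            d = density(subset, hyperedges)
            if d > best:  # strict: keeps the smallest optimal set found first
                best, best_set = d, set(subset)
    return best, best_set


# Hyperedges are (vertex tuple, weight) pairs; a rank-3 and a rank-2 hyperedge.
edges = [((1, 2, 3), 3), ((3, 4), 1)]
print(densest_subhypergraph([1, 2, 3, 4], edges))
```

A dynamic algorithm must keep an approximation of this maximum current while hyperedges are inserted and deleted, without re-solving the problem from scratch after each update.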
Submitted 17 April, 2022;
originally announced April 2022.
-
Olsson.wl : a Mathematica package for the computation of linear transformations of multivariable hypergeometric functions
Authors:
B. Ananthanarayan,
Souvik Bera,
S. Friot,
Tanay Pathak
Abstract:
We present the Olsson.wl Mathematica package, which aims to find linear transformations for some classes of multivariable hypergeometric functions. It is based on a well-known method developed by P. O. M. Olsson in J. Math. Phys. 5, 420 (1964) to derive the analytic continuations of the Appell $F_1$ double hypergeometric series from the linear transformations of the Gauss $_2F_1$ hypergeometric function. We provide a brief description of Olsson's method and demonstrate the commands of the package, along with examples. We also provide a companion package, called ROC2.wl, dedicated to the derivation of the regions of convergence of double hypergeometric series. This package can be used independently of Olsson.wl.
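One way to connect $F_1$ to ${}_2F_1$, which we assume here as the starting point for applying Gauss transformations, is the single-sum decomposition $F_1(a;b_1,b_2;c;x,y)=\sum_{n\ge 0}\frac{(a)_n(b_2)_n}{(c)_n\,n!}\,y^n\,{}_2F_1(a+n,b_1;c+n;x)$. It follows from $(a)_{m+n}=(a)_n(a+n)_m$ and $(c)_{m+n}=(c)_n(c+n)_m$. The sketch below only checks this identity numerically inside $|x|<1$, $|y|<1$; it does not reproduce the Olsson.wl commands, and all function names are hypothetical.

```python
from math import factorial


def poch(q, k):
    """Rising factorial (q)_k."""
    result = 1.0
    for i in range(k):
        result *= q + i
    return result


def gauss_2f1(a, b, c, x, order=200):
    """Truncated Gauss series 2F1(a, b; c; x), valid for |x| < 1."""
    total, term = 0.0, 1.0
    for m in range(order):
        total += term
        # Ratio of consecutive terms of the 2F1 series.
        term *= (a + m) * (b + m) / ((c + m) * (m + 1)) * x
    return total


def f1_via_2f1(a, b1, b2, c, x, y, order=80):
    """F1 as a single sum over n of Gauss functions with shifted parameters."""
    return sum(poch(a, n) * poch(b2, n) / (poch(c, n) * factorial(n)) * y**n
               * gauss_2f1(a + n, b1, c + n, x) for n in range(order))


def f1_direct(a, b1, b2, c, x, y, order=80):
    """Direct truncated double series of F1, summed over m + n < order."""
    return sum(poch(a, m + n) * poch(b1, m) * poch(b2, n)
               / (poch(c, m + n) * factorial(m) * factorial(n)) * x**m * y**n
               for m in range(order) for n in range(order - m))


args = (0.7, 0.3, 0.9, 1.6, 0.3, 0.2)
print(f1_via_2f1(*args), f1_direct(*args))
```

Once $F_1$ is written this way, each inner ${}_2F_1$ can in principle be replaced by one of its known linear transformations, which is the kind of step the abstract attributes to Olsson's method.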
Submitted 31 December, 2021;
originally announced January 2022.
-
Iterative Gradient Encoding Network with Feature Co-Occurrence Loss for Single Image Reflection Removal
Authors:
Sutanu Bera,
Prabir Kumar Biswas
Abstract:
Removing undesired reflections from a photo taken in front of glass is of great importance for enhancing the efficiency of visual computing systems. Previous learning-based approaches have produced visually plausible results for some reflection types but have failed to generalize to others. There is a dearth of literature on efficient single image reflection removal methods that generalize well across a wide range of reflection types. In this study, we propose an iterative gradient encoding network for single image reflection removal. Next, to further supervise the network in learning the correlation between the transmission layer features, we propose a feature co-occurrence loss. Extensive experiments on the public benchmark dataset SIR$^2$ demonstrate that our method removes reflections favorably compared with existing state-of-the-art methods in all imaging settings, including diverse backgrounds. Moreover, as the reflection strength increases, our method can still remove reflections even where other state-of-the-art methods fail.
Submitted 29 March, 2021;
originally announced March 2021.
-
Noise Conscious Training of Non Local Neural Network powered by Self Attentive Spectral Normalized Markovian Patch GAN for Low Dose CT Denoising
Authors:
Sutanu Bera,
Prabir Kumar Biswas
Abstract:
The explosive rise in the use of computed tomography (CT) imaging in medical practice has heightened public concern over patients' associated radiation dose. However, reducing the radiation dose leads to increased noise and artifacts, which degrade the scan's interpretability. Consequently, an advanced image reconstruction algorithm to improve the diagnostic performance of low-dose CT has become a primary concern among researchers, which is challenging due to the ill-posedness of the problem. In recent times, deep learning-based techniques have emerged as the dominant approach for low-dose CT (LDCT) denoising. However, some common bottlenecks still exist that prevent deep learning-based techniques from delivering the best performance. In this study, we attempt to mitigate these problems with three novel contributions. First, we propose a novel convolutional module as the first attempt to utilize the neighborhood similarity of CT images for denoising tasks. Our proposed module boosts denoising performance by a significant margin. Next, we address the non-stationarity of CT noise and introduce a new noise-aware mean square error loss for LDCT denoising. Moreover, this loss also helps alleviate the laborious effort required when training a CT denoising network using image patches. Lastly, we propose a novel discriminator function for CT denoising tasks. The conventional vanilla discriminator tends to overlook fine structural details and focus on global agreement. Our proposed discriminator leverages self-attention and pixel-wise GANs to restore the diagnostic quality of LDCT images. Validated on a publicly available dataset of the 2016 NIH-AAPM-Mayo Clinic Low Dose CT Grand Challenge, our method performed remarkably better than existing state-of-the-art methods.
Submitted 11 November, 2020;
originally announced November 2020.
-
Near-Linear Time Homomorphism Counting in Bounded Degeneracy Graphs: The Barrier of Long Induced Cycles
Authors:
Suman K. Bera,
Noujan Pashanasangi,
C. Seshadhri
Abstract:
Counting homomorphisms of a constant sized pattern graph $H$ in an input graph $G$ is a fundamental computational problem. There is a rich history of studying the complexity of this problem, under various constraints on the input $G$ and the pattern $H$. Given the significance of this problem and the large sizes of modern inputs, we investigate when near-linear time algorithms are possible. We focus on the case when the input graph has bounded degeneracy, a commonly studied and practically relevant class for homomorphism counting. It is known from previous work that for certain classes of $H$, $H$-homomorphisms can be counted exactly in near-linear time in bounded degeneracy graphs. Can we precisely characterize the patterns $H$ for which near-linear time algorithms are possible?
We completely resolve this problem, discovering a clean dichotomy using fine-grained complexity. Let $m$ denote the number of edges in $G$. We prove the following: if the largest induced cycle in $H$ has length at most $5$, then there is an $O(m\log m)$ algorithm for counting $H$-homomorphisms in bounded degeneracy graphs. If the largest induced cycle in $H$ has length at least $6$, then (assuming standard fine-grained complexity conjectures) there is a constant $γ> 0$, such that there is no $o(m^{1+γ})$ time algorithm for counting $H$-homomorphisms.
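The dichotomy depends only on the length of the longest induced cycle of the pattern $H$, which is cheap to compute by brute force because $H$ has constant size. The sketch below (function names hypothetical) computes that length; `near_linear_by_dichotomy` simply restates the theorem, whose hardness side assumes standard fine-grained complexity conjectures.

```python
from itertools import combinations


def is_induced_cycle(nodes, adj):
    """True if the subgraph induced on `nodes` is a single cycle of length >= 3."""
    nodes = set(nodes)
    if len(nodes) < 3:
        return False
    # Every vertex needs exactly two neighbours inside the induced subgraph...
    if any(len(adj[v] & nodes) != 2 for v in nodes):
        return False
    # ...and the induced subgraph must be connected (not a disjoint union of cycles).
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for u in adj[v] & nodes:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen == nodes


def longest_induced_cycle(edges):
    """Length of the longest induced cycle of a small pattern H (0 if H is acyclic)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    vertices = list(adj)
    for k in range(len(vertices), 2, -1):  # try the largest sizes first
        if any(is_induced_cycle(c, adj) for c in combinations(vertices, k)):
            return k
    return 0


def near_linear_by_dichotomy(edges):
    """Restates the theorem: near-linear iff the longest induced cycle has length <= 5."""
    return longest_induced_cycle(edges) <= 5


c6 = [(i, (i + 1) % 6) for i in range(6)]
print(longest_induced_cycle(c6), near_linear_by_dichotomy(c6))
```

A complete graph illustrates why "induced" matters: $K_4$ contains 4-cycles as subgraphs, but its longest induced cycle is a triangle, so it falls on the near-linear side.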
Submitted 18 November, 2020; v1 submitted 15 October, 2020;
originally announced October 2020.
-
Counting Subgraphs in Degenerate Graphs
Authors:
Suman K. Bera,
Lior Gishboliner,
Yevgeny Levanzov,
C. Seshadhri,
Asaf Shapira
Abstract:
We consider the problem of counting the number of copies of a fixed graph $H$ within an input graph $G$. This is one of the most well-studied algorithmic graph problems, with many theoretical and practical applications. We focus on solving this problem when the input $G$ has bounded degeneracy. This is a rich family of graphs, containing all graphs without a fixed minor (e.g. planar graphs), as well as graphs generated by various random processes (e.g. preferential attachment graphs). We say that $H$ is easy if there is a linear-time algorithm for counting the number of copies of $H$ in an input $G$ of bounded degeneracy. A seminal result of Chiba and Nishizeki from '85 states that every $H$ on at most 4 vertices is easy. Bera, Pashanasangi, and Seshadhri recently extended this to all $H$ on 5 vertices, and further proved that for every $k > 5$ there is a $k$-vertex $H$ which is not easy. They left open the natural problem of characterizing all easy graphs $H$.
Bressan has recently introduced a framework for counting subgraphs in degenerate graphs, from which one can extract a sufficient condition for a graph $H$ to be easy. Here we show that this sufficient condition is also necessary, thus fully answering the Bera--Pashanasangi--Seshadhri problem. We further resolve two closely related problems; namely characterizing the graphs that are easy with respect to counting induced copies, and with respect to counting homomorphisms.
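The degeneracy idea behind results like Chiba and Nishizeki's can be sketched for the simplest pattern, the triangle. First, order the vertices by repeatedly deleting a minimum-degree vertex. Next, orient each edge from the earlier vertex to the later one. Finally, intersect out-neighbourhoods. In a graph of bounded degeneracy every out-degree is bounded, so the intersections cost near-linear time in $m$. The vertex-selection loop here is a simple quadratic one; a bucket queue would make it linear. This is an illustrative sketch with hypothetical names, not Bressan's framework or the characterization proved in the paper.

```python
def degeneracy_order(adj):
    """Repeatedly remove a minimum-degree vertex of the remaining graph."""
    degree = {v: len(ns) for v, ns in adj.items()}
    removed, order = set(), []
    while len(order) < len(adj):
        v = min((u for u in adj if u not in removed), key=lambda u: degree[u])
        order.append(v)
        removed.add(v)
        for u in adj[v]:
            if u not in removed:
                degree[u] -= 1
    return order


def count_triangles(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    rank = {v: i for i, v in enumerate(degeneracy_order(adj))}
    # Orient edges from earlier to later vertices; out-degree <= degeneracy.
    out = {v: {u for u in adj[v] if rank[u] > rank[v]} for v in adj}
    # A triangle a < b < c (by rank) is counted once: at v = a, u = b, common vertex c.
    return sum(len(out[v] & out[u]) for v in adj for u in out[v])


two_triangles = [(0, 1), (1, 2), (2, 0), (1, 3), (3, 2)]
print(count_triangles(two_triangles))
```

The question answered in the paper is, roughly, for which patterns $H$ an orientation-based approach of this kind can be made to run in linear time.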
Submitted 9 December, 2021; v1 submitted 12 October, 2020;
originally announced October 2020.
-
Lightweight Modules for Efficient Deep Learning based Image Restoration
Authors:
Avisek Lahiri,
Sourav Bairagya,
Sutanu Bera,
Siddhant Haldar,
Prabir Kumar Biswas
Abstract:
Low-level image restoration is an integral component of modern artificial intelligence (AI) driven camera pipelines. Most of these frameworks are based on deep neural networks, which impose a massive computational overhead on resource-constrained platforms like mobile phones. In this paper, we propose several lightweight low-level modules that can be used to create a computationally low-cost variant of a given baseline model. Recent works on efficient neural network design have mainly focused on classification. However, low-level image processing falls under the image-to-image translation genre, which requires some additional computational modules not present in classification. This paper seeks to bridge this gap by designing generic efficient modules that can replace essential components used in contemporary deep learning based image restoration networks. We also present and analyse our results, highlighting the drawbacks of applying depthwise separable convolutional kernels (a popular method for efficient classification networks) to sub-pixel convolution based upsampling (a popular upsampling strategy for low-level vision applications). This shows that concepts from the domain of classification cannot always be seamlessly integrated into image-to-image translation tasks. We extensively validate our findings on three popular tasks: image inpainting, denoising and super-resolution. Our results show that the proposed networks consistently output visually similar reconstructions compared to full-capacity baselines, with significant reductions in parameters and memory footprint and faster execution on contemporary mobile devices.
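The efficiency argument for depthwise separable kernels comes down to parameter arithmetic. The sketch below (bias terms omitted; function names hypothetical) compares a standard $k\times k$ convolution with a depthwise-plus-pointwise pair. It also covers a sub-pixel upsampler by factor $r$, which must emit $C_{out}r^2$ channels before the weight-free pixel-shuffle rearrangement. It only counts weights; the quality drawback reported above is an empirical finding that this arithmetic cannot show.

```python
def conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def separable_params(k, c_in, c_out):
    """Depthwise k x k convolution followed by a 1 x 1 pointwise convolution (bias omitted)."""
    return k * k * c_in + c_in * c_out


def subpixel_upsampler_params(k, c_in, c_out, r, separable=False):
    """Convolution producing c_out * r**2 channels; the pixel shuffle itself has no weights."""
    f = separable_params if separable else conv_params
    return f(k, c_in, c_out * r * r)


# 64 -> 64 channels with 3 x 3 kernels, and a x2 upsampler to an RGB output.
print(conv_params(3, 64, 64), separable_params(3, 64, 64))
print(subpixel_upsampler_params(3, 64, 3, 2), subpixel_upsampler_params(3, 64, 3, 2, separable=True))
```

In the separable upsampler, each spatial filter sees only one input channel, and all mixing across the $r^2$ sub-pixel positions is left to the $1\times 1$ layer. This is one plausible reason the substitution behaves differently in upsampling than in classification backbones, though that reading is our assumption.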
Submitted 11 July, 2020;
originally announced July 2020.
-
Distributional Individual Fairness in Clustering
Authors:
Nihesh Anderson,
Suman K. Bera,
Syamantak Das,
Yang Liu
Abstract:
In this paper, we initiate the study of fair clustering that ensures distributional similarity among similar individuals. As part of the broader effort to improve fairness in machine learning, recent papers have investigated fairness in clustering algorithms and have focused on the paradigm of statistical parity/group fairness. These efforts attempt to minimize bias against some protected groups in the population. However, to the best of our knowledge, the alternative viewpoint of individual fairness, introduced by Dwork et al. (ITCS 2012) in the context of classification, has not been considered for clustering so far. Similar to Dwork et al., we adopt the individual fairness notion, which mandates that similar individuals should be treated similarly, for clustering problems. We use the notion of $f$-divergence as a measure of statistical similarity that significantly generalizes the ones used by Dwork et al. We introduce a framework for assigning individuals, embedded in a metric space, to probability distributions over a bounded number of cluster centers. The objective is to ensure (a) a low expected clustering cost and (b) that individuals who are close to each other in a given fairness space are mapped to statistically similar distributions.
We provide an algorithm for clustering with $p$-norm objective ($k$-center, $k$-means are special cases) and individual fairness constraints with provable approximation guarantee. We extend this framework to include both group fairness and individual fairness inside the protected groups. Finally, we observe conditions under which individual fairness implies group fairness. We present extensive experimental evidence that justifies the effectiveness of our approach.
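For discrete distributions over cluster centers, an $f$-divergence is $D_f(P\,\|\,Q)=\sum_i q_i f(p_i/q_i)$ for a convex $f$ with $f(1)=0$. The sketch below assumes all $q_i>0$. The `lipschitz_fair` check is a hypothetical Dwork-style condition, in which the divergence between two individuals' distributions is bounded by their distance in the fairness space. It is not necessarily the exact constraint used in the paper.

```python
from math import log


def f_divergence(p, q, f):
    """D_f(P || Q) = sum_i q_i f(p_i / q_i) for discrete P, Q with every q_i > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


def kl_generator(t):
    # f(t) = t log t yields the Kullback-Leibler divergence (using 0 log 0 = 0).
    return t * log(t) if t > 0 else 0.0


def tv_generator(t):
    # f(t) = |t - 1| / 2 yields the total variation distance.
    return abs(t - 1) / 2


def lipschitz_fair(p_u, p_v, dist_uv, f):
    """Hypothetical check: D_f(P_u || P_v) <= d(u, v) in the fairness space."""
    return f_divergence(p_u, p_v, f) <= dist_uv


# Individual u is assigned deterministically to center 0; v splits evenly.
p_u, p_v = [1.0, 0.0], [0.5, 0.5]
print(f_divergence(p_u, p_v, kl_generator), f_divergence(p_u, p_v, tv_generator))
```

Swapping the generator $f$ changes how strongly the constraint penalizes probability mass that one individual places where the other places little, which is one way to see why the $f$-divergence family generalizes a single fixed distance.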
Submitted 22 June, 2020;
originally announced June 2020.
-
How to Count Triangles, without Seeing the Whole Graph
Authors:
Suman K. Bera,
C. Seshadhri
Abstract:
Triangle counting is a fundamental problem in the analysis of large graphs. There is a rich body of work on this problem, in varying streaming and distributed models, yet all these algorithms require reading the whole input graph. In many scenarios, we do not have access to the whole graph, and can only sample a small portion of the graph (typically through crawling). In such a setting, how can we accurately estimate the triangle count of the graph?
We formally study triangle counting in the {\em random walk} access model introduced by Dasgupta et al. (WWW '14) and Chierichetti et al. (WWW '16). We have access to an arbitrary seed vertex of the graph, and can only perform random walks. This model is restrictive in access and captures the challenges of collecting real-world graphs. Even sampling a uniform random vertex is a hard task in this model.
Despite these challenges, we design a provable and practical algorithm, TETRIS, for triangle counting in this model. TETRIS is the first provably sublinear algorithm (for most natural parameter settings) that approximates the triangle count in the random walk model, for graphs with low mixing time. Our result builds on recent advances in the theory of sublinear algorithms. The final sample built by TETRIS is a careful mix of random walks and degree-biased sampling of neighborhoods. Empirically, TETRIS accurately counts triangles on a variety of large graphs, getting estimates within 5% relative error by looking at 3% of the number of edges.
Submitted 21 June, 2020;
originally announced June 2020.
-
Atom Search Optimization with Simulated Annealing -- a Hybrid Metaheuristic Approach for Feature Selection
Authors:
Kushal Kanti Ghosh,
Ritam Guha,
Soulib Ghosh,
Suman Kumar Bera,
Ram Sarkar
Abstract:
'Hybrid meta-heuristics' is one of the most interesting recent trends in the field of optimization and feature selection (FS). In this paper, we have proposed a binary variant of Atom Search Optimization (ASO) and its hybrid with Simulated Annealing, called ASO-SA, for FS. In order to map the real values used by ASO to the binary domain of FS, we have used two different transfer functions: S-shaped and V-shaped. We have hybridized this technique with a local search technique, SA. We have applied the proposed feature selection methods to 25 datasets from 4 different categories: UCI, handwritten digit recognition, text/non-text separation, and facial emotion recognition. We have used 3 different classifiers (K-Nearest Neighbor, Multi-Layer Perceptron and Random Forest) to evaluate the strength of the features selected by binary ASO and ASO-SA, and compared the results with some recent wrapper-based algorithms. The experimental results confirm the superiority of the proposed method in terms of both classification accuracy and number of selected features.
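The abstract names S-shaped and V-shaped transfer functions without giving their formulas. As an illustration only, the sketch below uses two common choices from the binary-metaheuristics literature, the sigmoid and $|\tanh x|$, together with the usual "select with probability" and "flip with probability" binarization rules. These specific formulas and the helper names are assumptions, not necessarily the ones used for binary ASO.

```python
import math
import random
from typing import List

def s_shaped(x: float) -> float:
    # Sigmoid: a commonly used S-shaped transfer function (assumed form)
    return 1.0 / (1.0 + math.exp(-x))

def v_shaped(x: float) -> float:
    # |tanh(x)|: a commonly used V-shaped transfer function (assumed form)
    return abs(math.tanh(x))

def binarize_s(position: List[float], rng: random.Random) -> List[int]:
    # S-shaped rule: feature j is selected with probability T(x_j)
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]

def binarize_v(position: List[float], current: List[int],
               rng: random.Random) -> List[int]:
    # V-shaped rule: bit j of the current feature mask flips with probability T(x_j)
    return [1 - b if rng.random() < v_shaped(x) else b
            for x, b in zip(position, current)]
```

The design difference is that an S-shaped rule sets each bit afresh from the real-valued position, while a V-shaped rule treats large-magnitude positions as a signal to change the current bit, so a zero position leaves the mask unchanged.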
Submitted 10 May, 2020;
originally announced May 2020.
-
How the Degeneracy Helps for Triangle Counting in Graph Streams
Authors:
Suman K. Bera,
C. Seshadhri
Abstract:
We revisit the well-studied problem of triangle count estimation in graph streams. Given a graph represented as a stream of $m$ edges, our aim is to compute a $(1\pm\varepsilon)$-approximation to the triangle count $T$, using a small space algorithm. For arbitrary order and a constant number of passes, the space complexity is known to be essentially $Θ(\min(m^{3/2}/T, m/\sqrt{T}))$ (McGregor et al., PODS 2016, Bera et al., STACS 2017).
We give a (constant pass, arbitrary order) streaming algorithm that can circumvent this lower bound for \emph{low degeneracy graphs}. The degeneracy, $κ$, is a nuanced measure of density, and the class of constant degeneracy graphs is immensely rich (containing planar graphs, minor-closed families, and preferential attachment graphs). We design a streaming algorithm with space complexity $\widetilde{O}(mκ/T)$. For constant degeneracy graphs, this bound is $\widetilde{O}(m/T)$, which is significantly smaller than both $m^{3/2}/T$ and $m/\sqrt{T}$. We complement our algorithmic result with a nearly matching lower bound of $Ω(mκ/T)$.
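The degeneracy $κ$ used above can be computed offline by repeatedly deleting a minimum-degree vertex: $κ$ is the largest degree seen at deletion time. The sketch below is a simple quadratic-time version for small graphs (a bucket queue makes it linear); the function name and edge-list input format are illustrative choices, and this is not the streaming algorithm from the paper.

```python
from collections import defaultdict
from typing import Hashable, Iterable, List, Tuple

def degeneracy(edges: Iterable[Tuple[Hashable, Hashable]]) -> Tuple[int, List[Hashable]]:
    """Return (kappa, order), where order is a degeneracy (peeling) order."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:                      # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    removed = set()
    order: List[Hashable] = []
    kappa = 0
    while len(removed) < len(adj):
        # Pick a remaining vertex of minimum current degree
        v = min((x for x in adj if x not in removed), key=lambda x: deg[x])
        kappa = max(kappa, deg[v])
        order.append(v)
        removed.add(v)
        for w in adj[v]:
            if w not in removed:
                deg[w] -= 1             # v no longer contributes to w's degree
    return kappa, order
```

For example, a triangle has degeneracy 2, the complete graph $K_4$ has degeneracy 3, and every tree has degeneracy 1, which is why planar and other sparse families have constant $κ$ even when their maximum degree is large.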
Submitted 29 March, 2020;
originally announced March 2020.
-
Linear Time Subgraph Counting, Graph Degeneracy, and the Chasm at Size Six
Authors:
Suman K. Bera,
Noujan Pashanasangi,
C. Seshadhri
Abstract:
We consider the problem of counting all $k$-vertex subgraphs in an input graph, for any constant $k$. This problem (denoted sub-cnt$_k$) has been studied extensively in both theory and practice. In a classic result, Chiba and Nishizeki (SICOMP 85) gave linear time algorithms for clique and 4-cycle counting for bounded degeneracy graphs. This is a rich class of sparse graphs that contains, for example, all minor-free families and preferential attachment graphs. The techniques from this result have inspired a number of recent practical algorithms for sub-cnt$_k$. Towards a better understanding of the limits of these techniques, we ask: for what values of $k$ can sub-cnt$_k$ be solved in linear time?
We discover a chasm at $k=6$. Specifically, we prove that for $k < 6$, sub-cnt$_k$ can be solved in linear time. Assuming a standard conjecture in fine-grained complexity, we prove that for all $k \geq 6$, sub-cnt$_k$ cannot be solved even in near-linear time.
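For the simplest case, $k=3$ (triangles), the orientation idea behind the Chiba–Nishizeki line of work can be sketched as follows: direct every edge from the endpoint of lower (degree, id) rank to the higher one, then check pairs of out-neighbors. Each triangle is counted exactly once, at its lowest-ranked vertex, and the total work is bounded in terms of the arboricity (equivalently, up to a factor of 2, the degeneracy) times $m$. This is an illustrative sketch under those assumptions, not the general sub-cnt$_k$ algorithm from the paper.

```python
from collections import defaultdict
from itertools import combinations
from typing import Iterable, Tuple

def count_triangles(edges: Iterable[Tuple[int, int]]) -> int:
    """Count triangles by orienting edges from lower to higher (degree, id) rank."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    # Total order on vertices: by degree, ties broken by vertex id
    rank = {v: (len(adj[v]), v) for v in adj}
    # Out-neighbors are the higher-ranked neighbors; their number stays small in sparse graphs
    out = {v: [w for w in adj[v] if rank[w] > rank[v]] for v in adj}
    count = 0
    for v in adj:
        for a, b in combinations(out[v], 2):
            if b in adj[a]:             # closing edge present: triangle {v, a, b}
                count += 1
    return count
```

Usage: `count_triangles([(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)])` counts the two triangles of a 4-cycle with one diagonal.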
Submitted 27 November, 2019; v1 submitted 13 November, 2019;
originally announced November 2019.
-
Graph Coloring via Degeneracy in Streaming and Other Space-Conscious Models
Authors:
Suman K. Bera,
Amit Chakrabarti,
Prantar Ghosh
Abstract:
We study the problem of coloring a given graph using a small number of colors in several well-established models of computation for big data. These include the data streaming model, the general graph query model, the massively parallel computation (MPC) model, and the CONGESTED-CLIQUE and the LOCAL models of distributed computation. On the one hand, we give algorithms with sublinear complexity, for the appropriate notion of complexity in each of these models. Our algorithms color a graph $G$ using about $κ(G)$ colors, where $κ(G)$ is the degeneracy of $G$: this parameter is closely related to the arboricity $α(G)$. As a function of $κ(G)$ alone, our results are close to best possible, since the optimal number of colors is $κ(G)+1$.
On the other hand, we establish certain lower bounds indicating that sublinear algorithms probably cannot go much further. In particular, we prove that any randomized coloring algorithm that uses $κ(G)+1$ colors would require $Ω(n^2)$ storage in the one-pass streaming model, and $Ω(n^2)$ queries in the general graph query model, where $n$ is the number of vertices in the graph. These lower bounds hold even when the value of $κ(G)$ is known in advance; at the same time, our upper bounds do not require $κ(G)$ to be given in advance.
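The offline benchmark of $κ(G)+1$ colors is achieved by a classic greedy procedure: compute a degeneracy (min-degree peeling) order, then color vertices in reverse order, giving each the smallest color unused by its already-colored neighbors. Each vertex has at most $κ(G)$ such neighbors, so colors stay in $\{0,\dots,κ(G)\}$. This is a sketch of that textbook offline algorithm under our own naming, not of the sublinear algorithms in the paper.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

def degeneracy_coloring(edges: Iterable[Tuple[Hashable, Hashable]]) -> Tuple[Dict[Hashable, int], int]:
    """Return (color, kappa) with every color in {0, ..., kappa}."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    # Peel minimum-degree vertices to get a degeneracy order and kappa
    deg = {v: len(adj[v]) for v in adj}
    alive = set(adj)
    order, kappa = [], 0
    while alive:
        v = min(alive, key=lambda x: deg[x])
        kappa = max(kappa, deg[v])
        order.append(v)
        alive.remove(v)
        for w in adj[v]:
            if w in alive:
                deg[w] -= 1
    # Color in reverse peeling order: at most kappa neighbors are already colored
    color: Dict[Hashable, int] = {}
    for v in reversed(order):
        used = {color[w] for w in adj[v] if w in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color, kappa
```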
Submitted 1 May, 2019;
originally announced May 2019.
-
Fair Algorithms for Clustering
Authors:
Suman K. Bera,
Deeparnab Chakrabarty,
Nicolas J. Flores,
Maryam Negahbani
Abstract:
We study the problem of finding low-cost Fair Clusterings in data where each data point may belong to many protected groups. Our work significantly generalizes the seminal work of Chierichetti et al. (NIPS 2017) as follows.
- We allow the user to specify the parameters that define fair representation. More precisely, these parameters define the maximum over- and minimum under-representation of any group in any cluster.
- Our clustering algorithm works on any $\ell_p$-norm objective (e.g. $k$-means, $k$-median, and $k$-center). Indeed, our algorithm transforms any vanilla clustering solution into a fair one incurring only a slight loss in quality.
- Our algorithm also allows individuals to lie in multiple protected groups. In other words, we do not need the protected groups to partition the data and we can maintain fairness across different groups simultaneously.
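The representation constraints in the list above can be checked directly for a given assignment. The sketch assumes the common formulation with an upper fraction $\alpha_g$ and a lower fraction $\beta_g$ per group, requiring $\beta_g |C| \le |C \cap G_g| \le \alpha_g |C|$ for every cluster $C$; the function name and dictionary-based inputs are hypothetical. Because each point maps to a set of groups, overlapping protected groups are handled naturally.

```python
from collections import Counter
from typing import Dict, Hashable, Set

def is_fair(assignment: Dict[Hashable, Hashable],
            groups: Dict[Hashable, Set[Hashable]],
            alpha: Dict[Hashable, float],
            beta: Dict[Hashable, float]) -> bool:
    """Check beta[g] * |C| <= |C ∩ g| <= alpha[g] * |C| for every cluster C and group g.

    assignment maps each point to its cluster; groups maps each point to the
    (possibly several) protected groups it belongs to.
    """
    sizes = Counter(assignment.values())
    counts: Counter = Counter()
    for point, cluster in assignment.items():
        for g in groups.get(point, ()):
            counts[(cluster, g)] += 1
    for cluster, size in sizes.items():
        for g in alpha:
            k = counts[(cluster, g)]
            if k > alpha[g] * size or k < beta[g] * size:
                return False            # group g is over- or under-represented in this cluster
    return True
```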
Our experiments show that on established data sets, our algorithm performs much better in practice than what our theoretical results suggest.
Submitted 17 June, 2019; v1 submitted 8 January, 2019;
originally announced January 2019.
-
Coloring in Graph Streams
Authors:
Suman Kalyan Bera,
Prantar Ghosh
Abstract:
In this paper, we initiate the study of the vertex coloring problem in the semi-streaming model. In this model, the input graph is defined by a stream of edges arriving in adversarial order, and any algorithm must process the edges in the order of arrival using space linear (up to polylogarithmic factors) in the number of vertices of the graph. In the offline setting, there is a simple greedy algorithm for $(Δ+1)$-vertex coloring of a graph with maximum degree $Δ$. We design a one-pass randomized streaming algorithm for the $(1+\varepsilon)Δ$-vertex coloring problem for any constant $\varepsilon >0$ using $O(\varepsilon^{-1} n \,\mathrm{poly}\log n)$ space, where $n$ is the number of vertices in the graph. Much more color-efficient algorithms are known for graphs with bounded arboricity in the offline setting. Specifically, there is a simple $2α$-vertex coloring algorithm for a graph with arboricity $α$. We present an $O(\varepsilon^{-1}\log n)$-pass randomized vertex coloring algorithm that requires at most $(2+\varepsilon)α$ colors for any constant $\varepsilon>0$ for a graph with arboricity $α$ in the semi-streaming model.
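The offline greedy $(Δ+1)$-coloring mentioned above fits in a few lines: visit vertices in any order and give each the smallest color not used by an already-colored neighbor. Since a vertex has at most $Δ$ neighbors, some color in $\{0,\dots,Δ\}$ is always free. The sketch below (vertices labeled $0,\dots,n-1$, names ours) is that offline baseline only; it needs the whole adjacency structure in memory, which is exactly what the semi-streaming model forbids.

```python
from typing import Iterable, List, Tuple

def greedy_coloring(n: int, edges: Iterable[Tuple[int, int]]) -> List[int]:
    """Greedy coloring of vertices 0..n-1 in index order; uses at most Delta + 1 colors."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    color = [-1] * n                    # -1 marks an uncolored vertex
    for v in range(n):
        used = {color[w] for w in adj[v] if color[w] >= 0}
        c = 0
        while c in used:                # at most deg(v) <= Delta colors are blocked
            c += 1
        color[v] = c
    return color
```

On the 5-cycle, for instance, this order uses exactly $Δ+1 = 3$ colors.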
Submitted 25 July, 2018; v1 submitted 19 July, 2018;
originally announced July 2018.
-
On Chord and Sagitta in ${\mathbb Z}^2$: An Analysis towards Fast and Robust Circular Arc Detection
Authors:
Sahadev Bera,
Shyamosree Pal,
Partha Bhowmick,
Bhargab B. Bhattacharya
Abstract:
Although chord and sagitta, when considered in tandem, may reflect many underlying geometric properties of circles on the Euclidean plane, their implications on the digital plane are not yet well-understood. In this paper, we explore some of their fundamental properties on the digital plane that have a strong bearing on the unsupervised detection of circles and circular arcs in a digital image. We show that although the chord-and-sagitta properties of a real circle do not readily migrate to the digital plane, they can indeed be used for the analysis in the discrete domain based on certain bounds on their deviations, which are derived from the real domain. In particular, we derive an upper bound on the circumferential angular deviation of a point in the context of chord property, and an upper bound on the relative error in radius estimation with regard to the sagitta property. Using these two bounds, we design a novel algorithm for the detection and parameterization of circles and circular arcs, which does not require any heuristic initialization or manual tuning. The chord property is deployed for the detection of circular arcs, whereas the sagitta property is used to estimate their centers and radii. Finally, to improve the accuracy of estimation, the notion of restricted Hough transform is used. Experimental results demonstrate superior efficiency and robustness of the proposed methodology compared to existing techniques.
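The real-domain sagitta property that the paper starts from is elementary: for a chord of length $L$ with sagitta $s$, the relation $(r-s)^2 + (L/2)^2 = r^2$ gives $r = s/2 + L^2/(8s)$, and the center lies at distance $r$ from the arc's midpoint along the line through the chord's midpoint. The sketch below applies this in the Euclidean plane only; it ignores the digital-plane error bounds that are the paper's contribution, and the function names are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]

def radius_from_chord_sagitta(chord_len: float, sagitta: float) -> float:
    # From (r - s)^2 + (L / 2)^2 = r^2  =>  r = s / 2 + L^2 / (8 s)
    return sagitta / 2.0 + chord_len ** 2 / (8.0 * sagitta)

def center_from_arc(a: Point, b: Point, s_pt: Point) -> Tuple[Point, float]:
    """a, b: chord endpoints; s_pt: arc point directly above the chord midpoint."""
    mx, my = (a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0
    chord = math.dist(a, b)
    sag = math.dist((mx, my), s_pt)
    r = radius_from_chord_sagitta(chord, sag)
    # Unit vector from the arc midpoint toward the chord midpoint (and the center)
    ux, uy = (mx - s_pt[0]) / sag, (my - s_pt[1]) / sag
    return (s_pt[0] + r * ux, s_pt[1] + r * uy), r
```

For the circle of radius 5 centered at the origin, the chord from $(-4,3)$ to $(4,3)$ has length 8 and sagitta 2, and the formula recovers $r = 1 + 4 = 5$ with center $(0,0)$. On digital arcs, the measured $L$ and $s$ are only approximate, which is why bounds on the relative radius error matter.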
Submitted 26 October, 2014;
originally announced November 2014.
-
On Covering a Solid Sphere with Concentric Spheres in ${\mathbb Z}^3$
Authors:
Sahadev Bera,
Partha Bhowmick,
Bhargab B. Bhattacharya
Abstract:
We show that a digital sphere, constructed by the circular sweep of a digital semicircle (generatrix) around its diameter, contains some holes (absentee-voxels), which appear on its spherical surface of revolution. This incompleteness calls for a proper characterization of the absentee-voxels whose restoration will yield a complete spherical surface without any holes. In this paper, we present a characterization of such absentee-voxels using certain techniques of digital geometry and show that their count varies quadratically with the radius of the semicircular generatrix. Next, we design an algorithm to fill these absentee-voxels so as to generate a spherical surface of revolution, which is more realistic from the viewpoint of visual perception. We further show that covering a solid sphere by a set of complete spheres also results in an asymptotically larger count of absentees, which is cubic in the radius of the sphere. The characterization and generation of complete solid spheres without any holes can also be accomplished in a similar fashion. We furnish test results to substantiate our theoretical findings.
Submitted 23 October, 2014;
originally announced November 2014.
-
Fenchel Duals for Drifting Adversaries
Authors:
Suman K Bera,
Anamitra R Choudhury,
Syamantak Das,
Sambuddha Roy,
Jayram S. Thatchachar
Abstract:
We describe a primal-dual framework for the design and analysis of online convex optimization algorithms for {\em drifting regret}. Existing literature shows (nearly) optimal drifting regret bounds only for the $\ell_2$ and $\ell_1$ norms. Our work provides a connection between these algorithms and Online Mirror Descent (OMD) updates; one key insight that results from our work is that, for these algorithms to succeed, it suffices for the gradient of the regularizer to be bounded (in an appropriate norm). In situations (such as the $\ell_1$ norm) where the vanilla regularizer does not have this property, we have to {\em shift} the regularizer to ensure it. This helps explain the various updates presented in \cite{bansal10, buchbinder12}. We also consider the online variant of the problem with 1-lookahead and with movement costs in the $\ell_2$ norm. Our primal-dual approach yields nearly optimal competitive ratios for this problem.
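As an illustration of why a bounded regularizer gradient matters, consider OMD on the probability simplex with the negative-entropy regularizer: its gradient $\log x_i + 1$ blows up as $x_i \to 0$, so a weight that collapses toward zero cannot recover quickly when the best expert drifts. The sketch below uses the well-known fixed-share fix, mixing each iterate with the uniform distribution so every weight stays at least $\gamma/n$. This is a standard textbook construction chosen for illustration; it is not claimed to be the shifted-regularizer update derived in the paper, and the names and parameters are ours.

```python
import math
from typing import List, Sequence

def eg_fixed_share(losses: Sequence[Sequence[float]], eta: float,
                   gamma: float) -> List[List[float]]:
    """Exponentiated gradient (entropic OMD) with fixed-share mixing.

    losses[t] is the loss vector over n experts at round t.
    Returns the distribution played at each round.
    """
    n = len(losses[0])
    x = [1.0 / n] * n
    played: List[List[float]] = []
    for g in losses:
        played.append(x)
        # OMD step with the negative-entropy regularizer = multiplicative update
        w = [xi * math.exp(-eta * gi) for xi, gi in zip(x, g)]
        z = sum(w)
        w = [wi / z for wi in w]
        # Mix with uniform: keeps every weight >= gamma / n, so log-weights stay bounded
        x = [(1.0 - gamma) * wi + gamma / n for wi in w]
    return played
```

The floor $\gamma/n$ caps $|\log x_i|$ at $\log(n/\gamma)$, which is the kind of bounded-gradient condition the abstract identifies as sufficient.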
Submitted 23 September, 2013;
originally announced September 2013.
-
Advanced Bloom Filter Based Algorithms for Efficient Approximate Data De-Duplication in Streams
Authors:
Suman K. Bera,
Sourav Dutta,
Ankur Narang,
Souvik Bhattacherjee
Abstract:
Applications involving telecommunication call data records, web pages, online transactions, medical records, stock markets, climate warning systems, etc., require efficient management and processing of massive, rapidly growing amounts of data from diverse sources. De-duplication, or intelligent compression, in streaming scenarios (the approximate identification and elimination of duplicates from such an unbounded data stream) is an even greater challenge given the real-time nature of data arrival. Stable Bloom Filters (SBF) address this problem to a certain extent.
In this work, we present several novel algorithms for the problem of approximate duplicate detection in data streams. We propose the Reservoir Sampling based Bloom Filter (RSBF), which combines the working principles of reservoir sampling and Bloom Filters. We also present variants of the novel Biased Sampling based Bloom Filter (BSBF) based on biased sampling concepts. We also propose a randomized load-balanced variant of the sampling Bloom Filter approach to tackle duplicate detection efficiently. We thus provide a generic framework for de-duplication using Bloom Filters. Using detailed theoretical analysis, we prove analytical bounds on the false positive rate, false negative rate and convergence rate of the proposed structures. We show that our models clearly outperform the existing methods. We also present an empirical analysis of the structures using real-world datasets (3 million records) and synthetic datasets (1 billion records) capturing various input distributions.
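All of the structures above build on the plain Bloom filter, sketched below as an approximate "have I seen this record?" test: an item is reported as a duplicate only if all $k$ of its hashed bit positions are already set. A plain filter never forgets, so it has no false negatives, but on an unbounded stream its bits fill up and the false positive rate climbs toward 1; evicting or resetting bits (as stable, reservoir-sampling and biased-sampling variants do in different ways) trades some false negatives for a bounded false positive rate. The class name, the salted SHA-256 hashing scheme and the byte-per-bit storage are our illustrative choices, not the paper's RSBF or BSBF.

```python
import hashlib
from typing import Hashable, Iterator

class BloomFilter:
    """Plain Bloom filter used as an approximate duplicate detector (illustrative)."""

    def __init__(self, m_bits: int, k_hashes: int) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray(m_bits)   # one byte per bit, for readability

    def _positions(self, item: Hashable) -> Iterator[int]:
        data = str(item).encode("utf-8")
        for i in range(self.k):
            # Derive k hash functions by salting SHA-256 with the index i
            h = hashlib.sha256(i.to_bytes(4, "little") + data).digest()
            yield int.from_bytes(h[:8], "little") % self.m

    def seen_then_add(self, item: Hashable) -> bool:
        """Report whether item looks like a duplicate, then insert it."""
        dup = True
        for pos in self._positions(item):
            if not self.bits[pos]:
                dup = False             # at least one unset bit: definitely new
                self.bits[pos] = 1
        return dup
```

Usage: feeding a stream through `seen_then_add` returns `False` for first occurrences (up to false positives) and always `True` for repeats.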
Submitted 17 December, 2012;
originally announced December 2012.
-
Approximation Algorithms for Edge Partitioned Vertex Cover Problems
Authors:
Suman Kalyan Bera,
Shalmoli Gupta,
Amit Kumar,
Sambuddha Roy
Abstract:
We consider a natural generalization of the Partial Vertex Cover problem. Here an instance consists of a graph $G = (V,E)$, a positive cost function $c: V \to \mathbb{Z}^{+}$, a partition $P_1, \ldots, P_r$ of the edge set $E$, and a parameter $k_i$ for each part $P_i$. The goal is to find a minimum-cost set of vertices that covers at least $k_i$ edges from each part $P_i$. We call this the Partition Vertex Cover problem. In this paper, we give matching upper and lower bounds on the approximability of this problem. Our algorithm is based on a novel LP relaxation for this problem, obtained by adding knapsack cover inequalities to a natural LP relaxation of the problem. We show that this LP has an integrality gap of $O(\log r)$, where $r$ is the number of sets in the partition of the edge set. We also extend our result to more general settings.
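To pin down the problem definition, the sketch below checks whether a vertex set satisfies every coverage requirement $k_i$ and finds an exact optimum by exhaustive search on tiny instances. It is only a reference oracle under our own naming; it is exponential in $|V|$ and has nothing to do with the LP-based $O(\log r)$-approximation in the paper.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Optional, Sequence, Set, Tuple

Edge = Tuple[Hashable, Hashable]

def covers_partition(cover: Set[Hashable], parts: Sequence[List[Edge]],
                     k: Sequence[int]) -> bool:
    """True iff, for every i, at least k[i] edges of part P_i have an endpoint in cover."""
    for part, need in zip(parts, k):
        covered = sum(1 for u, v in part if u in cover or v in cover)
        if covered < need:
            return False
    return True

def cover_cost(cover: Set[Hashable], cost: Dict[Hashable, int]) -> int:
    return sum(cost[v] for v in cover)

def brute_force_pvc(vertices: Sequence[Hashable], cost: Dict[Hashable, int],
                    parts: Sequence[List[Edge]],
                    k: Sequence[int]) -> Optional[Tuple[int, Set[Hashable]]]:
    """Exact minimum-cost partition vertex cover by exhaustive search (tiny inputs only)."""
    best: Optional[Tuple[int, Set[Hashable]]] = None
    for r in range(len(vertices) + 1):
        for subset in combinations(vertices, r):
            s = set(subset)
            if covers_partition(s, parts, k):
                c = cover_cost(s, cost)
                if best is None or c < best[0]:
                    best = (c, s)
    return best
```

For example, with parts $P_1 = \{(1,2)\}$, $P_2 = \{(3,4)\}$, $k = (1,1)$ and costs $c = (1,5,2,1)$ on vertices $1,\dots,4$, the cheapest feasible set is $\{1,4\}$ with cost 2.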
Submitted 10 October, 2012; v1 submitted 8 December, 2011;
originally announced December 2011.
-
WiMAX Based 60 GHz Millimeter-Wave Communication for Intelligent Transport System Applications
Authors:
Rabindranath Bera,
Subir Kumar Sarkar,
Bikash Sharma,
Samarendra Nath Sur,
Debasish Bhaskar,
Soumyasree Bera
Abstract:
With the successful worldwide deployment of 3rd generation (3G) mobile communication, security has been only partly addressed. Researchers are now looking to deploy 4G mobile with high data rates, enhanced security and reliability, and are therefore turning to CALM (Continuous Air interface for Long and Medium range communication). CALM is intended to be a reliable, high-data-rate, secure mobile communication system deployed for car-to-car (C2C) communication in safety applications. This paper reviews WiMAX and the 60 GHz RF carrier for C2C. The system is tested at the SMIT laboratory with multimedia transmission and reception. With proper deployment of this 60 GHz system on vehicles, existing commercial 802.11p products will soon need to be replaced or updated.
Submitted 2 May, 2011;
originally announced May 2011.