-
SiliCoN: Simultaneous Nuclei Segmentation and Color Normalization of Histological Images
Authors:
Suman Mahapatra,
Pradipta Maji
Abstract:
Segmentation of nuclei regions from histological images is an important task for automated computer-aided analysis of histological images, particularly in the presence of impermissible color variation in the appearance of stained tissue images. While color normalization enables better nuclei segmentation, accurate segmentation of nuclei structures makes color normalization rather trivial. In this respect, the paper proposes a novel deep generative model for simultaneously segmenting nuclei structures and normalizing color appearance of stained histological images. This model judiciously integrates the merits of truncated normal distribution and spatial attention. The model assumes that the latent color appearance information, corresponding to a particular histological image, is independent of the respective nuclei segmentation map as well as the embedding map information. The disentangled representation makes the model generalizable and adaptable, as modification or loss of color appearance information cannot affect the nuclei segmentation map or the embedding information. Also, for dealing with the stain overlap of associated histochemical reagents, the prior for latent color appearance code is assumed to be a mixture of truncated normal distributions. The proposed model incorporates the concept of spatial attention for segmentation of nuclei regions from histological images. The performance of the proposed approach, along with a comparative analysis with related state-of-the-art algorithms, has been demonstrated on publicly available standard histological image data sets.
Submitted 8 June, 2025;
originally announced June 2025.
-
Optimal Transport Driven Asymmetric Image-to-Image Translation for Nuclei Segmentation of Histological Images
Authors:
Suman Mahapatra,
Pradipta Maji
Abstract:
Segmentation of nuclei regions from histological images enables morphometric analysis of nuclei structures, which in turn helps in the detection and diagnosis of diseases under consideration. To develop a nuclei segmentation algorithm, applicable to different types of target domain representations, image-to-image translation networks can be considered as they are invariant to target domain image representations. One of the important issues with image-to-image translation models is that they fail miserably when the information content of the two image domains is asymmetric in nature. In this regard, the paper introduces a new deep generative model for segmenting nuclei structures from histological images. The proposed model considers an embedding space for handling information-disparity between information-rich histological image space and information-poor segmentation map domain. Integrating judiciously the concepts of optimal transport and measure theory, the model develops an invertible generator, which provides an efficient optimization framework with lower network complexity. The concept of invertible generator automatically eliminates the need for any explicit cycle-consistency loss. The proposed model also introduces a spatially-constrained squeeze operation within the framework of invertible generator to maintain spatial continuity within the image patches. The model provides a better trade-off between network complexity and model performance compared to other existing models having complex network architectures. The performance of the proposed deep generative model, along with a comparison with state-of-the-art nuclei segmentation methods, is demonstrated on publicly available histological image data sets.
Submitted 8 June, 2025;
originally announced June 2025.
-
MILPaC: A Novel Benchmark for Evaluating Translation of Legal Text to Indian Languages
Authors:
Sayan Mahapatra,
Debtanu Datta,
Shubham Soni,
Adrijit Goswami,
Saptarshi Ghosh
Abstract:
Most legal text in the Indian judiciary is written in complex English due to historical reasons. However, only a small fraction of the Indian population is comfortable in reading English. Hence legal text needs to be made available in various Indian languages, possibly by translating the available legal text from English. Though there has been a lot of research on translation to and between Indian languages, to our knowledge, there has not been much prior work on such translation in the legal domain. In this work, we construct the first high-quality legal parallel corpus containing aligned text units in English and nine Indian languages, that includes several low-resource languages. We also benchmark the performance of a wide variety of Machine Translation (MT) systems over this corpus, including commercial MT systems, open-source MT systems and Large Language Models. Through a comprehensive survey by Law practitioners, we check how satisfied they are with the translations by some of these MT systems, and how well automatic MT evaluation metrics agree with the opinions of Law practitioners.
Submitted 7 November, 2024; v1 submitted 15 October, 2023;
originally announced October 2023.
-
Cross Feature Selection to Eliminate Spurious Interactions and Single Feature Dominance Explainable Boosting Machines
Authors:
Shree Charran R,
Sandipan Das Mahapatra
Abstract:
Interpretability is a crucial aspect of machine learning models that enables humans to understand and trust the decision-making process of these models. In many real-world applications, the interpretability of models is essential for legal, ethical, and practical reasons. For instance, in the banking domain, interpretability is critical for lenders and borrowers to understand the reasoning behind the acceptance or rejection of loan applications as per fair lending laws. However, achieving interpretability in machine learning models is challenging, especially for complex high-performance models. Hence Explainable Boosting Machines (EBMs) have been gaining popularity due to their interpretable and high-performance nature in various prediction tasks. However, these models can suffer from issues such as spurious interactions with redundant features and single-feature dominance across all interactions, which can affect the interpretability and reliability of the model's predictions. In this paper, we explore novel approaches to address these issues by utilizing alternate cross-feature selection, ensemble features and model configuration alteration techniques. Our approach involves a multi-step feature selection procedure that selects a set of candidate features and ensemble features and then benchmarks them using the EBM model. We evaluate our method on three benchmark datasets and show that the alternate techniques outperform vanilla EBM methods, while providing better interpretability and feature selection stability, and improving the model's predictive performance. Moreover, we show that our approach can identify meaningful interactions and reduce the dominance of single features in the model's predictions, leading to more reliable and interpretable models.
Index Terms: Interpretability, EBMs, ensemble, feature selection.
Submitted 17 July, 2023;
originally announced July 2023.
-
Maximum-Width Rainbow-Bisecting Empty Annulus
Authors:
Sang Won Bae,
Sandip Banerjee,
Arpita Baral,
Priya Ranjan Sinha Mahapatra,
Sang Duk Yoon
Abstract:
Given a set of $n$ colored points with $k$ colors in the plane, we study the problem of computing a maximum-width rainbow-bisecting empty annulus, specifically for axis-parallel square, axis-parallel rectangular and circular annuli. We call a region rainbow if it contains at least one point of each color. The maximum-width rainbow-bisecting empty annulus problem asks to find an annulus $A$ of a particular shape with maximum possible width such that $A$ does not contain any input points and it bisects the input point set into two parts, each of which is a rainbow. We compute a maximum-width rainbow-bisecting empty axis-parallel square, axis-parallel rectangular and circular annulus in $O(n^3)$ time using $O(n)$ space, in $O(k^2n^2\log n)$ time using $O(n\log n)$ space and in $O(n^3)$ time using $O(n^2)$ space respectively.
Submitted 26 March, 2024; v1 submitted 16 May, 2023;
originally announced May 2023.
-
Not All Lotteries Are Made Equal
Authors:
Surya Kant Sahu,
Sai Mitheran,
Somya Suhans Mahapatra
Abstract:
The Lottery Ticket Hypothesis (LTH) states that for a reasonably sized neural network, a sub-network within the same network yields no less performance than the dense counterpart when trained from the same initialization. This work investigates the relation between model size and the ease of finding these sparse sub-networks. We show through experiments that, surprisingly, under a finite budget, smaller models benefit more from Ticket Search (TS).
Submitted 16 June, 2022;
originally announced June 2022.
-
New Methods & Metrics for LFQA tasks
Authors:
Suchismit Mahapatra,
Vladimir Blagojevic,
Pablo Bertorello,
Prasanna Kumar
Abstract:
Long-form question answering (LFQA) tasks require retrieving the documents pertinent to a query and using them to form a paragraph-length answer. Despite considerable progress in LFQA modeling, fundamental issues impede its progress: i) train/validation/test dataset overlap, ii) absence of automatic metrics and iii) generated answers not being "grounded" in retrieved documents. This work addresses every one of these critical bottlenecks, contributing natural language inference/generation (NLI/NLG) methods and metrics that make significant strides toward their alleviation.
Submitted 26 December, 2021;
originally announced December 2021.
-
Parameterized Algorithms for the Steiner Arborescence Problem on a Hypercube
Authors:
Sugyani Mahapatra,
Manikandan Narayanan,
N S Narayanaswamy
Abstract:
Motivated by a phylogeny reconstruction problem in evolutionary biology, we study the minimum Steiner arborescence problem on directed hypercubes (MSA-DH). Given $m$, representing the directed hypercube $\vec{Q}_m$, and a set of terminals $R$, the problem asks to find a Steiner arborescence that spans $R$ with minimum cost. As $m$ implicitly represents $\vec{Q}_m$ comprising $2^{m}$ vertices, the running time analyses of traditional Steiner tree algorithms on general graphs do not give a clear understanding of the actual complexity of this problem. We present algorithms that exploit the structure of the hypercube and run in time polynomial in $|R|$ and $m$.
We explore the MSA-DH problem on three natural parameters: $|R|$, and two above-guarantee parameters, the number of Steiner nodes $p$ and the penalty $q$. For above-guarantee parameters, the parameterized MSA-DH problem takes $p \geq 0$ or $q\geq 0$ as input, and outputs a Steiner arborescence with at most $|R| + p - 1$ or $m + q$ edges respectively. We present the following results ($\tilde{\mathcal{O}}$ hides the polynomial factors):
1. An exact algorithm that runs in $\tilde{\mathcal{O}}(3^{|R|})$ time.
2. A randomized algorithm that runs in $\tilde{\mathcal{O}}(9^q)$ time with success probability $\geq 4^{-q}$.
3. An exact algorithm that runs in $\tilde{\mathcal{O}}(36^q)$ time.
4. A $(1+q)$-approximation algorithm that runs in $\tilde{\mathcal{O}}(1.25284^q)$ time.
5. An $\mathcal{O}\left(p\ell_{\mathrm{max}} \right)$-additive approximation algorithm that runs in $\tilde{\mathcal{O}}(\ell_{\mathrm{max}}^{p+2})$ time, where $\ell_{\mathrm{max}}$ is the maximum distance of any terminal from the root.
Submitted 14 May, 2024; v1 submitted 6 October, 2021;
originally announced October 2021.
-
Efficient Reporting of Top-k Subset Sums
Authors:
Biswajit Sanyal,
Subhashis Majumder,
Priya Ranjan Sinha Mahapatra
Abstract:
The "Subset Sum problem" is a very well-known NP-complete problem. In this work, a top-k variation of the "Subset Sum problem" is considered. This problem has wide application in recommendation systems, where instead of k best objects the k best subsets of objects with the lowest (or highest) overall scores are required. Given an input set R of n real numbers and a positive integer k, our target is to generate the k best subsets of R such that the sum of their elements is minimized. Our solution methodology is based on constructing a metadata structure G for a given n. Each node of G stores a bit vector of size n from which a subset of R can be retrieved. Here it is shown that the construction of the whole graph G is not needed. To answer a query, only implicit traversal of the required portion of G on demand is sufficient, which obviously gets rid of the preprocessing step, thereby reducing the overall time and space requirement. A modified algorithm is then proposed to generate each subset incrementally, where it is shown that it is possible to do away with the explicit storage of the bit vector. This not only improves the space requirement but also improves the asymptotic time complexity. Finally, a variation of our algorithm that reports only the top-k subset sums has been compared with an existing algorithm, which shows that our algorithm performs better both in terms of time and space requirement by a constant factor.
Submitted 25 August, 2021; v1 submitted 24 May, 2021;
originally announced May 2021.
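
As a rough illustration of the top-k subset-sum problem described in the entry above, the following sketch reports the k smallest subset sums with a priority queue. It is the classic heap-based enumeration, not the implicit-graph algorithm of the paper, and the function name `k_smallest_subset_sums` is hypothetical. Negative inputs are handled under the assumption that the smallest sum is the sum of all negative values, after which the search runs over absolute values.

```python
import heapq


def k_smallest_subset_sums(values, k):
    """Return the k smallest subset sums of `values` in non-decreasing order.

    Classic heap enumeration (an illustrative stand-in, not the paper's method).
    The empty subset counts as a subset with sum 0.
    """
    if k <= 0:
        return []
    # The minimum possible sum takes every negative value and nothing else.
    base = sum(v for v in values if v < 0)
    # Toggling any element in or out of that minimal subset costs abs(v).
    mags = sorted(abs(v) for v in values)
    out = [base]
    if not mags:
        return out
    # Heap entries are (extra cost over base, index of the largest toggled element).
    heap = [(mags[0], 0)]
    while heap and len(out) < k:
        cost, i = heapq.heappop(heap)
        out.append(base + cost)
        if i + 1 < len(mags):
            # Either extend the subset with the next element ...
            heapq.heappush(heap, (cost + mags[i + 1], i + 1))
            # ... or swap the last toggled element for the next one.
            heapq.heappush(heap, (cost - mags[i] + mags[i + 1], i + 1))
    return out


print(k_smallest_subset_sums([1, 2, 4], 5))
```

Each subset is generated exactly once, so the loop performs at most k pops and 2k pushes after the initial sort. Only the sums are reported here; recovering the subsets themselves would need extra bookkeeping per heap entry.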
-
Differential Tracking Across Topical Webpages of Indian News Media
Authors:
Yash Vekaria,
Vibhor Agarwal,
Pushkal Agarwal,
Sangeeta Mahapatra,
Sakthi Balan Muthiah,
Nishanth Sastry,
Nicolas Kourtellis
Abstract:
Online user privacy and tracking have been extensively studied in recent years, especially due to privacy and personal data-related legislation in the EU and the USA, such as the General Data Protection Regulation, ePrivacy Regulation, and California Consumer Privacy Act. Research has revealed novel tracking and personally identifiable information leakage methods that first and third parties employ on websites around the world, as well as the intensity of tracking performed on such websites. However, for the sake of scaling to cover a large portion of the Web, most past studies focused on homepages of websites, and did not look deeper into the tracking practices on their topical subpages. The majority of studies focused on the Global North markets such as the EU and the USA. Large markets such as India, which accounts for 20% of the world population and has no explicit privacy laws, have not been studied in this regard.
We aim to address these gaps and focus on the following research questions: Is tracking on topical subpages of Indian news websites different from their homepage? Do third-party trackers prefer to track specific topics? How does this preference compare to the similarity of content shown on these topical subpages? To answer these questions, we propose a novel method for automatic extraction and categorization of Indian news topical subpages based on the details in their URLs. We study the identified topical subpages and compare them with their homepages with respect to the intensity of cookie injection and third-party embeddedness and type. We find differential user tracking among subpages, and between subpages and homepages. We also find a preferential attachment of third-party trackers to specific topics. Also, embedded third-parties tend to track specific subpages simultaneously, revealing possible user profiling in action.
Submitted 7 March, 2021;
originally announced March 2021.
-
Under the Spotlight: Web Tracking in Indian Partisan News Websites
Authors:
Vibhor Agarwal,
Yash Vekaria,
Pushkal Agarwal,
Sangeeta Mahapatra,
Shounak Set,
Sakthi Balan Muthiah,
Nishanth Sastry,
Nicolas Kourtellis
Abstract:
India is experiencing intense political partisanship and sectarian divisions. The paper performs, to the best of our knowledge, the first comprehensive analysis on the Indian online news media with respect to tracking and partisanship. We build a dataset of 103 online, mostly mainstream news websites. With the help of two experts, alongside data from the Media Ownership Monitor of the Reporters without Borders, we label these websites according to their partisanship (Left, Right, or Centre). We study and compare user tracking on these sites with different metrics: numbers of cookies, cookie synchronizations, device fingerprinting, and invisible pixel-based tracking. We find that Left and Centre websites serve more cookies than Right-leaning websites. However, through cookie synchronization, more user IDs are synchronized in Left websites than Right or Centre. Canvas fingerprinting is used similarly by Left and Right, and less by Centre. Invisible pixel-based tracking is 50% more intense in Centre-leaning websites than Right, and 25% more than Left. Desktop versions of news websites deliver more cookies than their mobile counterparts. A handful of third-parties are tracking users in most websites in this study. This paper, by demonstrating intense web tracking, has implications for research on overall privacy of users visiting partisan news websites in India.
Submitted 8 March, 2021; v1 submitted 6 February, 2021;
originally announced February 2021.
-
Observing Responses to the COVID-19 Pandemic using Worldwide Network Cameras
Authors:
Isha Ghodgaonkar,
Abhinav Goel,
Fischer Bordwell,
Caleb Tung,
Sara Aghajanzadeh,
Noah Curran,
Ryan Chen,
Kaiwen Yu,
Sneha Mahapatra,
Vishnu Banna,
Gore Kao,
Kate Lee,
Xiao Hu,
Nick Eliopolous,
Akhil Chinnakotla,
Damini Rijhwani,
Ashley Kim,
Aditya Chakraborty,
Mark Daniel Ward,
Yung-Hsiang Lu,
George K. Thiruvathukal
Abstract:
COVID-19 has resulted in a worldwide pandemic, leading to "lockdown" policies and social distancing. The pandemic has profoundly changed the world. Traditional methods for observing these historical events are difficult because sending reporters to areas with many infected people can put the reporters' lives in danger. New technologies are needed for safely observing responses to these policies. This paper reports using thousands of network cameras deployed worldwide for the purpose of witnessing activities in response to the policies. The network cameras can continuously provide real-time visual data (image and video) without human effort. Thus, network cameras can be utilized to observe activities without risking the lives of reporters. This paper describes a project that uses network cameras to observe responses to governments' policies during the COVID-19 pandemic (March to April in 2020). The project discovers over 30,000 network cameras deployed in 110 countries. A set of computer tools is created to collect visual data from network cameras continuously during the pandemic. This paper describes the methods to discover network cameras on the Internet, the methods to collect and manage data, and preliminary results of data analysis. This project can be the foundation for observing the possible "second wave" in fall 2020. The data may be used for post-pandemic analysis by sociologists, public health experts, and meteorologists.
Submitted 18 May, 2020;
originally announced May 2020.
-
Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck
Authors:
Yang Zhao,
Ping Yu,
Suchismit Mahapatra,
Qinliang Su,
Changyou Chen
Abstract:
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning. However, a common pitfall of VAEs in sequential text generation is that the model tends to ignore latent variables when paired with a strong auto-regressive decoder. In this paper, we propose a principled approach to alleviate this issue by applying a discretized bottleneck to enforce an implicit latent feature matching in a more compact latent space. We impose a shared discrete latent space where each input learns to choose a combination of latent atoms as a regularized latent representation. Our model shows a promising capability to model the underlying semantics of discrete sequences and thus provides more interpretable latent structures. Empirically, we demonstrate our model's efficiency and effectiveness on a broad range of tasks, including language modeling, unaligned text style transfer, dialog response generation, and neural machine translation.
Submitted 25 February, 2021; v1 submitted 22 April, 2020;
originally announced April 2020.
-
3D printed cable-driven continuum robots with generally routed cables: modeling and experiments
Authors:
Soumya Kanti Mahapatra,
Ashwin K. P.,
Ashitava Ghosal
Abstract:
Continuum robots are becoming increasingly popular for applications which require the robots to deform and change shape, while also being compliant. A cable-driven continuum robot is one of the most commonly used types. Typical cable-driven continuum robots consist of a flexible backbone with spacer disks attached to the backbone and cables passing through the holes in the spacer disks from the fixed base to a free end. In most such robots, the routing of the cables is straight or a smooth helical curve. In this paper, we analyze the experimental and theoretical deformations of a 3D printed continuum robot, for 6 different kinds of cable routings. The results are compared for discrete optimization based kinematic modelling as well as static modelling using Cosserat rod theory. It is shown that the experimental results match the theoretical results with an error margin of 2%. It is also shown that the optimization based approach is faster than the one based on Cosserat rod theory. We also present a three-fingered gripper prototype where each of the fingers is a 3D printed continuum robot with general cable routing. It is demonstrated that the prototype can be used for gripping objects and for manipulating them.
Submitted 10 March, 2020;
originally announced March 2020.
-
Block Prefix Mechanism for Flow Mobility in PMIPv6 Based Networks
Authors:
K. Vasu,
S. Mahapatra,
C. S. Kumar
Abstract:
The next generation Internet is deemed to be heterogeneous in nature and mobile devices connected to the Internet are expected to be equipped with different wireless network interfaces. As seamless mobility is important in such networks, handover between different network types, called vertical handover, is an important issue. While proposing standards like Mobile IPv6 (MIPv6) and Proxy Mobile IPv6 (PMIPv6) for mobility management protocols, one important challenge being addressed by IETF work groups and the research community is flow mobility in multi-homed heterogeneous wireless networks. In this paper we propose and analyze a block prefix mechanism for flow mobility in PMIPv6 and conduct extensive analytical and simulation studies to compare the proposed mechanism with existing prefix-based mechanisms reported for flow mobility in PMIPv6, in terms of important performance metrics such as handover latency, average hop delay, packet density, signaling cost and packet loss. Both analytical and simulation results demonstrate that the proposed mechanism outperforms the existing flow mobility management procedures using either shared or unique prefixes.
Submitted 11 July, 2019;
originally announced July 2019.
-
Maximum-Width Empty Square and Rectangular Annulus
Authors:
Sang Won Bae,
Arpita Baral,
Priya Ranjan Sinha Mahapatra
Abstract:
An annulus is, informally, a ring-shaped region, often described by two concentric circles. The maximum-width empty annulus problem asks to find an annulus of a certain shape with the maximum possible width that avoids a given set of $n$ points in the plane. This problem can also be interpreted as the problem of finding an optimal location of a ring-shaped obnoxious facility among the input points. In this paper, we study square and rectangular variants of the maximum-width empty annulus problem, and present the first nontrivial algorithms. Specifically, our algorithms run in $O(n^3)$ and $O(n^2 \log n)$ time for computing a maximum-width empty axis-parallel square and rectangular annulus, respectively. Both algorithms use only $O(n)$ space.
Submitted 15 November, 2018;
originally announced November 2018.
-
Learning Manifolds from Non-stationary Streaming Data
Authors:
Suchismit Mahapatra,
Varun Chandola
Abstract:
Streaming adaptations of manifold learning based dimensionality reduction methods, such as Isomap, are based on the assumption that a small initial batch of observations is enough for exact learning of the manifold, while remaining streaming data instances can be cheaply mapped to this manifold. However, there are no theoretical results to show that this core assumption is valid. Moreover, such methods typically assume that the underlying data distribution is stationary. Such methods are not equipped to detect, or handle, sudden changes or gradual drifts in the distribution that may occur when the data is streaming. We present theoretical results to show that the quality of a manifold asymptotically converges as the size of data increases. We then show that a Gaussian Process Regression (GPR) model that uses a manifold-specific kernel function and is trained on an initial batch of sufficient size can closely approximate the state-of-the-art streaming Isomap algorithms. The predictive variance obtained from the GPR prediction is then shown to be an effective detector of changes in the underlying data distribution. Results on several synthetic and real data sets show that the resulting algorithm can effectively learn a lower dimensional representation of high dimensional data in a streaming setting, while identifying shifts in the generative distribution.
Submitted 16 July, 2020; v1 submitted 23 April, 2018;
originally announced April 2018.
-
Maximum-width Axis-Parallel Empty Rectangular Annulus
Authors:
Arpita Baral,
Abhilash Gondane,
Sanjib Sadhu,
Priya Ranjan Sinha Mahapatra
Abstract:
Given a set $P$ of $n$ points on $\mathbb R^{2}$, we address the problem of computing an axis-parallel empty rectangular annulus $A$ of maximum-width such that no point of $P$ lies inside $A$ but all points of $P$ must lie inside, outside and on the boundaries of two parallel rectangles forming the annulus $A$. We propose an $O(n^3)$ time and $O(n)$ space algorithm to solve the problem. In a particular case when the inner rectangle of an axis-parallel empty rectangular annulus reduces to an input point we can solve the problem in $O(n \log n)$ time and $O(n)$ space.
Submitted 1 December, 2017;
originally announced December 2017.
-
Modeling Graphs Using a Mixture of Kronecker Models
Authors:
Suchismit Mahapatra,
Varun Chandola
Abstract:
Generative models for graphs are increasingly becoming a popular tool for researchers to generate realistic approximations of graphs. While in the past, focus was on generating graphs which follow general laws, such as the power law for degree distribution, current models have the ability to learn from observed graphs and generate synthetic approximations. The primary emphasis of existing models has been to closely match different properties of a single observed graph. Such models, though stochastic, tend to generate samples which do not have significant variance in terms of the various graph properties. We argue that in many cases real graphs are samples drawn from a graph population (e.g., networks sampled at various time points, social networks for individual schools, healthcare networks for different geographic regions, etc.). Such populations typically exhibit significant variance. However, existing models are not designed to model this variance, which could lead to issues such as overfitting. We propose a graph generative model that focuses on matching the properties of real graphs and the natural variance expected for the corresponding population. The proposed model adopts a mixture-model strategy to expand the expressiveness of Kronecker product based graph models (KPGM), while building upon the two strengths of KPGM, viz., the ability to model several key properties of graphs and to scale to massive graph sizes using its elegant fractal growth based formulation. The proposed model, called x-Kronecker Product Graph Model, or xKPGM, allows scalable learning from observed graphs and generates samples that match the mean and variance of several salient graph properties. We experimentally demonstrate the capability of the proposed model to capture the inherent variability in real world graphs on a variety of publicly available graph data sets.
Submitted 19 October, 2017;
originally announced October 2017.
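The Kronecker-product idea mentioned in the abstract above can be illustrated with a minimal sketch. It is not the xKPGM model itself: the abstract does not say how the mixture is formed, so the function `sample_from_mixture`, the initiator matrices, and the weights below are hypothetical. The sketch only shows the standard stochastic Kronecker construction (the $k$-th Kronecker power of a small initiator matrix gives edge probabilities) and one generic way to mix several initiators.

```python
import random


def kron(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    return [[a_ij * b_kl for a_ij in a_row for b_kl in b_row]
            for a_row in a for b_row in b]


def kronecker_power(theta, k):
    """k-th Kronecker power of the initiator matrix theta (k >= 1)."""
    result = [row[:] for row in theta]
    for _ in range(k - 1):
        result = kron(result, theta)
    return result


def sample_graph(theta, k, rng):
    """Sample a 0/1 adjacency matrix; each edge is an independent
    Bernoulli draw with probability taken from the Kronecker power."""
    probs = kronecker_power(theta, k)
    return [[1 if rng.random() < p else 0 for p in row] for row in probs]


def sample_from_mixture(thetas, weights, k, rng):
    """Hypothetical mixture step: pick one initiator by weight, then sample.
    This is an assumed, generic mixture, not the formulation of the paper."""
    theta = rng.choices(thetas, weights=weights, k=1)[0]
    return sample_graph(theta, k, rng)


rng = random.Random(0)
graph = sample_from_mixture(
    [[[0.9, 0.5], [0.5, 0.1]], [[0.7, 0.3], [0.3, 0.2]]],
    [0.6, 0.4], 3, rng)
print(len(graph), "x", len(graph[0]))
```

Drawing each graph from a randomly chosen initiator is one simple way to get more variance across samples than a single initiator produces, which is the motivation the abstract gives for a mixture.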
-
S-Isomap++: Multi Manifold Learning from Streaming Data
Authors:
Suchismit Mahapatra,
Varun Chandola
Abstract:
Manifold learning based methods have been widely used for non-linear dimensionality reduction (NLDR). However, in many practical settings, the need to process streaming data is a challenge for such methods, owing to the high computational complexity involved. Moreover, most methods operate under the assumption that the input data is sampled from a single manifold, embedded in a high dimensional space. We propose a method for streaming NLDR when the observed data is either sampled from multiple manifolds or irregularly sampled from a single manifold. We show that existing NLDR methods, such as Isomap, fail in such situations, primarily because they rely on smoothness and continuity of the underlying manifold, which is violated in the scenarios explored in this paper. However, the proposed algorithm is able to learn effectively in the presence of multiple, and potentially intersecting, manifolds, while allowing for the input data to arrive as a massive stream.
Submitted 17 March, 2018; v1 submitted 17 October, 2017;
originally announced October 2017.
-
Measurement and Analysis of UDP Traffic over Wi-Fi and GPRS
Authors:
Sumit Maheshwari,
K. Vasu,
Sudipta Mahapatra,
C. S. Kumar
Abstract:
With the increasing usage of mobile devices to ubiquitously access heterogeneous applications in wireless Internet, the measurement and analysis of Internet traffic has become a key research area. In this paper, we present the results of our measurements for VBR traffic over UDP in 802.11g and GPRS networks. We focus on Inter-Packet Arrival Time (IPRT) and Inter-Packet Transmission Delay (IPTD) and observe that the latter has a significant impact on the round trip delay. Numerical parameters of the Weibull, Exponential and Normal distributions that represent such traffic are also presented.
Submitted 26 July, 2017;
originally announced July 2017.
-
A Hybrid Approach for Secured Optimal Power Flow and Voltage Stability with TCSC Placement
Authors:
Sheila Mahapatra,
Nitin Malik
Abstract:
This paper proposes a hybrid technique for secured optimal power flow coupled with enhancing voltage stability with FACTS device installation. The performance of a hybrid approach combining the Improved Gravitational Search algorithm (IGSA) and the Firefly algorithm (FA) is analyzed by optimally placing a TCSC controller. The algorithm is implemented in the MATLAB working platform, and the power flow security and voltage stability are evaluated with IEEE 30 bus transmission systems. The optimal results generated are compared with those available in the literature, and the superior performance of the algorithm is shown by minimum generation cost and reduced real power losses along with sustained voltage stability.
Submitted 31 January, 2017;
originally announced January 2017.
-
No-hole $λ$-$L(k, k-1, \ldots, 2, 1)$-labeling for Square Grid
Authors:
Soumen Atta,
Priya Ranjan Sinha Mahapatra,
Stanisław Goldstein
Abstract:
Given a fixed $k$ $\in$ $\mathbb{Z}^+$ and $λ$ $\in$ $\mathbb{Z}^+$, the objective of a $λ$-$L(k, k-1, \ldots, 2, 1)$-labeling of a graph $G$ is to assign non-negative integers (known as labels) from the set $\{0, \ldots, λ-1\}$ to the vertices of $G$ such that the adjacent vertices receive values which differ by at least $k$, vertices connected by a path of length two receive values which differ by at least $k-1$, and so on. The vertices which are at least $k+1$ distance apart can receive the same label. The smallest $λ$ for which there exists a $λ$-$L(k, k-1, \ldots, 2, 1)$-labeling of $G$ is known as the $L(k, k-1, \ldots, 2, 1)$-labeling number of $G$ and is denoted by $λ_k(G)$. The ratio between the upper bound and the lower bound of a $λ$-$L(k, k-1, \ldots, 2, 1)$-labeling is known as the approximation ratio. In this paper a lower bound on the value of the labeling number for square grid is computed and a formula is proposed which yields a $λ$-$L(k, k-1, \ldots, 2, 1)$-labeling of square grid, with approximation ratio at most $\frac{9}{8}$. The labeling presented is a no-hole one, i.e., it uses each label from $0$ to $λ-1$ at least once.
Submitted 22 December, 2016; v1 submitted 21 September, 2016;
originally announced September 2016.
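The labeling condition defined in the abstract above can be checked mechanically. The sketch below is only a validity checker for a finite rectangular patch of the square grid, not the labeling formula proposed in the paper (which the abstract does not state). The function names `is_valid_labeling` and `is_no_hole` are hypothetical. It relies on the fact that the distance between two vertices of a rectangular grid patch is their Manhattan distance, and requires labels at distance $d \le k$ to differ by at least $k - d + 1$.

```python
from itertools import product


def is_valid_labeling(labels, k):
    """Check the L(k, k-1, ..., 2, 1) condition on a rows x cols grid patch.

    labels is a list of lists of non-negative integers. Two vertices at
    grid (Manhattan) distance d, 1 <= d <= k, must have labels that
    differ by at least k - d + 1.
    """
    rows, cols = len(labels), len(labels[0])
    for r, c in product(range(rows), range(cols)):
        for dr, dc in product(range(-k, k + 1), repeat=2):
            d = abs(dr) + abs(dc)
            if d == 0 or d > k:
                continue
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                if abs(labels[r][c] - labels[r2][c2]) < k - d + 1:
                    return False
    return True


def is_no_hole(labels, lam):
    """True if every label in {0, ..., lam - 1} is used at least once
    and no label outside that set appears."""
    used = {x for row in labels for x in row}
    return used == set(range(lam))


# k = 1 reduces to proper colouring: a checkerboard is valid and no-hole.
print(is_valid_labeling([[0, 1], [1, 0]], 1), is_no_hole([[0, 1], [1, 0]], 2))
```

For $k = 2$ the patch `[[0, 2], [2, 0]]` fails the check, because the two diagonal vertices are at distance two and share the label 0.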
-
A Low Complexity VLSI Architecture for Multi-Focus Image Fusion in DCT Domain
Authors:
Ashutosh Mishra,
Sudipta Mahapatra,
Swapna Banerjee
Abstract:
Due to the confined focal length of optical sensors, focusing all objects in a scene with a single sensor is a difficult task. To handle such a situation, image fusion methods are used in multi-focus environments. Discrete Cosine Transform (DCT) is a widely used image compression transform, and image fusion in the DCT domain is an efficient method. This paper presents a low complexity approach for multi-focus image fusion and its VLSI implementation using DCT. The proposed method is evaluated using reference/non-reference fusion measure criteria and the obtained results assert its effectiveness. The maximum synthesized frequency on FPGA is found to be 221 MHz, and the design consumes 42% of FPGA resources. The proposed method consumes very little power and can process 4K resolution images at the rate of 60 frames per second, which makes the hardware suitable for handheld portable devices such as camera modules and wireless image sensors.
Submitted 15 September, 2015;
originally announced February 2016.
-
A Novel Solution to the Dynamic Routing and Wavelength Assignment Problem in Transparent Optical Networks
Authors:
Urmila Bhanja,
Sudipta Mahapatra,
Rajarshi Roy
Abstract:
We present an evolutionary programming algorithm for solving the dynamic routing and wavelength assignment (DRWA) problem in optical wavelength-division multiplexing (WDM) networks under the wavelength continuity constraint. We assume an ideal physical channel and therefore neglect the blocking of connection requests due to physical impairments. The problem formulation includes suitable constraints that enable the algorithm to balance the load among the individuals and thus results in a lower blocking probability and lower mean execution time than the existing bio-inspired algorithms available in the literature for DRWA problems. Three wavelength assignment techniques, namely First Fit, Random, and Round Robin, have been investigated here. The ability to guarantee both low blocking probability without any wavelength converters and small delay makes the improved algorithm very attractive for current optical switching networks.
Submitted 17 March, 2010;
originally announced March 2010.
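Of the techniques named in the abstract above, First Fit wavelength assignment under the wavelength continuity constraint is simple enough to sketch. This is a generic illustration, not the evolutionary programming algorithm of the paper: the route is assumed to be given already, and the function name `first_fit`, the link representation, and the `in_use` bookkeeping are hypothetical. A request is blocked when no single wavelength is free on every link of its route.

```python
def first_fit(path_links, in_use, num_wavelengths):
    """Assign the lowest-indexed wavelength free on every link of the path.

    path_links: list of hashable link identifiers along the chosen route.
    in_use: dict mapping link -> set of wavelength indices already taken;
            it is updated in place when an assignment succeeds.
    Returns the assigned wavelength index, or None if the request is blocked.
    """
    for w in range(num_wavelengths):
        # Wavelength continuity: the same index must be free on all links.
        if all(w not in in_use.get(link, set()) for link in path_links):
            for link in path_links:
                in_use.setdefault(link, set()).add(w)
            return w
    return None  # no common free wavelength, so the request is blocked


in_use = {}
first = first_fit([("A", "B"), ("B", "C")], in_use, 2)
second = first_fit([("B", "C"), ("C", "D")], in_use, 2)
third = first_fit([("B", "C")], in_use, 2)
print(first, second, third)
```

The Random and Round Robin variants would differ only in the order in which candidate wavelengths are tried.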