Search | arXiv e-print repository

arXiv:2504.19916 [pdf, ps, other]

An Achievability Bound for Type-Based Unsourced Multiple Access

Authors: Deekshith Pathayappilly Krishnan, Kaan Okumus, Khac-Hoang Ngo, Giuseppe Durisi

Abstract: We derive an achievability bound to quantify the performance of a type-based unsourced multiple access system -- an information-theoretic model for grant-free multiple access with correlated messages. The bound extends available achievability results for the per-user error probability in the unsourced multiple access framework, where, different from our setup, message collisions are treated as errors. Specifically, we provide an upper bound on the total variation distance between the type (i.e., the empirical probability mass function) of the transmitted messages and its estimate over a Gaussian multiple access channel. Through numerical simulations, we illustrate that our bound can be used to determine the message type that is less efficient to transmit, because more difficult to detect. We finally show that a practical scheme for type estimation, based on coded compressed sensing with approximate message passing, operates approximately 3 dB away from the bound, for the parameters considered in the paper.
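
The error metric named in this abstract, the total variation distance between the type of the transmitted messages and its estimate, can be illustrated with a minimal sketch. The helper names (`message_type`, `total_variation`) and the toy message lists are assumptions made for illustration only; the paper's estimator and channel model are not reproduced here.

```python
from collections import Counter


def message_type(messages, num_messages):
    """Empirical probability mass function (type) of a list of message indices."""
    counts = Counter(messages)
    total = len(messages)
    return [counts.get(m, 0) / total for m in range(num_messages)]


def total_variation(p, q):
    """Total variation distance between two PMFs on the same alphabet."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical example: four users, message alphabet {0, 1, 2, 3}.
sent = [0, 0, 1, 3]      # two users collide on message 0
decoded = [0, 1, 1, 3]   # assumed receiver output with one multiplicity error

p = message_type(sent, 4)
q = message_type(decoded, 4)
tv = total_variation(p, q)
print(p, q, tv)
```

In this sketch a collision (two users sending message 0) is simply part of the type rather than an error, which is the distinction the abstract draws with the standard unsourced setup.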

Submitted 28 April, 2025; originally announced April 2025.

Comments: 8 pages, 1 figure. Extended version of a paper accepted for presentation at ISIT 2025

arXiv:2501.18908 [pdf, other]

Streamlining Security Vulnerability Triage with Large Language Models

Authors: Mohammad Jalili Torkamani, Joey NG, Nikita Mehrotra, Mahinthan Chandramohan, Padmanabhan Krishnan, Rahul Purandare

Abstract: Bug triaging for security vulnerabilities is a critical part of software maintenance, ensuring that the most pressing vulnerabilities are addressed promptly to safeguard system integrity and user data. However, the process is resource-intensive and comes with challenges, including classifying software vulnerabilities, assessing their severity, and managing a high volume of bug reports. In this paper, we present CASEY, a novel approach that leverages Large Language Models (in our case, the GPT model) that automates the identification of Common Weakness Enumerations (CWEs) of security bugs and assesses their severity. CASEY employs prompt engineering techniques and incorporates contextual information at varying levels of granularity to assist in the bug triaging process. We evaluated CASEY using an augmented version of the National Vulnerability Database (NVD), employing quantitative and qualitative metrics to measure its performance across CWE identification, severity assessment, and their combined analysis. CASEY achieved a CWE identification accuracy of 68%, a severity identification accuracy of 73.6%, and a combined accuracy of 51.2% for identifying both. These results demonstrate the potential of LLMs in identifying CWEs and severity levels, streamlining software vulnerability management, and improving the efficiency of security vulnerability triaging workflows.

Submitted 31 January, 2025; originally announced January 2025.

Comments: 16 pages, 22 figures, 6 tables, preprint

ACM Class: D.2; K.6.3; I.2.7

arXiv:2501.11505 [pdf, other]

Sun-Jafar-Type Schemes for Weak Private Information Retrieval

Authors: Chandan Anand, Jayesh Seshadri, Prasad Krishnan, Gowtham R. Kurri

Abstract: In information-theoretic private information retrieval (PIR), a client wants to retrieve one desired file out of $M$ files, stored across $N$ servers, while keeping the index of the desired file private from each $T$-sized subset of servers. A PIR protocol must ideally maximize the rate, which is the ratio of the file size to the total quantum of the download from the servers, while ensuring such privacy. In Weak-PIR (WPIR), the criterion of perfect information-theoretic privacy is relaxed. This enables higher rates to be achieved, while some information about the desired file index leaks to the servers. This leakage is captured by various known privacy metrics. By leveraging the well-established capacity-achieving schemes of Sun and Jafar under non-colluding ($T=1$) and colluding ($1<T\leq N$) scenarios, we present WPIR protocols for these scenarios. We also present a new WPIR scheme for the MDS scenario, by building upon the scheme by Banawan and Ulukus for this scenario. We present corresponding explicit rate-privacy trade-offs for these setups, under the mutual-information and the maximal leakage privacy metrics. In the collusion-free setup, our presented rate-privacy trade-off under maximal leakage matches that of the previous state of the art. With respect to the MDS scenario under the maximal leakage metric, we compare with the non-explicit trade-off in the literature, and show that our scheme performs better for some numerical examples. For the $T$-collusion setup (under both privacy metrics) and for the MDS setup under the mutual information metric, our rate-privacy trade-offs are the first in the literature, to the best of our knowledge.

Submitted 20 January, 2025; originally announced January 2025.

arXiv:2410.16154 [pdf, other]

doi 10.1109/IJCNN60899.2024.10650116

Unsupervised Replay Strategies for Continual Learning with Limited Data

Authors: Anthony Bazhenov, Pahan Dewasurendra, Giri P. Krishnan, Jean Erik Delanois

Abstract: Artificial neural networks (ANNs) show limited performance with scarce or imbalanced training data and face challenges with continuous learning, such as forgetting previously learned data after new tasks training. In contrast, the human brain can learn continuously and from just a few examples. This research explores the impact of 'sleep', an unsupervised phase incorporating stochastic activation with local Hebbian learning rules, on ANNs trained incrementally with limited and imbalanced datasets, specifically MNIST and Fashion MNIST. We discovered that introducing a sleep phase significantly enhanced accuracy in models trained with limited data. When a few tasks were trained sequentially, sleep replay not only rescued previously learned information that had been catastrophically forgetting following new task training but often enhanced performance in prior tasks, especially those trained with limited data. This study highlights the multifaceted role of sleep replay in augmenting learning efficiency and facilitating continual learning in ANNs.

Submitted 21 October, 2024; originally announced October 2024.

Journal ref: 2024 International Joint Conference on Neural Networks (IJCNN)

arXiv:2410.08427 [pdf]

Levels of Binary Equivalence for the Comparison of Binaries from Alternative Builds

Authors: Jens Dietrich, Tim White, Behnaz Hassanshahi, Paddy Krishnan

Abstract: In response to challenges in software supply chain security, several organisations have created infrastructures to independently build commodity open source projects and release the resulting binaries. Build platform variability can strengthen security as it facilitates the detection of compromised build environments. Furthermore, by improving the security posture of the build platform and collecting provenance information during the build, the resulting artifacts can be used with greater trust. Such offerings are now available from Google, Oracle and RedHat. The availability of multiple binaries built from the same sources creates new challenges and opportunities, and raises questions such as: 'Does build A confirm the integrity of build B?' or 'Can build A reveal a compromised build B?'. To answer such questions requires a notion of equivalence between binaries. We demonstrate that the obvious approach based on bitwise equality has significant shortcomings in practice, and that there is value in opting for alternative notions. We conceptualise this by introducing levels of equivalence, inspired by clone detection types. We demonstrate the value of these new levels through several experiments. We construct a dataset consisting of Java binaries built from the same sources independently by different providers, resulting in 14,156 pairs of binaries in total. We then compare the compiled class files in those jar files and find that for 3,750 pairs of jars (26.49%) there is at least one such file that is different, also forcing the jar files and their cryptographic hashes to be different. However, based on the new equivalence levels, we can still establish that many of them are practically equivalent. We evaluate several candidate equivalence relations on a semi-synthetic dataset that provides oracles consisting of pairs of binaries that either should be, or must not be equivalent.

Submitted 9 April, 2025; v1 submitted 10 October, 2024; originally announced October 2024.

Comments: 20 pages, 1 figure, 10 tables

ACM Class: D.2.13; D.3.4; F.3.2

arXiv:2407.21783 [pdf, other]

The Llama 3 Herd of Models

Authors: Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, et al. (536 additional authors not shown)

Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.

Submitted 23 November, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

arXiv:2407.02732 [pdf, other]

Supporting Cross-language Cross-project Bug Localization Using Pre-trained Language Models

Authors: Mahinthan Chandramohan, Dai Quoc Nguyen, Padmanabhan Krishnan, Jovan Jancic

Abstract: Automatically locating a bug within a large codebase remains a significant challenge for developers. Existing techniques often struggle with generalizability and deployment due to their reliance on application-specific data and large model sizes. This paper proposes a novel pre-trained language model (PLM) based technique for bug localization that transcends project and language boundaries. Our approach leverages contrastive learning to enhance the representation of bug reports and source code. It then utilizes a novel ranking approach that combines commit messages and code segments. Additionally, we introduce a knowledge distillation technique that reduces model size for practical deployment without compromising performance. This paper presents several key benefits. By incorporating code segment and commit message analysis alongside traditional file-level examination, our technique achieves better bug localization accuracy. Furthermore, our model excels at generalizability - trained on code from various projects and languages, it can effectively identify bugs in unseen codebases. To address computational limitations, we propose a CPU-compatible solution. In essence, proposed work presents a highly effective, generalizable, and efficient bug localization technique with the potential to real-world deployment.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2406.01062 [pdf, other]

Layout Agnostic Scene Text Image Synthesis with Diffusion Models

Authors: Qilong Zhangli, Jindong Jiang, Di Liu, Licheng Yu, Xiaoliang Dai, Ankit Ramchandani, Guan Pang, Dimitris N. Metaxas, Praveen Krishnan

Abstract: While diffusion models have significantly advanced the quality of image generation their capability to accurately and coherently render text within these images remains a substantial challenge. Conventional diffusion-based methods for scene text generation are typically limited by their reliance on an intermediate layout output. This dependency often results in a constrained diversity of text styles and fonts an inherent limitation stemming from the deterministic nature of the layout generation phase. To address these challenges this paper introduces SceneTextGen a novel diffusion-based model specifically designed to circumvent the need for a predefined layout stage. By doing so SceneTextGen facilitates a more natural and varied representation of text. The novelty of SceneTextGen lies in its integration of three key components: a character-level encoder for capturing detailed typographic properties coupled with a character-level instance segmentation model and a word-level spotting model to address the issues of unwanted text generation and minor character inaccuracies. We validate the performance of our method by demonstrating improved character recognition rates on generated images across different public visual text datasets in comparison to both standard diffusion based methods and text specific methods.

Submitted 15 September, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 7496-7506

arXiv:2405.07870 [pdf]

Mapping the Invisible: A Framework for Tracking COVID-19 Spread Among College Students with Google Location Data

Authors: Prajindra Sankar Krishnan, Chai Phing Chen, Gamal Alkawsi, Sieh Kiong Tiong, Luiz Fernando Capretz

Abstract: The COVID-19 pandemic and the implementation of social distancing policies have rapidly changed people's visiting patterns, as reflected in mobility data that tracks mobility traffic using location trackers on cell phones. However, the frequency and duration of concurrent occupancy at specific locations govern the transmission rather than the number of customers visiting. Therefore, understanding how people interact in different locations is crucial to target policies, inform contact tracing, and prevention strategies. This study proposes an efficient way to reduce the spread of the virus among on-campus university students by developing a self-developed Google History Location Extractor and Indicator software based on real-world human mobility data. The platform enables policymakers and researchers to explore the possibility of future developments in the epidemic's spread and simulate the outcomes of human mobility and epidemic state under different epidemic control policies. It offers functions for determining potential contacts, assessing individual infection risks, and evaluating the effectiveness of on-campus policies. The proposed multi-functional platform facilitates the screening process by more accurately targeting potential virus carriers and aids in making informed decisions on epidemic control policies, ultimately contributing to preventing and managing future outbreaks.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 8 pages

Journal ref: Latin American Workshop on Data Fusion (LAFUSION 2023), November/2023, pp 1-8, Rio de Janeiro, Brazil

arXiv:2404.19552 [pdf, ps, other]

Type-Based Unsourced Multiple Access

Authors: Khac-Hoang Ngo, Deekshith Pathayappilly Krishnan, Kaan Okumus, Giuseppe Durisi, Erik G. Ström

Abstract: We generalize the type-based multiple access framework proposed by Mergen and Tong (2006) to the case of unsourced multiple access. In the proposed framework, each device tracks the state of a physical/digital process, quantizes this state, and communicates it to a common receiver through a shared channel in an uncoordinated manner. The receiver aims to estimate the type of the states, i.e., the set of states and their multiplicity in the sequence of states reported by all devices. We measure the type estimation error using the Wasserstein distance. Considering an example of multi-target position tracking, we show that type estimation can be performed effectively via approximate message passing. Furthermore, we determine the quantization resolution that minimizes the type estimation error by balancing quantization distortion and communication error.

Submitted 15 July, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

Comments: accepted to the 25th IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC); simulation code available at: https://github.com/khachoang1412/TUMA

arXiv:2404.11420 [pdf, other]

Quantum Cloud Computing: A Review, Open Problems, and Future Directions

Authors: Hoa T. Nguyen, Prabhakar Krishnan, Dilip Krishnaswamy, Muhammad Usman, Rajkumar Buyya

Abstract: Quantum cloud computing is an emerging paradigm of computing that empowers quantum applications and their deployment on quantum computing resources without the need for a specialized environment to host and operate physical quantum computers. This paper reviews recent advances, identifies open problems, and proposes future directions in quantum cloud computing. It discusses the state-of-the-art quantum cloud advances, including the various cloud-based models, platforms, and recently developed technologies and software use cases. Furthermore, it discusses different aspects of the quantum cloud, including resource management, quantum serverless, security, and privacy problems. Finally, the paper examines open problems and proposes the future directions of quantum cloud computing, including potential opportunities and ongoing research in this emerging field.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2401.16342 [pdf, other]

On Achievable Rates for the Shotgun Sequencing Channel with Erasures

Authors: Hrishi Narayanan, Prasad Krishnan, Nita Parekh

Abstract: In shotgun sequencing, the input string (typically, a long DNA sequence composed of nucleotide bases) is sequenced as multiple overlapping fragments of much shorter lengths (called \textit{reads}). Modelling the shotgun sequencing pipeline as a communication channel for DNA data storage, the capacity of this channel was identified in a recent work, assuming that the reads themselves are noiseless substrings of the original sequence. Modern shotgun sequencers however also output quality scores for each base read, indicating the confidence in its identification. Bases with low quality scores can be considered to be erased. Motivated by this, we consider the \textit{shotgun sequencing channel with erasures}, where each symbol in any read can be independently erased with some probability $δ$. We identify achievable rates for this channel, using a random code construction and a decoder that uses typicality-like arguments to merge the reads.

Submitted 12 May, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

Comments: Accepted for presentation at ISIT 2024

arXiv:2401.15678 [pdf, other]

Recursive Subproduct Codes with Reed-Muller-like Structure

Authors: Aditya Siddheshwar, Lakshmi Prasad Natarajan, Prasad Krishnan

Abstract: We study a family of subcodes of the $m$-dimensional product code $\mathscr{C}^{\otimes m}$ ('subproduct codes') that have a recursive Plotkin-like structure, and which include Reed-Muller (RM) codes and Dual Berman codes as special cases. We denote the codes in this family as $\mathscr{C}^{\otimes [r,m]}$, where $0 \leq r \leq m$ is the 'order' of the code. These codes allow a 'projection' operation that can be exploited in iterative decoding, viz., the sum of two carefully chosen subvectors of any codeword in $\mathscr{C}^{\otimes [r,m]}$ belongs to $\mathscr{C}^{\otimes [r-1,m-1]}$. Recursive subproduct codes provide a wide range of rates and block lengths compared to RM codes while possessing several of their structural properties, such as the Plotkin-like design, the projection property, and fast ML decoding of first-order codes. Our simulation results for first-order and second-order codes, that are based on a belief propagation decoder and a local graph search algorithm, show instances of subproduct codes that perform either better than or within 0.5 dB of comparable RM codes and CRC-aided Polar codes.

Submitted 28 January, 2024; originally announced January 2024.

arXiv:2308.13173 [pdf, other]

DISGO: Automatic End-to-End Evaluation for Scene Text OCR

Authors: Mei-Yuh Hwang, Yangyang Shi, Ankit Ramchandani, Guan Pang, Praveen Krishnan, Lucas Kabela, Frank Seide, Samyak Datta, Jun Liu

Abstract: This paper discusses the challenges of optical character recognition (OCR) on natural scenes, which is harder than OCR on documents due to the wild content and various image backgrounds. We propose to uniformly use word error rates (WER) as a new measurement for evaluating scene-text OCR, both end-to-end (e2e) performance and individual system component performances. Particularly for the e2e metric, we name it DISGO WER as it considers Deletion, Insertion, Substitution, and Grouping/Ordering errors. Finally we propose to utilize the concept of super blocks to automatically compute BLEU scores for e2e OCR machine translation. The small SCUT public test set is used to demonstrate WER performance by a modularized OCR system.

Submitted 25 August, 2023; originally announced August 2023.

Comments: 9 pages

arXiv:2307.03981 [pdf]

BER Analysis of Full Duplex Relay assisted BPSK-SIM based VLC System for Indoor Applications

Authors: L Bhargava Kumar, Ramavath Prasad Naik, Datta Choudhari, Prabu Krishnan, Goutham Simha G D, Jagadeesh V K

Abstract: This paper contemplates a relay-assisted visible light communication (VLC) system, where the light source (Table lamp) acts as a relay node and cooperates with the main light source. Following the IEEE 802.15.7r1 VLC reference channel model, we assume that there are two different light sources present in an office room. The first one is the source terminal present on the ceiling and another one is the desk lamp that serves as the relay station which works in full-duplex method. Because of the loop interference channel, we model VLC relay terminal using ray tracing simulations. We have analyzed bit error rate (BER) performance of the relay-assisted VLC system using binary phase shift keying-subcarrier intensity modulation (BPSK-SIM) technique. The proposed method outperforms existing phase shift keying (PSK) and square M-quadrature amplitude modulation (M-QAM) techniques. The proposed VLC system using BPSK-SIM technique achieves a BER performance of for an SNR of 20 dB. The results of proposed full duplex and half duplex relayed VLC system are evaluated using equal power allocation (EPA) and optimum power allocations (OPA) techniques over three different modulation schemes which are 2-PSK, square M-QAM, BPSK-SIM.

Submitted 8 July, 2023; originally announced July 2023.

arXiv:2305.14828 [pdf, other]

Towards Few-shot Entity Recognition in Document Images: A Graph Neural Network Approach Robust to Image Manipulation

Authors: Prashant Krishnan, Zilong Wang, Yangkun Wang, Jingbo Shang

Abstract: Recent advances of incorporating layout information, typically bounding box coordinates, into pre-trained language models have achieved significant performance in entity recognition from document images. Using coordinates can easily model the absolute position of each token, but they might be sensitive to manipulations in document images (e.g., shifting, rotation or scaling), especially when the training data is limited in few-shot settings. In this paper, we propose to further introduce the topological adjacency relationship among the tokens, emphasizing their relative position information. Specifically, we consider the tokens in the documents as nodes and formulate the edges based on the topological heuristics from the k-nearest bounding boxes. Such adjacency graphs are invariant to affine transformations including shifting, rotations and scaling. We incorporate these graphs into the pre-trained language model by adding graph neural network layers on top of the language model embeddings, leading to a novel model LAGER. Extensive experiments on two benchmark datasets show that LAGER significantly outperforms strong baselines under different few-shot settings and also demonstrate better robustness to manipulations.

Submitted 23 February, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

arXiv:2305.05596 [pdf, ps, other]

On the Structure of Higher Order MDS Codes

Authors: Harshithanjani Athi, Rasagna Chigullapally, Prasad Krishnan, Lalitha Vadlamani

Abstract: A code of length $n$ is said to be (combinatorially) $(ρ,L)$-list decodable if the Hamming ball of radius $ρn$ around any vector in the ambient space does not contain more than $L$ codewords. We study a recently introduced class of higher order MDS codes, which are closely related (via duality) to codes that achieve a generalized Singleton bound for list decodability. For some $\ell\geq 1$, higher order MDS codes of length $n$, dimension $k$, and order $\ell$ are denoted as $(n,k)$-MDS($\ell$) codes. We present a number of results on the structure of these codes, identifying the `extend-ability' of their parameters in various scenarios. Specifically, for some parameter regimes, we identify conditions under which $(n_1,k_1)$-MDS($\ell_1$) codes can be obtained from $(n_2,k_2)$-MDS($\ell_2$) codes, via various techniques. We believe that these results will aid in efficient constructions of higher order MDS codes. We also obtain a new field size upper bound for the existence of such codes, which arguably improves over the best known existing bound, in some parameter regimes.
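
The combinatorial definition of list decodability quoted in this abstract can be checked by brute force on very small codes. The sketch below is only an illustration of that definition: the function names are hypothetical, the radius is passed directly as an integer (standing in for $ρn$), and the exhaustive search over all centers is feasible only for toy parameters. It is not a construction from the paper.

```python
from itertools import product


def hamming(u, v):
    """Number of coordinates in which two equal-length vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def is_list_decodable(code, radius, L, q=2):
    """Return True if every Hamming ball of the given radius contains at most L codewords.

    `code` is a list of equal-length tuples over the alphabet {0, ..., q-1}.
    """
    n = len(code[0])
    for center in product(range(q), repeat=n):
        in_ball = sum(1 for c in code if hamming(center, c) <= radius)
        if in_ball > L:
            return False
    return True


# Toy example (an assumption for illustration): the binary repetition code of length 3.
repetition = [(0, 0, 0), (1, 1, 1)]
print(is_list_decodable(repetition, radius=1, L=1))
print(is_list_decodable(repetition, radius=2, L=1))
print(is_list_decodable(repetition, radius=2, L=2))
```

With radius 1 no ball contains both codewords, so list size 1 suffices; with radius 2 the ball around (0, 0, 1) contains both, so list size 2 is needed.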

Submitted 9 May, 2023; originally announced May 2023.

Comments: Accepted into IEEE International Symposium on Information Theory 2023

arXiv:2305.04606 [pdf, ps, other]

$t$-PIR Schemes with Flexible Parameters via Star Products of Berman Codes

Authors: Srikar Kale, Keshav Agarwal, Prasad Krishnan

Abstract: We present a new class of private information retrieval (PIR) schemes that keep the identity of the file requested private in the presence of at most $t$ colluding servers, based on the recent framework developed for such $t$-PIR schemes using star products of transitive codes. These $t$-PIR schemes employ the class of Berman codes as the storage-retrieval code pairs. Berman codes, which are binary linear codes of length $n^m$ for any $n\geq 2$ and $m\geq 1$ being positive integers, were recently shown to achieve the capacity of the binary erasure channel. We provide a complete characterization of the star products of the Berman code pairs, enabling us to calculate the PIR rate of the star product-based schemes that employ these codes. The schemes we present have flexibility in the number of servers, the PIR rate, the storage rate, and the collusion parameter $t$, owing to numerous codes available in the class of Berman codes.

Submitted 8 May, 2023; originally announced May 2023.

Comments: Accepted at IEEE International Symposium on Information Theory (ISIT), 2023

arXiv:2302.03452 [pdf, other]

Cache-Aided Communication Schemes via Combinatorial Designs and their $q$-analogs

Authors: Shailja Agrawal, K V Sushena Sree, Prasad Krishnan, Abhinav Vaishya, Srikar Kale

Abstract: We consider the standard broadcast setup with a single server broadcasting information to a number of clients, each of which contains local storage (called cache) of some size, which can store some parts of the available files at the server. The centralized coded caching framework, consists of a caching phase and a delivery phase, both of which are carefully designed in order to use the cache and the channel together optimally. In prior literature, various combinatorial structures have been used to construct coded caching schemes. One of the chief drawbacks of many of these existing constructions is the large subpacketization level, which denotes the number of times a file should be split for the schemes to provide coding gain. In this work, using a new binary matrix model, we present several novel constructions for coded caching based on the various types of combinatorial designs and their $q$-analogs, which are also called subspace designs. While most of the schemes constructed in this work (based on existing designs) have a high cache requirement, they provide a rate that is either constant or decreasing, and moreover require competitively small levels of subpacketization, which is an extremely important feature in practical applications of coded caching. We also apply our constructions to the distributed computing framework of MapReduce, which consists of three phases, the Map phase, the Shuffle phase and the Reduce phase. Using our binary matrix framework, we present a new simple generic coded data shuffling scheme. Employing our designs-based constructions in conjunction with this new shuffling scheme, we obtain new coded computing schemes which have low file complexity, with marginally higher communication load compared to the optimal scheme for equivalent parameters. We show that our schemes can neatly extend to the scenario with full and partial stragglers also.

Submitted 7 February, 2023; originally announced February 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2001.05438, arXiv:1901.06383

arXiv:2208.10389 [pdf, other]

Bounding the Optimal Length of Pliable Index Coding via a Hypergraph-based Approach

Authors: Tulasi Sowjanya B., Visvesh Subramanian, Prasad Krishnan

Abstract: In pliable index coding (PICOD), a number of clients are connected via a noise-free broadcast channel to a server which has a list of messages. Each client has a unique subset of messages at the server as side-information and requests for any one message not in the side-information. A PICOD scheme of length $\ell$ is a set of $\ell$ encoded transmissions broadcast from the server such that all clients are satisfied. Finding the optimal (minimum) length of PICOD and designing PICOD schemes that have small length are the fundamental questions in PICOD. In this paper, we use a hypergraph-based approach to derive new achievability and converse results for PICOD. We present an algorithm which gives an achievable scheme for PICOD with length at most $Δ(\mathcal{H})$, where $Δ(\mathcal{H})$ is the maximum degree of any vertex in a hypergraph that represents the PICOD problem. We also give a lower bound for the optimal PICOD length using a new structural parameter associated with the PICOD hypergraph called the nesting number. We extend some of our results to the PICOD problem where each client demands $t$ messages, rather than just one. Finally, we identify a class of problems for which our converse is tight, and also characterize the optimal PICOD lengths of problems with $Δ(\mathcal{H})\in\{1,2,3\}$.

Submitted 26 December, 2022; v1 submitted 22 August, 2022; originally announced August 2022.

Comments: Accepted at the IEEE Information Theory Workshop, 2022

arXiv:2205.06257 [pdf, ps, other]

Coded Data Rebalancing for Distributed Data Storage Systems with Cyclic Storage

Authors: Abhinav Vaishya, Athreya Chandramouli, Srikar Kale, Prasad Krishnan

Abstract: We consider replication-based distributed storage systems in which each node stores the same quantum of data and each data bit stored has the same replication factor across the nodes. Such systems are referred to as balanced distributed databases. When existing nodes leave or new nodes are added to this system, the balanced nature of the database is lost, either due to the reduction in the replication factor, or the non-uniformity of the storage at the nodes. This triggers a rebalancing algorithm, that exchanges data between the nodes so that the balance of the database is reinstated. The goal is then to design rebalancing schemes with minimal communication load. In a recent work by Krishnan et al., coded transmissions were used to rebalance a carefully designed distributed database from a node removal or addition. These coded rebalancing schemes have optimal communication load, however, require the file-size to be at least exponential in the system parameters. In this work, we consider a cyclic balanced database (where data is cyclically placed in the system nodes) and present coded rebalancing schemes for node removal and addition in such a database. These databases (and the associated rebalancing schemes) require the file-size to be only cubic in the number of nodes in the system. We bound the advantage of our node removal rebalancing scheme over the uncoded scheme, and show that our scheme has a smaller communication load. In the node addition scenario, the rebalancing scheme presented is a simple uncoded scheme, which we show has optimal load. Finally, we derive a lower bound for the single node-removal rebalancing for the specific choice of data placements specified by our achievable rebalancing schemes, and show that our achievable rebalancing loads are within a multiplicative gap from the lower bound obtained.

Submitted 12 December, 2024; v1 submitted 12 May, 2022; originally announced May 2022.

Comments: 37 pages, updated previous version with new results

arXiv:2205.03259 [pdf]

Decentralized Digital Currency System using Merkle Hash Trees

Authors: Shreekanth M Prabhu, Natarajan Subramanyam, Ms. Shreya P Krishnan, Ms. Brindavana Sachidananda

Abstract: In India, post the demonetization exercise in 2016, digital payments have become extremely popular. Among them, the volume of transactions using Paytm wallets and UPI (Unified Payment Interface) have grown manifold. The lockdowns due to COVID-19 Pandemic have furthered this trend. Side by side, crypto-currencies such as bitcoin are also gaining traction. Many countries are considering issuing a Digital Currency via their Central Banks. In this paper, we propose a novel Decentralized Digital Currency System (DDCS) that makes use of Merkle Hash-Trees as Authenticated Data Structures. DDCS uses a Ledger-less, distributed, peer-to-peer architecture. We name the proposed currency $δ$-Money. $δ$-Money is intended as a replacement for physical currency and has in-built security features that rival crypto-currencies. Transactions using $δ$-Money happen in a disintermediated manner but with post-facto reconciliation. In place of Central Bank-issued Digital Currency (CBDC), we envisage a scenario where multiple Payment Banks issue digital currencies that have stable valuations without being subject to either volatility or perennial devaluation.

Submitted 6 May, 2022; originally announced May 2022.

Comments: 37 pages, 9 Figures, 8 Tables, submitted to Journal of Banking and Financial Technology

arXiv:2202.09981 [pdf, other]

Berman Codes: A Generalization of Reed-Muller Codes that Achieve BEC Capacity

Authors: Lakshmi Prasad Natarajan, Prasad Krishnan

Abstract: We identify a family of binary codes whose structure is similar to Reed-Muller (RM) codes and which include RM codes as a strict subclass. The codes in this family are denoted as $C_n(r,m)$, and their duals are denoted as $B_n(r,m)$. The length of these codes is $n^m$, where $n \geq 2$, and $r$ is their `order'. When $n=2$, $C_n(r,m)$ is the RM code of order $r$ and length $2^m$. The special case of these codes corresponding to $n$ being an odd prime was studied by Berman (1967) and Blackmore and Norton (2001). Following the terminology introduced by Blackmore and Norton, we refer to $B_n(r,m)$ as the Berman code and $C_n(r,m)$ as the dual Berman code. We identify these codes using a recursive Plotkin-like construction, and we show that these codes have a rich automorphism group, they are generated by the minimum weight codewords, and that they can be decoded up to half the minimum distance efficiently. Using a result of Kumar et al. (2016), we show that these codes achieve the capacity of the binary erasure channel (BEC) under bit-MAP decoding. Furthermore, except double transitivity, they satisfy all the code properties used by Reeves and Pfister to show that RM codes achieve the capacity of binary-input memoryless symmetric channels. Finally, when $n$ is odd, we identify a large class of abelian codes that includes $B_n(r,m)$ and $C_n(r,m)$ and which achieves BEC capacity.

Submitted 25 July, 2023; v1 submitted 20 February, 2022; originally announced February 2022.

Comments: Accepted for publication in the IEEE Transactions on Information Theory

arXiv:2202.04161 [pdf, other]

Logical Reasoning for Task Oriented Dialogue Systems

Authors: Sajjad Beygi, Maryam Fazel-Zarandi, Alessandra Cervone, Prakash Krishnan, Siddhartha Reddy Jonnalagadda

Abstract: In recent years, large pretrained models have been used in dialogue systems to improve successful task completion rates. However, lack of reasoning capabilities of dialogue platforms makes it difficult to provide relevant and fluent responses, unless the designers of a conversational experience spend a considerable amount of time implementing these capabilities in external rule based modules. In this work, we propose a novel method to fine-tune pretrained transformer models such as Roberta and T5 to reason over a set of facts in a given dialogue context. Our method includes a synthetic data generation mechanism which helps the model learn logical relations, such as comparison between list of numerical values, inverse relations (and negation), inclusion and exclusion for categorical attributes, and application of a combination of attributes over both numerical and categorical values, and spoken form for numerical values, without need for additional training dataset. We show that the transformer based model can perform logical reasoning to answer questions when the dialogue context contains all the required information, otherwise it is able to extract appropriate constraints to pass to downstream components (e.g. a knowledge base) when partial information is available. We observe that transformer based models such as UnifiedQA-T5 can be fine-tuned to perform logical reasoning (such as numerical and categorical attributes' comparison) over attributes that have been seen in training time (e.g., accuracy of 90\%+ for comparison of smaller than $k_{\max}$=5 values over heldout test dataset).

Submitted 8 February, 2022; originally announced February 2022.

arXiv:2106.14516 [pdf, other]

A Diffeomorphic Aging Model for Adult Human Brain from Cross-Sectional Data

Authors: Alphin J Thottupattu, Jayanthi Sivaswamy, Venkateswaran P. Krishnan

Abstract: Normative aging trends of the brain can serve as an important reference in the assessment of neurological structural disorders. Such models are typically developed from longitudinal brain image data -- follow-up data of the same subject over different time points. In practice, obtaining such longitudinal data is difficult. We propose a method to develop an aging model for a given population, in the absence of longitudinal data, by using images from different subjects at different time points, the so-called cross-sectional data. We define an aging model as a diffeomorphic deformation on a structural template derived from the data and propose a method that develops topology preserving aging model close to natural aging. The proposed model is successfully validated on two public cross-sectional datasets which provide templates constructed from different sets of subjects at different age points.

Submitted 28 June, 2021; originally announced June 2021.

arXiv:2106.10997 [pdf, other]

Towards sound based testing of COVID-19 -- Summary of the first Diagnostics of COVID-19 using Acoustics (DiCOVA) Challenge

Authors: Neeraj Kumar Sharma, Ananya Muguli, Prashant Krishnan, Rohit Kumar, Srikanth Raj Chetupalli, Sriram Ganapathy

Abstract: The technology development for point-of-care tests (POCTs) targeting respiratory diseases has witnessed a growing demand in the recent past. Investigating the presence of acoustic biomarkers in modalities such as cough, breathing and speech sounds, and using them for building POCTs can offer fast, contactless and inexpensive testing. In view of this, over the past year, we launched the ``Coswara'' project to collect cough, breathing and speech sound recordings via worldwide crowdsourcing. With this data, a call for development of diagnostic tools was announced in the Interspeech 2021 as a special session titled ``Diagnostics of COVID-19 using Acoustics (DiCOVA) Challenge''. The goal was to bring together researchers and practitioners interested in developing acoustics-based COVID-19 POCTs by enabling them to work on the same set of development and test datasets. As part of the challenge, datasets with breathing, cough, and speech sound samples from COVID-19 and non-COVID-19 individuals were released to the participants. The challenge consisted of two tracks. The Track-1 focused only on cough sounds, and participants competed in a leaderboard setting. In Track-2, breathing and speech samples were provided for the participants, without a competitive leaderboard. The challenge attracted 85 plus registrations with 29 final submissions for Track-1. This paper describes the challenge (datasets, tasks, baseline system), and presents a focused summary of the various systems submitted by the participating teams. An analysis of the results from the top four teams showed that a fusion of the scores from these teams yields an area-under-the-curve of 95.1% on the blind test data. By summarizing the lessons learned, we foresee the challenge overview in this paper to help accelerate technology for acoustic-based POCTs.

Submitted 21 June, 2021; originally announced June 2021.

Comments: Manuscript in review in the Elsevier Computer Speech and Language journal

arXiv:2106.08385 [pdf, other]

TextStyleBrush: Transfer of Text Aesthetics from a Single Example

Authors: Praveen Krishnan, Rama Kovvuri, Guan Pang, Boris Vassilev, Tal Hassner

Abstract: We present a novel approach for disentangling the content of a text image from all aspects of its appearance. The appearance representation we derive can then be applied to new content, for one-shot transfer of the source style to new content. We learn this disentanglement in a self-supervised manner. Our method processes entire word boxes, without requiring segmentation of text from background, per-character processing, or making assumptions on string lengths. We show results in different text domains which were previously handled by specialized methods, e.g., scene text, handwritten text. To these ends, we make a number of technical contributions: (1) We disentangle the style and content of a textual image into a non-parametric, fixed-dimensional vector. (2) We propose a novel approach inspired by StyleGAN but conditioned over the example style at different resolution and content. (3) We present novel self-supervised training criteria which preserve both source style and target content using a pre-trained font classifier and text recognizer. Finally, (4) we also introduce Imgur5K, a new challenging dataset for handwritten word images. We offer numerous qualitative photo-realistic results of our method. We further show that our method surpasses previous work in quantitative tests on scene text and handwriting datasets, as well as in a user study.

Submitted 15 June, 2021; originally announced June 2021.

Comments: 18 pages, 13 figures

arXiv:2106.00639 [pdf, other]

Multi-modal Point-of-Care Diagnostics for COVID-19 Based On Acoustics and Symptoms

Authors: Srikanth Raj Chetupalli, Prashant Krishnan, Neeraj Sharma, Ananya Muguli, Rohit Kumar, Viral Nanda, Lancelot Mark Pinto, Prasanta Kumar Ghosh, Sriram Ganapathy

Abstract: The research direction of identifying acoustic bio-markers of respiratory diseases has received renewed interest following the onset of COVID-19 pandemic. In this paper, we design an approach to COVID-19 diagnostic using crowd-sourced multi-modal data. The data resource, consisting of acoustic signals like cough, breathing, and speech signals, along with the data of symptoms, are recorded using a web-application over a period of ten months. We investigate the use of statistical descriptors of simple time-frequency features for acoustic signals and binary features for the presence of symptoms. Unlike previous works, we primarily focus on the application of simple linear classifiers like logistic regression and support vector machines for acoustic data while decision tree models are employed on the symptoms data. We show that a multi-modal integration of acoustics and symptoms classifiers achieves an area-under-curve (AUC) of 92.40, a significant improvement over any individual modality. Several ablation experiments are also provided which highlight the acoustic and symptom dimensions that are important for the task of COVID-19 diagnostics.

Submitted 5 June, 2021; v1 submitted 1 June, 2021; originally announced June 2021.

Comments: The Manuscript is submitted to IEEE-EMBS Journal of Biomedical and Health Informatics on June 1, 2021

arXiv:2104.09088 [pdf, other]

Alexa Conversations: An Extensible Data-driven Approach for Building Task-oriented Dialogue Systems

Authors: Anish Acharya, Suranjit Adhikari, Sanchit Agarwal, Vincent Auvray, Nehal Belgamwar, Arijit Biswas, Shubhra Chandra, Tagyoung Chung, Maryam Fazel-Zarandi, Raefer Gabriel, Shuyang Gao, Rahul Goel, Dilek Hakkani-Tur, Jan Jezabek, Abhay Jha, Jiun-Yu Kao, Prakash Krishnan, Peter Ku, Anuj Goyal, Chien-Wei Lin, Qing Liu, Arindam Mandal, Angeliki Metallinou, Vishal Naik, Yi Pan , et al. (6 additional authors not shown)

Abstract: Traditional goal-oriented dialogue systems rely on various components such as natural language understanding, dialogue state tracking, policy learning and response generation. Training each component requires annotations which are hard to obtain for every new domain, limiting scalability of such systems. Similarly, rule-based dialogue systems require extensive writing and maintenance of rules and do not scale either. End-to-End dialogue systems, on the other hand, do not require module-specific annotations but need a large amount of data for training. To overcome these problems, in this demo, we present Alexa Conversations, a new approach for building goal-oriented dialogue systems that is scalable, extensible as well as data efficient. The components of this system are trained in a data-driven manner, but instead of collecting annotated conversations for training, we generate them using a novel dialogue simulator based on a few seed dialogues and specifications of APIs and entities provided by the developer. Our approach provides out-of-the-box support for natural conversational phenomena like entity sharing across turns or users changing their mind during conversation without requiring developers to provide any such dialogue flows. We exemplify our approach using a simple pizza ordering task and showcase its value in reducing the developer burden for creating a robust experience. Finally, we evaluate our system using a typical movie ticket booking task and show that the dialogue simulator is an essential component of the system that leads to over $50\%$ improvement in turn-level action signature prediction accuracy.

Submitted 19 April, 2021; originally announced April 2021.

Journal ref: NAACL 2021 System Demonstrations Track

arXiv:2104.04132 [pdf, other]

Replay in Deep Learning: Current Approaches and Missing Biological Elements

Authors: Tyler L. Hayes, Giri P. Krishnan, Maxim Bazhenov, Hava T. Siegelmann, Terrence J. Sejnowski, Christopher Kanan

Abstract: Replay is the reactivation of one or more neural patterns, which are similar to the activation patterns experienced during past waking experiences. Replay was first observed in biological neural networks during sleep, and it is now thought to play a critical role in memory formation, retrieval, and consolidation. Replay-like mechanisms have been incorporated into deep artificial neural networks that learn over time to avoid catastrophic forgetting of previous knowledge. Replay algorithms have been successfully used in a wide range of deep learning methods within supervised, unsupervised, and reinforcement learning paradigms. In this paper, we provide the first comprehensive comparison between replay in the mammalian brain and replay in artificial neural networks. We identify multiple aspects of biological replay that are missing in deep learning systems and hypothesize how they could be utilized to improve artificial neural networks.

Submitted 28 May, 2021; v1 submitted 1 April, 2021; originally announced April 2021.

Comments: Accepted for publication in the MIT Press journal of Neural Computation

arXiv:2103.15992 [pdf, other]

A Multiplexed Network for End-to-End, Multilingual OCR

Authors: Jing Huang, Guan Pang, Rama Kovvuri, Mandy Toh, Kevin J Liang, Praveen Krishnan, Xi Yin, Tal Hassner

Abstract: Recent advances in OCR have shown that an end-to-end (E2E) training pipeline that includes both detection and recognition leads to the best results. However, many existing methods focus primarily on Latin-alphabet languages, often even only case-insensitive English characters. In this paper, we propose an E2E approach, Multiplexed Multilingual Mask TextSpotter, that performs script identification at the word level and handles different scripts with different recognition heads, all while maintaining a unified loss that simultaneously optimizes script identification and multiple recognition heads. Experiments show that our method outperforms the single-head model with similar number of parameters in end-to-end recognition tasks, and achieves state-of-the-art results on MLT17 and MLT19 joint text detection and script identification benchmarks. We believe that our work is a step towards the end-to-end trainable and scalable multilingual multi-purpose OCR system. Our code and model will be released.

Submitted 29 March, 2021; originally announced March 2021.

arXiv:2103.09148 [pdf, other]

DiCOVA Challenge: Dataset, task, and baseline system for COVID-19 diagnosis using acoustics

Authors: Ananya Muguli, Lancelot Pinto, Nirmala R., Neeraj Sharma, Prashant Krishnan, Prasanta Kumar Ghosh, Rohit Kumar, Shrirama Bhat, Srikanth Raj Chetupalli, Sriram Ganapathy, Shreyas Ramoji, Viral Nanda

Abstract: The DiCOVA challenge aims at accelerating research in diagnosing COVID-19 using acoustics (DiCOVA), a topic at the intersection of speech and audio processing, respiratory health diagnosis, and machine learning. This challenge is an open call for researchers to analyze a dataset of sound recordings collected from COVID-19 infected and non-COVID-19 individuals for a two-class classification. These recordings were collected via crowdsourcing from multiple countries, through a website application. The challenge features two tracks, one focusing on cough sounds, and the other on using a collection of breath, sustained vowel phonation, and number counting speech recordings. In this paper, we introduce the challenge and provide a detailed description of the task, and present a baseline system for the task.

Submitted 17 June, 2021; v1 submitted 16 March, 2021; originally announced March 2021.

Comments: To appear in Proceedings of Interspeech, 2021

arXiv:2102.02182 [pdf, other]

Pliable Index Coding via Conflict-Free Colorings of Hypergraphs

Authors: Prasad Krishnan, Rogers Mathew, Subrahmanyam Kalyanasundaram

Abstract: In the pliable index coding (PICOD) problem, a server is to serve multiple clients, each of which possesses a unique subset of the complete message set as side information and requests a new message which it does not have. The goal of the server is to do this using as few transmissions as possible. This work presents a hypergraph coloring approach to the scalar PICOD problem. A \textit{conflict-free coloring} of a hypergraph is known from literature as an assignment of colors to its vertices so that each hyperedge of the graph contains one uniquely colored vertex. For a given PICOD problem represented by a hypergraph consisting of messages as vertices and request-sets as hyperedges, we present achievable PICOD schemes using conflict-free colorings of the PICOD hypergraph. Various graph theoretic parameters arising out of such colorings (and some new coloring variants) then give a number of upper bounds on the optimal PICOD length, which we study in this work. Suppose the PICOD hypergraph has $m$ vertices and $n$ hyperedges, where every hyperedge overlaps with at most $Γ$ other hyperedges. We show easy to implement randomized algorithms for the following: (a) For the single request case, we give a PICOD of length $O(\log^2Γ)$. This result improves over known achievability results for some parameter ranges, (b) For the $t$-request case, we give an MDS code of length $\max(O(\log Γ\log m), O(t \log m))$. Further if the hyperedges (request sets) are sufficiently large, we give a PICOD of the same length as above, which is not based on MDS construction. In general, this gives an improvement over prior achievability results. Our codes are of near-optimal length (up to a multiplicative factor of $\log t$).

Submitted 26 December, 2022; v1 submitted 3 February, 2021; originally announced February 2021.

Comments: A shorter version has appeared in IEEE International Symposium on Information Theory, 2021

arXiv:2011.14298 [pdf, other]

A method for large diffeomorphic registration via broken geodesics

Authors: Alphin J. Thottupattu, Jayanthi Sivaswamy, Venkateswaran P. Krishnan

Abstract: Anatomical variabilities seen in longitudinal data or inter-subject data are usually described by the underlying deformation, captured by non-rigid registration of these images. Stationary Velocity Field (SVF) based non-rigid registration algorithms are widely used for registration. SVF based methods form a metric-free framework which captures a finite dimensional submanifold of deformations embedded in the infinite dimensional smooth manifold of diffeomorphisms. However, these methods cover only a limited degree of deformations. In this paper, we address this limitation and define an approximate metric space for the manifold of diffeomorphisms $\mathcal{G}$. We propose a method to break down the large deformation into finite compositions of small deformations. This results in a broken geodesic path on $\mathcal{G}$ and its length now forms an approximate registration metric. We illustrate the method using a simple, intensity-based, log-demon implementation. Validation results of the proposed method show that it can capture large and complex deformations while producing qualitatively better results than the state-of-the-art methods. The results also demonstrate that the proposed registration metric is a good indicator of the degree of deformation.

Submitted 3 January, 2021; v1 submitted 29 November, 2020; originally announced November 2020.

Comments: 18 pages and 9 figures

arXiv:2010.14411 [pdf, other]

Improving Word Recognition using Multiple Hypotheses and Deep Embeddings

Authors: Siddhant Bansal, Praveen Krishnan, C. V. Jawahar

Abstract: We propose a novel scheme for improving the word recognition accuracy using word image embeddings. We use a trained text recognizer, which can predict multiple text hypotheses for a given word image. Our fusion scheme improves the recognition process by utilizing the word image and text embeddings obtained from a trained word image embedding network. We propose EmbedNet, which is trained using a triplet loss for learning a suitable embedding space where the embedding of the word image lies closer to the embedding of the corresponding text transcription. The updated embedding space thus helps in choosing the correct prediction with higher confidence. To further improve the accuracy, we propose a plug-and-play module called Confidence based Accuracy Booster (CAB). The CAB module takes in the confidence scores obtained from the text recognizer and Euclidean distances between the embeddings to generate an updated distance vector. The updated distance vector has lower distance values for the correct words and higher distance values for the incorrect words. We rigorously and systematically evaluate our proposed method on a collection of books in the Hindi language. Our method achieves an absolute improvement of around 10 percent in terms of word recognition accuracy.

Submitted 27 October, 2020; originally announced October 2020.

Comments: 8 pages, 6 figures, Accepted in International Conference on Pattern Recognition (ICPR) 2020

arXiv:2010.11935 [pdf, other]

Coded Data Rebalancing for Decentralized Distributed Databases

Authors: K V Sushena Sree, Prasad Krishnan

Abstract: The performance of replication-based distributed databases is affected due to non-uniform storage across storage nodes (also called \textit{data skew}) and reduction in the replication factor during operation, particularly due to node additions or removals. Data rebalancing refers to the communication involved between the nodes in correcting this data skew, while maintaining the replication factor. For carefully designed distributed databases, transmitting coded symbols during the rebalancing phase has been recently shown to reduce the communication load of rebalancing. In this work, we look at balanced distributed databases with \textit{random placement}, in which each data segment is stored in a random subset of $r$ nodes in the system, where $r$ refers to the replication factor of the distributed database. We call these decentralized databases. For a natural class of such decentralized databases, we propose rebalancing schemes for correcting data skew and the reduction in the replication factor arising due to a single node addition or removal. We give converse arguments which show that our proposed rebalancing schemes are optimal asymptotically in the size of the file.

Submitted 12 November, 2020; v1 submitted 22 October, 2020; originally announced October 2020.

Comments: 10 pages

arXiv:2010.10464 [pdf, other]

Blind Updates in Coded Caching

Authors: Suman Ghosh, Prasad Krishnan, Lakshmi Prasad Natarajan

Abstract: We consider the centralized coded caching system where a library of files is available at the server and their subfiles are cached at the clients as prescribed by a placement delivery array (PDA). We are interested in the problem where a specific file in the library is replaced with a new file at the server, the contents of which are correlated with the file being replaced, and this change needs to be communicated to the caches. Upon replacement, the server has access only to the updated file and is unaware of its differences with the original, while each cache has access to specific subfiles of the original file as dictated by the PDA. We model the correlation between the two files by assuming that they differ in at most $ε$ subfiles, and aim to reduce the number of bits broadcast by the server to update the caches. We design a new elegant coded transmission strategy for the server to update the caches blindly, and also identify a simple scheme that is based on MDS codes. We then derive converse bounds on the minimum communication cost $\ell^*$ among all linear strategies. For two well-known families of PDAs -- Maddah-Ali & Niesen's caching scheme and a PDA by Tang & Ramamoorthy and Yan et al. -- our new scheme has cost $\ell^*(1 + o(1))$ when the updates are sufficiently sparse, while the scheme using MDS codes has order-optimal cost when the updates are dense.

Submitted 15 May, 2021; v1 submitted 20 October, 2020; originally announced October 2020.

Comments: Shorter version was accepted and presented in ITW 2020, Riva del Garda, Italy. Changes with respect to arXiv:2010.10464v1 -- improved presentation and corrected minor errors. Keywords: blind update, broadcast channel, coded caching, communication cost, placement delivery array

arXiv:2010.10459 [pdf, ps, other]

doi 10.3390/e23080985

An Umbrella Converse for Data Exchange: Applied to Caching, Computing, and Shuffling

Authors: Prasad Krishnan, Lakshmi Natarajan, V. Lalitha

Abstract: The problem of data exchange between multiple nodes with storage and communication capabilities models several current multi-user communication problems like Coded Caching, Data Shuffling, Coded Computing, etc. The goal in such problems is to design communication schemes which accomplish the desired data exchange between the nodes with the optimal (minimum) amount of communication load. In this work, we present a converse to such a general data exchange problem. The expression of the converse depends only on the number of bits to be moved between different subsets of nodes, and does not assume anything further specific about the parameters in the problem. Specific problem formulations, such as those in Coded Caching, Coded Data Shuffling, Coded Distributed Computing, can be seen as instances of this generic data exchange problem. Applying our generic converse, we are able to efficiently recover known important converses in these formulations. Further, for a generic coded caching problem with heterogeneous cache sizes at the clients with or without a central server, we obtain a new general converse, which subsumes some existing results. Finally we relate a 'centralized' version of our bound to the known generalized independence number bound in index coding, and discuss our bound's tightness in this context.

Submitted 8 June, 2021; v1 submitted 20 October, 2020; originally announced October 2020.

Comments: 32 pages, refined some sections over previous version (shorter version appeared in ITW 2020)

arXiv:2009.10801 [pdf, ps, other]

DeepIaC: Deep Learning-Based Linguistic Anti-pattern Detection in IaC

Authors: Nemania Borovits, Indika Kumara, Parvathy Krishnan, Stefano Dalla Palma, Dario Di Nucci, Fabio Palomba, Damian A. Tamburri, Willem-Jan van den Heuvel

Abstract: Linguistic anti-patterns are recurring poor practices concerning inconsistencies among the naming, documentation, and implementation of an entity. They impede readability, understandability, and maintainability of source code. This paper attempts to detect linguistic anti-patterns in infrastructure as code (IaC) scripts used to provision and manage computing environments. In particular, we consider inconsistencies between the logic/body of IaC code units and their names. To this end, we propose a novel automated approach that employs word embeddings and deep learning techniques. We build and use the abstract syntax tree of IaC code units to create their code embedments. Our experiments with a dataset systematically extracted from open source repositories show that our approach yields an accuracy between 0.785 and 0.915 in detecting inconsistencies.

Submitted 22 September, 2020; originally announced September 2020.

Comments: 6 pages

arXiv:2008.04527 [pdf, other]

Neural PLDA Modeling for End-to-End Speaker Verification

Authors: Shreyas Ramoji, Prashant Krishnan, Sriram Ganapathy

Abstract: While deep learning models have made significant advances in supervised classification problems, the application of these models for out-of-set verification tasks like speaker recognition has been limited to deriving feature embeddings. The state-of-the-art x-vector PLDA based speaker verification systems use a generative model based on probabilistic linear discriminant analysis (PLDA) for computing the verification score. Recently, we had proposed a neural network approach for backend modeling in speaker verification called the neural PLDA (NPLDA) where the likelihood ratio score of the generative PLDA model is posed as a discriminative similarity function and the learnable parameters of the score function are optimized using a verification cost. In this paper, we extend this work to achieve joint optimization of the embedding neural network (x-vector network) with the NPLDA network in an end-to-end (E2E) fashion. This proposed end-to-end model is optimized directly from the acoustic features with a verification cost function and during testing, the model directly outputs the likelihood ratio score. With various experiments using the NIST speaker recognition evaluation (SRE) 2018 and 2019 datasets, we show that the proposed E2E model improves significantly over the x-vector PLDA baseline speaker verification system.

Submitted 11 August, 2020; originally announced August 2020.

Comments: Accepted in Interspeech 2020. GitHub Implementation Repos: https://github.com/iiscleap/E2E-NPLDA and https://github.com/iiscleap/NeuralPlda

arXiv:2007.14319 [pdf, other]

Coding Practices and Recommendations of Spring Security for Enterprise Applications

Authors: Mazharul Islam, Sazzadur Rahaman, Na Meng, Behnaz Hassanshahi, Padmanabhan Krishnan, Danfeng Yao

Abstract: Spring security is tremendously popular among practitioners for its ease of use to secure enterprise applications. In this paper, we study the application framework misconfiguration vulnerabilities in the light of Spring security, which is relatively understudied in the existing literature. Towards that goal, we identify 6 types of security anti-patterns and 4 insecure vulnerable defaults by conducting a measurement-based approach on 28 Spring applications. Our analysis shows that security risks associated with the identified security anti-patterns and insecure defaults can leave the enterprise application vulnerable to a wide range of high-risk attacks. To prevent these high-risk attacks, we also provide recommendations for practitioners. Consequently, our study has contributed one update to the official Spring security documentation while other security issues identified in this study are being considered for future major releases by the Spring security community.

Submitted 28 July, 2020; originally announced July 2020.

Journal ref: IEEE Secure Development Conference. Atlanta, GA, September 2020

arXiv:2007.06021 [pdf, other]

NISP: A Multi-lingual Multi-accent Dataset for Speaker Profiling

Authors: Shareef Babu Kalluri, Deepu Vijayasenan, Sriram Ganapathy, Ragesh Rajan M, Prashant Krishnan

Abstract: Many commercial and forensic applications of speech demand the extraction of information about the speaker characteristics, which falls into the broad category of speaker profiling. The speaker characteristics needed for profiling include physical traits of the speaker like height, age, and gender of the speaker along with the native language of the speaker. Many of the datasets available have only partial information for speaker profiling. In this paper, we attempt to overcome this limitation by developing a new dataset which has speech data from five different Indian languages along with English. The metadata information for speaker profiling applications like linguistic information, regional information, and physical characteristics of a speaker are also collected. We call this dataset the NITK-IISc Multilingual Multi-accent Speaker Profiling (NISP) dataset. The description of the dataset, potential applications, and baseline results for speaker profiling on this dataset are provided in this paper.

Submitted 12 July, 2020; originally announced July 2020.

Comments: 5 pages, initial version submitted to Interspeech 2020

arXiv:2007.00166 [pdf, other]

doi 10.1007/978-3-030-57058-3_22

Fused Text Recogniser and Deep Embeddings Improve Word Recognition and Retrieval

Authors: Siddhant Bansal, Praveen Krishnan, C. V. Jawahar

Abstract: Recognition and retrieval of textual content from the large document collections have been a powerful use case for the document image analysis community. Often the word is the basic unit for recognition as well as retrieval. Systems that rely only on the text recogniser (OCR) output are not robust enough in many situations, especially when the word recognition rates are poor, as in the case of historic documents or digital libraries. An alternative has been word spotting based methods that retrieve/match words based on a holistic representation of the word. In this paper, we fuse the noisy output of text recogniser with a deep embeddings representation derived out of the entire word. We use average and max fusion for improving the ranked results in the case of retrieval. We validate our methods on a collection of Hindi documents. We improve word recognition rate by 1.4 and retrieval by 11.13 in the mAP.

Submitted 30 June, 2020; originally announced July 2020.

Comments: 15 pages, 8 figures, Accepted in IAPR International Workshop on Document Analysis Systems (DAS) 2020, "Visit project page, at http://cvit.iiit.ac.in/research/projects/cvit-projects/fused-text-recogniser-and-deep-embeddings-improve-word-recognition-and-retrieval"

arXiv:2005.10548 [pdf, other]

doi 10.21437/Interspeech.2020-2768

Coswara -- A Database of Breathing, Cough, and Voice Sounds for COVID-19 Diagnosis

Authors: Neeraj Sharma, Prashant Krishnan, Rohit Kumar, Shreyas Ramoji, Srikanth Raj Chetupalli, Nirmala R., Prasanta Kumar Ghosh, Sriram Ganapathy

Abstract: The COVID-19 pandemic presents global challenges transcending boundaries of country, race, religion, and economy. The current gold standard method for COVID-19 detection is the reverse transcription polymerase chain reaction (RT-PCR) testing. However, this method is expensive, time-consuming, and violates social distancing. Also, as the pandemic is expected to stay for a while, there is a need for an alternate diagnosis tool which overcomes these limitations, and is deployable at a large scale. The prominent symptoms of COVID-19 include cough and breathing difficulties. We foresee that respiratory sounds, when analyzed using machine learning techniques, can provide useful insights, enabling the design of a diagnostic tool. Towards this, the paper presents an early effort in creating (and analyzing) a database, called Coswara, of respiratory sounds, namely, cough, breath, and voice. The sound samples are collected via worldwide crowdsourcing using a website application. The curated dataset is released as open access. As the pandemic is evolving, the data collection and analysis is a work in progress. We believe that insights from analysis of Coswara can be effective in enabling sound based technology solutions for point-of-care diagnosis of respiratory infection, and in the near future this can help to diagnose COVID-19.

Submitted 11 August, 2020; v1 submitted 21 May, 2020; originally announced May 2020.

Comments: A description of Coswara dataset to evaluate COVID-19 diagnosis using respiratory sounds

arXiv:2004.06292 [pdf, other]

Gelato: Feedback-driven and Guided Security Analysis of Client-side Web Applications

Authors: Behnaz Hassanshahi, Hyunjun Lee, Paddy Krishnan, Jörn Güy Suß

Abstract: Even though a lot of effort has been invested in analyzing client-side web applications during the past decade, the existing tools often fail to deal with the complexity of modern JavaScript applications. However, from an attacker's point of view, the client side of such web applications can reveal invaluable information about the server side. In this paper, first we study the existing tools and enumerate the most crucial features a security-aware client-side analysis should be supporting. Next, we propose GELATO to detect vulnerabilities in modern client-side JavaScript applications that are built upon complex libraries and frameworks. In particular, we take the first step in closing the gap between state-aware crawling and client-side security analysis by proposing a feedback-driven security-aware guided crawler that is able to analyze complex frameworks automatically, and increase the coverage of security-sensitive parts of the program efficiently. Moreover, we propose a new lightweight client-side taint analysis that outperforms the state-of-the-art tools, requires no modification to browsers, and reports non-trivial taint flows on modern JavaScript applications.

Submitted 13 April, 2020; originally announced April 2020.

Comments: 15 pages, 2 figures, 5 algorithms, 5 listings, 7 tables

arXiv:2002.03562 [pdf, other]

doi 10.21437/Odyssey.2020-29

NPLDA: A Deep Neural PLDA Model for Speaker Verification

Authors: Shreyas Ramoji, Prashant Krishnan, Sriram Ganapathy

Abstract: The state-of-art approach for speaker verification consists of a neural network based embedding extractor along with a backend generative model such as the Probabilistic Linear Discriminant Analysis (PLDA). In this work, we propose a neural network approach for backend modeling in speaker recognition. The likelihood ratio score of the generative PLDA model is posed as a discriminative similarity function and the learnable parameters of the score function are optimized using a verification cost. The proposed model, termed as neural PLDA (NPLDA), is initialized using the generative PLDA model parameters. The loss function for the NPLDA model is an approximation of the minimum detection cost function (DCF). The speaker recognition experiments using the NPLDA model are performed on the speaker verification task in the VOiCES datasets as well as the SITW challenge dataset. In these experiments, the NPLDA model optimized using the proposed loss function improves significantly over the state-of-art PLDA based speaker verification system.

Submitted 24 May, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

Comments: Published in Odyssey 2020, the Speaker and Language Recognition Workshop (VOiCES Special Session). Link to GitHub Implementation: https://github.com/iiscleap/NeuralPlda. arXiv admin note: substantial text overlap with arXiv:2001.07034

Journal ref: in Proc. Odyssey 2020 The Speaker and Language Recognition Workshop, Pages 202-209

arXiv:2002.02735 [pdf, other]

doi 10.21437/Odyssey.2020-40

LEAP System for SRE19 CTS Challenge -- Improvements and Error Analysis

Authors: Shreyas Ramoji, Prashant Krishnan, Bhargavram Mysore, Prachi Singh, Sriram Ganapathy

Abstract: The NIST Speaker Recognition Evaluation - Conversational Telephone Speech (CTS) challenge 2019 was an open evaluation for the task of speaker verification in challenging conditions. In this paper, we provide a detailed account of the LEAP SRE system submitted to the CTS challenge focusing on the novel components in the back-end system modeling. All the systems used the time-delay neural network (TDNN) based x-vector embeddings. The x-vector system in our SRE19 submission used a large pool of training speakers (about 14k speakers). Following the x-vector extraction, we explored a neural network approach to backend score computation that was optimized for a speaker verification cost. The system combination of generative and neural PLDA models resulted in significant improvements for the SRE evaluation dataset. We also found additional gains for the SRE systems based on score normalization and calibration. Subsequent to the evaluations, we have performed a detailed analysis of the submitted systems. The analysis revealed the incremental gains obtained for different training dataset combinations as well as the modeling methods.

Submitted 24 May, 2020; v1 submitted 7 February, 2020; originally announced February 2020.

Comments: Published In Proc. Odyssey 2020, the Speaker and Language Recognition Workshop. Link to GitHub Implementation: https://github.com/iiscleap/NeuralPlda

Journal ref: in Proc. Odyssey 2020 The Speaker and Language Recognition Workshop, 281--288

arXiv:2001.07034 [pdf, other]

Pairwise Discriminative Neural PLDA for Speaker Verification

Authors: Shreyas Ramoji, Prashant Krishnan V, Prachi Singh, Sriram Ganapathy

Abstract: The state-of-art approach to speaker verification involves the extraction of discriminative embeddings like x-vectors followed by a generative model back-end using a probabilistic linear discriminant analysis (PLDA). In this paper, we propose a Pairwise neural discriminative model for the task of speaker verification which operates on a pair of speaker embeddings such as x-vectors/i-vectors and outputs a score that can be considered as a scaled log-likelihood ratio. We construct a differentiable cost function which approximates speaker verification loss, namely the minimum detection cost. The pre-processing steps of linear discriminant analysis (LDA), unit length normalization and within class covariance normalization are all modeled as layers of a neural model and the speaker verification cost functions can be back-propagated through these layers during training. We also explore regularization techniques to prevent overfitting, which is a major concern in using discriminative back-end models for verification tasks. The experiments are performed on the NIST SRE 2018 development and evaluation datasets. We observe average relative improvements of 8% in CMN2 condition and 30% in VAST condition over the PLDA baseline system.

Submitted 7 February, 2020; v1 submitted 20 January, 2020; originally announced January 2020.

Comments: This paper was submitted to IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2020. Link to GitHub Repository: https://github.com/iiscleap/NeuralPlda

arXiv:2001.05438 [pdf, other]

Low Complexity Distributed Computing via Binary Matrices with Extension to Stragglers

Authors: Shailja Agrawal, Prasad Krishnan

Abstract: We consider the distributed computing framework of MapReduce, which consists of three phases, the Map phase, the Shuffle phase and the Reduce phase. For this framework, we propose the use of binary matrices (with $0,1$ entries) called \textit{computing matrices} to describe the map phase and the shuffle phase. Similar binary matrices were recently proposed for the coded caching framework. The structure of ones and zeroes in the binary computing matrix captures the map phase of the MapReduce framework. We present a new simple coded data shuffling scheme for this binary matrix model, based on an \textit{identity submatrix cover} of the computing matrix. This new coded shuffling scheme has in general a larger communication load than existing schemes, but has the advantage of less complexity overhead than the well-known earlier schemes in the literature in terms of the file-splitting and associated indexing and coordination required. We also show that there exists a binary matrix based distributed computing scheme with our new data-shuffling scheme which has strictly less than twice the communication load of the known optimal scheme in the literature. The structure of this new scheme enables it to be applied to the framework of MapReduce with stragglers also, in a straightforward manner, borrowing its advantages and disadvantages from the no-straggler situation. Finally, using binary matrices derived from combinatorial designs, we show specific classes of computing schemes with very low \textit{file complexity} (number of subfiles in the file), with marginally higher communication load compared to the optimal scheme for equivalent parameters.

Submitted 30 January, 2020; v1 submitted 15 January, 2020; originally announced January 2020.

Comments: 8 pages, Submitted to ISIT 2020

arXiv:2001.04939 [pdf, other]

Coded Data Rebalancing: Fundamental Limits and Constructions

Authors: Prasad Krishnan, V. Lalitha, Lakshmi Natarajan

Abstract: Distributed databases often suffer unequal distribution of data among storage nodes, which is known as `data skew'. Data skew arises from a number of causes such as removal of existing storage nodes and addition of new empty nodes to the database. Data skew leads to performance degradations and thus necessitates `rebalancing' at regular intervals to reduce the amount of skew. We define an $r$-balanced distributed database as a distributed database in which the storage across the nodes has uniform size, and each bit of the data is replicated in $r$ distinct storage nodes. We consider the problem of designing such balanced databases along with associated rebalancing schemes which maintain the $r$-balanced property under node removal and addition operations. We present a class of $r$-balanced databases (parameterized by the number of storage nodes) which have the property of structural invariance, i.e., the databases designed for different numbers of storage nodes have the same essential structure. For this class of $r$-balanced databases, we present rebalancing schemes which use coded transmissions between storage nodes, and characterize their communication loads under node addition and removal. We show that the communication cost incurred to rebalance our distributed database for node addition and removal is optimal, i.e., it achieves the minimum possible cost among all possible balanced distributed databases and rebalancing schemes.

Submitted 13 July, 2020; v1 submitted 14 January, 2020; originally announced January 2020.

Comments: 12 pages, 4 figures, To appear in Proceedings of the IEEE ISIT 2020. A video presentation of this paper is available at https://www.youtube.com/watch?v=a2fVfKiXnOY

Showing 1–50 of 71 results for author: Krishnan, P