-
ItDPDM: Information-Theoretic Discrete Poisson Diffusion Model
Authors:
Sagnik Bhattacharya,
Abhiram Gorle,
Ahsan Bilal,
Connor Ding,
Amit Kumar Singh Yadav,
Tsachy Weissman
Abstract:
Generative modeling of non-negative, discrete data, such as symbolic music, remains challenging due to two persistent limitations in existing methods. First, many approaches rely on modeling continuous embeddings, which is suboptimal for inherently discrete data distributions. Second, most models optimize variational bounds rather than exact data likelihood, resulting in inaccurate likelihood estimates and degraded sampling quality. While recent diffusion-based models have addressed these issues separately, we tackle them jointly. In this work, we introduce the Information-Theoretic Discrete Poisson Diffusion Model (ItDPDM), inspired by the photon arrival process, which combines exact likelihood estimation with fully discrete-state modeling. Central to our approach is an information-theoretic Poisson Reconstruction Loss (PRL) that has a provable exact relationship with the true data likelihood. ItDPDM achieves improved likelihood and sampling performance over prior discrete and continuous diffusion models on a variety of synthetic discrete datasets. Furthermore, on real-world datasets such as symbolic music and images, ItDPDM attains superior likelihood estimates and competitive generation quality, demonstrating a proof of concept for distribution-robust discrete generative modeling.
Submitted 27 May, 2025; v1 submitted 8 May, 2025;
originally announced May 2025.
-
Speedy MASt3R
Authors:
Jingxing Li,
Yongjae Lee,
Abhay Kumar Yadav,
Cheng Peng,
Rama Chellappa,
Deliang Fan
Abstract:
Image matching is a key component of modern 3D vision algorithms, essential for accurate scene reconstruction and localization. MASt3R redefines image matching as a 3D task by leveraging DUSt3R and introducing a fast reciprocal matching scheme that accelerates matching by orders of magnitude while preserving theoretical guarantees. This approach has gained strong traction, with DUSt3R and MASt3R collectively cited over 250 times in a short span, underscoring their impact. However, despite its accuracy, MASt3R's inference speed remains a bottleneck. On an A40 GPU, latency per image pair is 198.16 ms, mainly due to computational overhead from the ViT encoder-decoder and Fast Reciprocal Nearest Neighbor (FastNN) matching.
To address this, we introduce Speedy MASt3R, a post-training optimization framework that enhances inference efficiency while maintaining accuracy. It integrates multiple optimization techniques: FlashMatch, an approach leveraging FlashAttention v2 with tiling strategies for improved efficiency; GraphFusion, computation graph optimization via layer and tensor fusion with kernel auto-tuning in TensorRT; and FastNN-Lite, a streamlined FastNN pipeline that reduces memory access time from quadratic to linear while accelerating block-wise correlation scoring through vectorized computation. Additionally, it employs mixed-precision inference with FP16/FP32 hybrid computations (HybridCast), achieving speedup while preserving numerical precision. Evaluated on Aachen Day-Night, InLoc, 7-Scenes, ScanNet1500, and MegaDepth1500, Speedy MASt3R achieves a 54% reduction in inference time (198 ms to 91 ms per image pair) without sacrificing accuracy. This advancement enables real-time 3D understanding, benefiting applications like mixed reality navigation and large-scale 3D scene reconstruction.
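The 54% figure can be checked from the two rounded latencies quoted in the abstract. The snippet below is only illustrative arithmetic; the helper names `relative_reduction` and `speedup` are ours and do not come from the paper.

```python
def relative_reduction(before_ms: float, after_ms: float) -> float:
    """Fraction of the original latency that was removed."""
    return (before_ms - after_ms) / before_ms


def speedup(before_ms: float, after_ms: float) -> float:
    """How many times faster the optimized pipeline is."""
    return before_ms / after_ms


# Rounded per-pair latencies taken from the abstract (A40 GPU).
reduction = relative_reduction(198.0, 91.0)
factor = speedup(198.0, 91.0)

print(f"reduction: {reduction:.1%}")  # fraction of time saved
print(f"speedup:   {factor:.2f}x")    # equivalent speedup factor
```

A 54% reduction in latency corresponds to a speedup of a little over 2x, which is a useful way to compare against methods that report speedup factors instead.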
Submitted 12 March, 2025;
originally announced March 2025.
-
Virtual Trial Room with Computer Vision and Machine Learning
Authors:
Tulashi Prasad Joshi,
Amrendra Kumar Yadav,
Arjun Chhetri,
Suraj Agrahari,
Umesh Kanta Ghimire
Abstract:
Online shopping has revolutionized the retail industry, providing customers with convenience and accessibility. However, customers often hesitate to purchase wearable products such as watches, jewelry, glasses, shoes, and clothes due to the lack of certainty regarding fit and suitability. This leads to significant return rates, causing problems for both customers and vendors. To address this issue, a platform called the Virtual Trial Room with Computer Vision and Machine Learning is designed, which enables customers to easily check whether a product will fit and suit them. To achieve this, an AI-generated 3D model of the human head was created from a single 2D image using the DECA model. A custom-made 3D model of glasses, based on real-world measurements, was then superimposed on this 3D model and fitted over the human head. To replicate the real-world look and feel, the model was retouched with textures, lightness, and smoothness. Furthermore, a full-stack application was developed utilizing various front-end and back-end technologies. This application enables users to view 3D-generated results on the website, providing an immersive and interactive experience.
Submitted 17 December, 2024; v1 submitted 14 December, 2024;
originally announced December 2024.
-
Comparative Analysis of ASR Methods for Speech Deepfake Detection
Authors:
Davide Salvi,
Amit Kumar Singh Yadav,
Kratika Bhagtani,
Viola Negroni,
Paolo Bestagini,
Edward J. Delp
Abstract:
Recent techniques for speech deepfake detection often rely on pre-trained self-supervised models. These systems, initially developed for Automatic Speech Recognition (ASR), have proved their ability to offer a meaningful representation of speech signals, which can benefit various tasks, including deepfake detection. In this context, pre-trained models serve as feature extractors and are used to extract embeddings from input speech, which are then fed to a binary speech deepfake detector. The remarkable accuracy achieved through this approach underscores a potential relationship between ASR and speech deepfake detection. However, this connection is not yet entirely clear, and we do not know whether improved performance in ASR corresponds to higher speech deepfake detection capabilities. In this paper, we address this question through a systematic analysis. We consider two different pre-trained self-supervised ASR models, Whisper and Wav2Vec 2.0, and adapt them for the speech deepfake detection task. These models have been released in multiple versions, with increasing number of parameters and enhanced ASR performance. We investigate whether performance improvements in ASR correlate with improvements in speech deepfake detection. Our results provide insights into the relationship between these two tasks and offer valuable guidance for the development of more effective speech deepfake detectors.
Submitted 26 November, 2024;
originally announced November 2024.
-
DiffSSD: A Diffusion-Based Dataset For Speech Forensics
Authors:
Kratika Bhagtani,
Amit Kumar Singh Yadav,
Paolo Bestagini,
Edward J. Delp
Abstract:
Diffusion-based speech generators are ubiquitous. These methods can generate very high quality synthetic speech and several recent incidents report their malicious use. To counter such misuse, synthetic speech detectors have been developed. Many of these detectors are trained on datasets which do not include diffusion-based synthesizers. In this paper, we demonstrate that existing detectors trained on one such dataset, ASVspoof2019, do not perform well in detecting synthetic speech from recent diffusion-based synthesizers. We propose the Diffusion-Based Synthetic Speech Dataset (DiffSSD), a dataset consisting of about 200 hours of labeled speech, including synthetic speech generated by 8 diffusion-based open-source and 2 commercial generators. We also examine the performance of existing synthetic speech detectors on DiffSSD in both closed-set and open-set scenarios. The results highlight the importance of this dataset in detecting synthetic speech generated from recent open-source and commercial speech generators.
Submitted 2 October, 2024; v1 submitted 19 September, 2024;
originally announced September 2024.
-
Wiretapped Commitment over Binary Channels
Authors:
Anuj Kumar Yadav,
Manideep Mamindlapally,
Amitalok J. Budkuley
Abstract:
We propose the problem of wiretapped commitment, where two parties, say committer Alice and receiver Bob, engage in a commitment protocol using a noisy channel as a resource, in the presence of an eavesdropper, say Eve. Noisy versions of Alice's transmission over the wiretap channel are received at both Bob and Eve. We seek to determine the maximum commitment throughput in the presence of an eavesdropper, i.e., wiretapped commitment capacity, where in addition to the standard security requirements for two-party commitment, one seeks to ensure that Eve doesn't learn about the commit string.
A key interest in this work is to explore the effect of collusion (or lack of it) between the eavesdropper Eve and either Alice or Bob. Toward the same, we present results on the wiretapped commitment capacity under the so-called 1-private regime (when Alice or Bob cannot collude with Eve) and the 2-private regime (when Alice or Bob may possibly collude with Eve).
Submitted 19 June, 2024;
originally announced June 2024.
-
FairSSD: Understanding Bias in Synthetic Speech Detectors
Authors:
Amit Kumar Singh Yadav,
Kratika Bhagtani,
Davide Salvi,
Paolo Bestagini,
Edward J. Delp
Abstract:
Methods that can generate synthetic speech which is perceptually indistinguishable from speech recorded by a human speaker are easily available. Several incidents report misuse of synthetic speech generated from these methods to commit fraud. To counter such misuse, many methods have been proposed to detect synthetic speech. Some of these detectors are more interpretable, can generalize to detect synthetic speech in the wild, and are robust to noise. However, limited work has been done on understanding bias in these detectors. In this work, we examine bias in existing synthetic speech detectors to determine if they will unfairly target a particular gender, age, or accent group. We also inspect whether these detectors will have a higher misclassification rate for bona fide speech from speech-impaired speakers than for fluent speakers. Extensive experiments on 6 existing synthetic speech detectors using more than 0.9 million speech signals demonstrate that most detectors are gender, age, and accent biased, and future work is needed to ensure fairness. To support future research, we release our evaluation dataset, models used in our study, and source code at https://gitlab.com/viper-purdue/fairssd.
Submitted 16 April, 2024;
originally announced April 2024.
-
Compression Robust Synthetic Speech Detection Using Patched Spectrogram Transformer
Authors:
Amit Kumar Singh Yadav,
Ziyue Xiang,
Kratika Bhagtani,
Paolo Bestagini,
Stefano Tubaro,
Edward J. Delp
Abstract:
Many deep learning synthetic speech generation tools are readily available. The use of synthetic speech has led to financial fraud, impersonation of people, and the spread of misinformation. For this reason, forensic methods that can detect synthetic speech have been proposed. Existing methods often overfit on one dataset, and their performance drops substantially in practical scenarios such as detecting synthetic speech shared on social platforms. In this paper, we propose the Patched Spectrogram Synthetic Speech Detection Transformer (PS3DT), a synthetic speech detector that converts a time domain speech signal to a mel-spectrogram and processes it in patches using a transformer neural network. We evaluate the detection performance of PS3DT on the ASVspoof2019 dataset. Our experiments show that PS3DT performs well on the ASVspoof2019 dataset compared to other approaches that use spectrograms for synthetic speech detection. We also investigate the generalization performance of PS3DT on the In-the-Wild dataset. PS3DT generalizes better than several existing methods in detecting synthetic speech from an out-of-distribution dataset. We also evaluate the robustness of PS3DT in detecting telephone quality synthetic speech and synthetic speech shared on social platforms (compressed speech). PS3DT is robust to compression and can detect telephone quality synthetic speech better than several existing methods.
Submitted 21 February, 2024;
originally announced February 2024.
-
Novel application of Relief Algorithm in cascaded artificial neural network to predict wind speed for wind power resource assessment in India
Authors:
Hasmat Malik,
Amit Kumar Yadav,
Fausto Pedro García Márquez,
Jesús María Pinar-Pérez
Abstract:
Wind power is non-schedulable because the underlying meteorological variables are stochastic. Hence the energy business and the control of wind power generation require prediction of wind speed (WS) from a few seconds to various time steps in advance. To deal with prediction shortcomings, various WS prediction methods have been used. Predictive data mining offers a variety of methods for WS prediction, of which the artificial neural network (ANN) is one of the more reliable and accurate. The results of this study show that the ANN gives better accuracy than the conventional model. The accuracy of WS prediction models is found to depend on the input parameters and on the type of architecture and algorithm used, so selecting the most relevant input parameters is an important research area in the WS prediction field. The objective of the paper is twofold. First, an extensive review of ANNs for wind power and WS prediction is carried out. Second, feature selection using the Relief Algorithm (RA) in WS prediction is discussed and analyzed for different Indian sites. RA identifies atmospheric pressure, solar radiation, and relative humidity as the relevant input variables. Based on these variables, a cascade ANN model is developed and its prediction accuracy is evaluated. The root mean square error (RMSE) between predicted and measured WS is found to be 1.44 m/s for training and 1.49 m/s for testing. The developed cascade ANN model can be used to predict wind speed for sites in India where no WS measuring instruments are installed.
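For readers unfamiliar with the metric, the RMSE quoted above is the square root of the mean squared difference between predicted and measured wind speed, and so carries the same unit (m/s). The sketch below shows the standard definition only; the function name and the toy values are made up for illustration and are not data from the paper.

```python
import math


def rmse(predicted, measured):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(measured) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical wind speeds in m/s, chosen only to exercise the function.
toy_predicted = [3.0, 5.0, 4.0]
toy_measured = [4.0, 5.0, 2.0]

toy_error = rmse(toy_predicted, toy_measured)
print(f"RMSE = {toy_error:.2f} m/s")
```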
Submitted 25 January, 2024;
originally announced January 2024.
-
SymNoise: Advancing Language Model Fine-tuning with Symmetric Noise
Authors:
Abhay Kumar Yadav,
Arjun Singh
Abstract:
In this paper, we introduce a novel fine-tuning technique for language models, which involves incorporating symmetric noise into the embedding process. This method aims to enhance the model's function by more stringently regulating its local curvature, demonstrating superior performance over the current method, NEFTune. When fine-tuning the LLaMA-2-7B model using Alpaca, standard techniques yield a 29.79% score on AlpacaEval. However, our approach, SymNoise, increases this score significantly to 69.04%, using symmetric noisy embeddings. This is a 6.7% improvement over the state-of-the-art method, NEFTune (64.69%). Furthermore, when tested on various models and stronger baseline instruction datasets, such as Evol-Instruct, ShareGPT, and OpenPlatypus, SymNoise consistently outperforms NEFTune. The current literature, including NEFTune, has underscored the importance of more in-depth research into the application of noise-based strategies in the fine-tuning of language models. Our approach, SymNoise, is another significant step in this direction, showing notable improvement over the existing state-of-the-art method.
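The quoted 6.7% appears to be a relative gain over NEFTune rather than a difference in percentage points. The short check below uses only the two AlpacaEval scores given in the abstract; the variable names are ours.

```python
# AlpacaEval scores quoted in the abstract, in percent.
symnoise_score = 69.04
neftune_score = 64.69

# Difference in percentage points.
absolute_gain = symnoise_score - neftune_score

# Gain relative to the NEFTune baseline.
relative_gain = absolute_gain / neftune_score

print(f"absolute gain: {absolute_gain:.2f} points")
print(f"relative gain: {relative_gain:.1%}")
```

Read this way, the 6.7% figure matches the relative gain, while the gap in raw score is about 4.35 points.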
Submitted 8 December, 2023; v1 submitted 3 December, 2023;
originally announced December 2023.
-
Commitment over Gaussian Unfair Noisy Channels
Authors:
Amitalok J. Budkuley,
Pranav Joshi,
Manideep Mamindlapally,
Anuj Kumar Yadav
Abstract:
Commitment is a key primitive which resides at the heart of several cryptographic protocols. Noisy channels can help realize information-theoretically secure commitment schemes; however, their imprecise statistical characterization can severely impair such schemes, especially their security guarantees. Keeping our focus on channel unreliability in this work, we study commitment over unreliable continuous-alphabet channels called Gaussian unfair noisy channels, or Gaussian UNCs.
We present the first results on the optimal throughput or commitment capacity of Gaussian UNCs. It is known that classical Gaussian channels have infinite commitment capacity, even under finite transmit power constraints. For unreliable Gaussian UNCs, we prove the surprising result that their commitment capacity may be finite, and in some cases, zero. When commitment is possible, we present achievable rate lower bounds by constructing positive-throughput protocols under a given input power constraint and (two-sided) channel elasticity at committer Alice and receiver Bob. Our achievability results establish an interesting fact, namely that Gaussian UNCs with zero elasticity have infinite commitment capacity, which brings a completely new perspective to why classic Gaussian channels, i.e., Gaussian UNCs with zero elasticity, have infinite capacity. Finally, we precisely characterize the positive commitment capacity threshold for a Gaussian UNC in terms of the channel elasticity, when the transmit power tends to infinity.
Submitted 11 May, 2023;
originally announced May 2023.
-
Information Spectrum Converse for Minimum Entropy Couplings and Functional Representations
Authors:
Yanina Y. Shkel,
Anuj Kumar Yadav
Abstract:
Given two jointly distributed random variables $(X,Y)$, a functional representation of $X$ is a random variable $Z$ independent of $Y$, and a deterministic function $g(\cdot, \cdot)$ such that $X=g(Y,Z)$. The problem of finding a minimum entropy functional representation is known to be equivalent to the problem of finding a minimum entropy coupling where, given a collection of probability distributions $P_1, \dots, P_m$, the goal is to find a coupling $X_1, \dots, X_m$ ($X_i \sim P_i$) with the smallest entropy $H_\alpha(X_1, \dots, X_m)$. This paper presents a new information spectrum converse, and applies it to obtain direct lower bounds on minimum entropy in both problems. The new results improve on all known lower bounds, including previous lower bounds based on the concept of majorization. In particular, the presented proofs leverage both the information spectrum and the majorization perspectives on minimum entropy couplings and functional representations.
Submitted 9 May, 2023;
originally announced May 2023.
-
DSVAE: Interpretable Disentangled Representation for Synthetic Speech Detection
Authors:
Amit Kumar Singh Yadav,
Kratika Bhagtani,
Ziyue Xiang,
Paolo Bestagini,
Stefano Tubaro,
Edward J. Delp
Abstract:
Tools that generate high-quality synthetic speech signals that are perceptually indistinguishable from speech recorded from human speakers are easily available. Several approaches have been proposed for detecting synthetic speech. Many of these approaches use deep learning methods as a black box without providing reasoning for the decisions they make. This limits the interpretability of these approaches. In this paper, we propose the Disentangled Spectrogram Variational Auto Encoder (DSVAE), a variational autoencoder trained in two stages that processes spectrograms of speech using disentangled representation learning to generate interpretable representations of a speech signal for detecting synthetic speech. DSVAE also creates an activation map to highlight the spectrogram regions that discriminate synthetic and bona fide human speech signals. We evaluated the representations obtained from DSVAE using the ASVspoof2019 dataset. Our experimental results show high accuracy (>98%) on detecting synthetic speech from 6 known and 10 out of 11 unknown speech synthesizers. We also visualize the representation obtained from DSVAE for 17 different speech synthesizers and verify that they are indeed interpretable and discriminate bona fide and synthetic speech from each of the synthesizers.
Submitted 28 July, 2023; v1 submitted 6 April, 2023;
originally announced April 2023.
-
Deep Learning based Segmentation of Optical Coherence Tomographic Images of Human Saphenous Varicose Vein
Authors:
Maryam Viqar,
Violeta Madjarova,
Amit Kumar Yadav,
Desislava Pashkuleva,
Alexander S. Machikhin
Abstract:
A deep-learning-based segmentation model is proposed for Optical Coherence Tomography images of human varicose veins. It is based on the U-Net model, employs atrous convolution with residual blocks, and gives an accuracy of 0.9932.
Submitted 2 March, 2023;
originally announced March 2023.
-
Univariate and Multivariate LSTM Model for Short-Term Stock Market Prediction
Authors:
Vishal Kuber,
Divakar Yadav,
Arun Kr Yadav
Abstract:
Designing robust and accurate prediction models has long been an active research area. Proponents of a well-functioning market believe that it is difficult to predict market prices accurately, but many scholars disagree. Robust and accurate prediction systems will be helpful not only to businesses but also to individuals making financial investments. This paper presents an LSTM model with two different input approaches for predicting the short-term stock prices of two Indian companies, Reliance Industries and Infosys Ltd. Ten years of historical data (2012-2021) are taken from the Yahoo Finance website to analyze the proposed approaches. In the first approach, the closing prices of the two selected companies are applied directly to a univariate LSTM model. In the second approach, technical indicator values are calculated from the closing prices and then applied collectively to a multivariate LSTM model. Short-term market behaviour for upcoming days is evaluated. Experimental outcomes reveal that the first approach is useful for determining the future trend, while the multivariate LSTM model with technical indicators is found to be useful for accurately predicting future price behaviour.
Submitted 8 May, 2022;
originally announced May 2022.
-
An Overview of Recent Work in Media Forensics: Methods and Threats
Authors:
Kratika Bhagtani,
Amit Kumar Singh Yadav,
Emily R. Bartusiak,
Ziyue Xiang,
Ruiting Shao,
Sriram Baireddy,
Edward J. Delp
Abstract:
In this paper, we review recent work in media forensics for digital images, video, audio (specifically speech), and documents. For each data modality, we discuss synthesis and manipulation techniques that can be used to create and modify digital media. We then review technological advancements for detecting and quantifying such manipulations. Finally, we consider open issues and suggest directions for future research.
Submitted 12 May, 2022; v1 submitted 26 April, 2022;
originally announced April 2022.
-
Automatic Text Summarization Methods: A Comprehensive Review
Authors:
Divakar Yadav,
Jalpa Desai,
Arun Kumar Yadav
Abstract:
One of the most pressing issues that have arisen due to the rapid growth of the Internet is known as information overloading. Simplifying the relevant information in the form of a summary will assist many people because the material on any topic is plentiful on the Internet. Manually summarising massive amounts of text is quite challenging for humans. So, it has increased the need for more complex and powerful summarizers. Researchers have been trying to improve approaches for creating summaries since the 1950s, such that the machine-generated summary matches the human-created summary. This study provides a detailed state-of-the-art analysis of text summarization concepts such as summarization approaches, techniques used, standard datasets, evaluation metrics and future scopes for research. The most commonly accepted approaches are extractive and abstractive, studied in detail in this work. Evaluating the summary and increasing the development of reusable resources and infrastructure aids in comparing and replicating findings, adding competition to improve the outcomes. Different evaluation methods of generated summaries are also discussed in this study. Finally, at the end of this study, several challenges and research opportunities related to text summarization research are mentioned that may be useful for potential researchers working in this area.
Submitted 3 March, 2022;
originally announced April 2022.
-
On Reverse Elastic Channels and the Asymmetry of Commitment Capacity under Channel Elasticity
Authors:
Amitalok J. Budkuley,
Pranav Joshi,
Manideep Mamindlapally,
Anuj Kumar Yadav
Abstract:
Commitment is an important cryptographic primitive. It is well known that noisy channels are a promising resource to realize commitment in an information-theoretically secure manner. However, oftentimes, channel behaviour may be poorly characterized thereby limiting the commitment throughput and/or degrading the security guarantees; particularly problematic is when a dishonest party, unbeknown to the honest one, can maliciously alter the channel characteristics. Reverse elastic channels (RECs) are an interesting class of such unreliable channels, where only a dishonest committer, say, Alice can maliciously alter the channel. RECs have attracted recent interest in the study of several cryptographic primitives.
Our principal contribution is the REC commitment capacity characterization; this proves a recent related conjecture. A key result is our tight converse which analyses a specific cheating strategy by Alice. RECs are closely related to the classic unfair noisy channels (UNCs); elastic channels (ECs), where only a dishonest receiver Bob can alter the channel, are similarly related. In stark contrast to UNCs, both RECs and ECs always exhibit positive commitment throughput for all non-trivial parameters. Interestingly, our results show that channels with exclusive one-sided elasticity for dishonest parties, exhibit a fundamental asymmetry where a committer with one-sided elasticity has a more debilitating effect on the commitment throughput than a receiver.
Submitted 16 November, 2021;
originally announced November 2021.
-
Son of Zorn's Lemma: Targeted Style Transfer Using Instance-aware Semantic Segmentation
Authors:
Carlos Castillo,
Soham De,
Xintong Han,
Bharat Singh,
Abhay Kumar Yadav,
Tom Goldstein
Abstract:
Style transfer is an important task in which the style of a source image is mapped onto that of a target image. The method is useful for synthesizing derivative works of a particular artist or specific painting. This work considers targeted style transfer, in which the style of a template image is used to alter only part of a target image. For example, an artist may wish to alter the style of only one particular object in a target image without altering the object's general morphology or surroundings. This is useful, for example, in augmented reality applications (such as the recently released Pokemon GO), where one wants to alter the appearance of a single real-world object in an image frame to make it appear as a cartoon. Most notably, the rendering of real-world objects into cartoon characters has been used in a number of films and television shows, such as the upcoming series Son of Zorn. We present a method for targeted style transfer that simultaneously segments and stylizes single objects selected by the user. The method uses a Markov random field model to smooth and anti-alias outlier pixels near object boundaries, so that stylized objects naturally blend into their surroundings.
Submitted 9 January, 2017;
originally announced January 2017.