-
Automatic Data Retrieval for Cross Lingual Summarization
Authors:
Nikhilesh Bhatnagar,
Ashok Urlana,
Vandan Mujadia,
Pruthwik Mishra,
Dipti Misra Sharma
Abstract:
Cross-lingual summarization involves summarizing text written in one language into a different one. There is a body of research addressing cross-lingual summarization from English to other European languages. In this work, we aim to perform cross-lingual summarization from English to Hindi. We propose that pairing up the coverage of newsworthy events in textual and video format can prove helpful for data acquisition for cross-lingual summarization. We analyze the data and propose methods to match articles to video descriptions that serve as document and summary pairs. We also outline filtering methods over reasonable thresholds to ensure the correctness of the summaries. Further, we make available 28,583 mono and cross-lingual article-summary pairs at https://github.com/tingc9/Cross-Sum-News-Aligned. We also build and analyze multiple baselines on the collected data and report error analysis.
Submitted 22 December, 2023;
originally announced December 2023.
-
Experiments in Linear Template Combination using Genetic Algorithms
Authors:
Nikhilesh Bhatnagar,
Radhika Mamidi
Abstract:
Natural Language Generation systems typically have two parts: strategic ('what to say') and tactical ('how to say'). We present our experiments in building an unsupervised, corpus-driven, template-based tactical NLG system. We treat a template as a sequence of words containing gaps. Our idea is based on the observation that templates are grammatical locally (within their textual span). We posit the construction of a sentence as a highly restricted sequence of such templates. This work is an attempt to explore the resulting search space using Genetic Algorithms to arrive at acceptable solutions. We present a baseline implementation of this approach which outputs gapped text.
Submitted 24 May, 2016;
originally announced May 2016.
-
Simulated Tempering and Swapping on Mean-Field Models
Authors:
Nayantara Bhatnagar,
Dana Randall
Abstract:
Simulated and parallel tempering are families of Markov Chain Monte Carlo algorithms where a temperature parameter is varied during the simulation to overcome bottlenecks to convergence due to multimodality.
In this work we introduce and analyze the convergence for a set of new tempering distributions which we call entropy dampening. For asymmetric exponential distributions and the mean-field Ising model with an external field, simulated tempering is known to converge slowly. We show that tempering with entropy dampening distributions mixes in polynomial time for these models.
Examining slow mixing times of tempering more closely, we show that for the mean-field 3-state ferromagnetic Potts model, tempering converges slowly regardless of the temperature schedule chosen. On the other hand, tempering with entropy dampening distributions converges in polynomial time to stationarity. Finally, we show that the slow mixing can be very expensive in practice. In particular, the mixing time of simulated tempering is an exponential factor longer than the mixing time at the fixed temperature.
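As a rough illustration of the plain simulated tempering chain that the abstract takes as its starting point, here is a minimal Python sketch on a small mean-field Ising model with an external field. It is not the authors' construction and does not implement the entropy dampening distributions. The function names, the system size, the field strength, the inverse-temperature ladder, and the use of exact normalizing constants (affordable only because the toy system is tiny) are all assumptions made for this example.

```python
import math
import random


def energy(k, n, h):
    """Exponent of the Gibbs weight for a configuration with k up-spins."""
    m = 2 * k - n  # total magnetization
    return m * m / (2.0 * n) + h * m


def partition(beta, n, h):
    """Exact normalizing constant, grouping configurations by up-spin count."""
    return sum(math.comb(n, k) * math.exp(beta * energy(k, n, h))
               for k in range(n + 1))


def accept(log_ratio, rng):
    """Metropolis rule: accept with probability min(1, exp(log_ratio))."""
    return rng.random() < math.exp(min(0.0, log_ratio))


def tempering_step(k, i, betas, log_z, n, h, rng):
    """One move of the chain on (up-spin count k, temperature index i)."""
    if rng.random() < 0.5:
        # Spin move at the current temperature: flip one uniformly chosen spin.
        new_k = k - 1 if rng.random() < k / n else k + 1
        if accept(betas[i] * (energy(new_k, n, h) - energy(k, n, h)), rng):
            k = new_k
    else:
        # Temperature move: propose a neighbouring index, keep the spins fixed.
        j = i + rng.choice((-1, 1))
        if 0 <= j < len(betas):
            log_ratio = ((betas[j] - betas[i]) * energy(k, n, h)
                         - (log_z[j] - log_z[i]))
            if accept(log_ratio, rng):
                i = j
    return k, i


def simulate(n=20, h=0.05, betas=(0.0, 0.5, 1.0, 1.5), steps=5000, seed=0):
    """Run the chain and return the visited (k, i) pairs (assumed parameters)."""
    rng = random.Random(seed)
    log_z = [math.log(partition(b, n, h)) for b in betas]
    k, i = 0, len(betas) - 1  # start all-down at the coldest temperature
    path = []
    for _ in range(steps):
        k, i = tempering_step(k, i, betas, log_z, n, h, rng)
        path.append((k, i))
    return path
```

Calling `simulate()` returns the trajectory of (up-spin count, temperature index) pairs. Tracking only the up-spin count is valid here because the mean-field energy depends on a configuration only through its magnetization.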
Submitted 19 August, 2015;
originally announced August 2015.
-
The Computational Complexity of Estimating Convergence Time
Authors:
Nayantara Bhatnagar,
Andrej Bogdanov,
Elchanan Mossel
Abstract:
An important problem in the implementation of Markov Chain Monte Carlo algorithms is to determine the convergence time, or the number of iterations before the chain is close to stationarity. For many Markov chains used in practice this time is not known. Even in cases where the convergence time is known to be polynomial, the theoretical bounds are often too crude to be practical. Thus, practitioners like to carry out some form of statistical analysis in order to assess convergence. This has led to the development of a number of methods known as convergence diagnostics which attempt to diagnose whether the Markov chain is far from stationarity. We study the problem of testing convergence in the following settings and prove that the problem is hard in a computational sense. First, given a Markov chain that mixes rapidly, it is hard for Statistical Zero Knowledge (SZK-hard) to distinguish whether, starting from a given state, the chain is close to stationarity by time t or far from stationarity at time ct for a constant c. We show the problem is in AM intersect coAM. Second, given a Markov chain that mixes rapidly, it is coNP-hard to distinguish whether it is close to stationarity by time t or far from stationarity at time ct for a constant c. The problem is in coAM. Finally, it is PSPACE-complete to distinguish whether the Markov chain is close to stationarity by time t or far from being mixed at time ct for c at least 1.
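To make "close to stationarity by time t" concrete, the following Python sketch computes the total variation distance between the time-t distribution and the stationary distribution by brute force, for a chain given as an explicit transition matrix. This is only feasible for tiny state spaces and says nothing about the hardness results themselves. The function names and the two-state example chain are assumptions made for illustration.

```python
def step(dist, P):
    """Advance a distribution over states by one step of the chain P."""
    n = len(P)
    return [sum(dist[x] * P[x][y] for x in range(n)) for y in range(n)]


def tv_distance(p, q):
    """Total variation distance between two distributions on the same states."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def distance_at_time(P, start, pi, t):
    """Distance to stationarity after t steps when starting from state `start`."""
    dist = [0.0] * len(P)
    dist[start] = 1.0
    for _ in range(t):
        dist = step(dist, P)
    return tv_distance(dist, pi)


def time_to_within(P, start, pi, eps, max_t=10_000):
    """Smallest t <= max_t with distance at most eps, or None if none is found."""
    dist = [0.0] * len(P)
    dist[start] = 1.0
    for t in range(max_t + 1):
        if tv_distance(dist, pi) <= eps:
            return t
        dist = step(dist, P)
    return None


# Hypothetical two-state chain with flip probabilities a = 0.2 and b = 0.1.
P = [[0.8, 0.2], [0.1, 0.9]]
pi = [1.0 / 3.0, 2.0 / 3.0]  # stationary distribution (b, a) / (a + b)
```

For this two-state chain the distance from state 0 has the closed form (a / (a + b)) · |1 − a − b|^t, which gives a convenient check on `distance_at_time`.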
Submitted 1 July, 2010;
originally announced July 2010.
-
Reconstruction Threshold for the Hardcore Model
Authors:
Nayantara Bhatnagar,
Allan Sly,
Prasad Tetali
Abstract:
In this paper we consider the reconstruction problem on the tree for the hardcore model. We determine new bounds for the non-reconstruction regime on the k-regular tree, showing non-reconstruction when lambda < (ln 2 - o(1)) ln^2(k) / (2 lnln(k)), improving the previous best bound of lambda < e - 1. This is almost tight, as reconstruction is known to hold when lambda > (e + o(1)) ln^2(k). We discuss the relationship to finding large independent sets in sparse random graphs and to the mixing time of Markov chains for sampling independent sets on trees.
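To get a feel for how close the two thresholds in the abstract are, the Python sketch below evaluates their leading-order terms with the o(1) corrections dropped. That simplification is an assumption of this example, so the numbers are only indicative for large k and are not the paper's exact bounds. The function names are hypothetical.

```python
import math

PREVIOUS_BOUND = math.e - 1  # earlier non-reconstruction bound, lambda < e - 1


def nonreconstruction_bound(k):
    """Leading term ln 2 * ln^2(k) / (2 ln ln k), with the o(1) term dropped."""
    if k < 3:
        raise ValueError("ln ln k must be positive, so k >= 3 is required")
    return math.log(2) * math.log(k) ** 2 / (2 * math.log(math.log(k)))


def reconstruction_bound(k):
    """Leading term e * ln^2(k), with the o(1) term dropped."""
    return math.e * math.log(k) ** 2


def gap_ratio(k):
    """Ratio of the two leading terms, which simplifies to 2e ln ln k / ln 2."""
    return reconstruction_bound(k) / nonreconstruction_bound(k)
```

Under this simplification the two bounds differ only by a factor of order ln ln k, which is the sense in which the abstract calls the result almost tight.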
Submitted 20 April, 2010;
originally announced April 2010.
-
A computational method for bounding the probability of reconstruction on trees
Authors:
Nayantara Bhatnagar,
Elitza Maneva
Abstract:
For a tree Markov random field, non-reconstruction is said to hold if, as the depth of the tree goes to infinity, the information that a typical configuration at the leaves gives about the value at the root goes to zero. The distribution of the measure at the root conditioned on a typical boundary can be computed using a distributional recurrence. However, the exact computation is not feasible because the support of the distribution grows exponentially with the depth.
In this work, we introduce the notion of a survey of a distribution over probability vectors, which is a succinct representation of the true distribution. We show that a survey of the distribution of the measure at the root can be constructed by an efficient recursive algorithm. The key properties of surveys are that their size does not grow with the depth, they can be constructed recursively, and they still provide a good bound for the distance between the true conditional distribution and the unconditional distribution at the root. This approach applies to a large class of Markov random field models, including randomly generated ones. As an application, we show bounds on the reconstruction threshold for the Potts model on small-degree trees.
Submitted 27 March, 2009;
originally announced March 2009.