-
HeavyWater and SimplexWater: Watermarking Low-Entropy Text Distributions
Authors:
Dor Tsur,
Carol Xuan Long,
Claudio Mayrink Verdun,
Hsiang Hsu,
Chen-Fu Chen,
Haim Permuter,
Sajani Vithana,
Flavio P. Calmon
Abstract:
Large language model (LLM) watermarks enable authentication of text provenance, curb misuse of machine-generated text, and promote trust in AI systems. Current watermarks operate by changing the next-token predictions output by an LLM. The updated (i.e., watermarked) predictions depend on random side information produced, for example, by hashing previously generated tokens. LLM watermarking is particularly challenging in low-entropy generation tasks, such as coding, where next-token predictions are near-deterministic. In this paper, we propose an optimization framework for watermark design. Our goal is to understand how to most effectively use random side information in order to maximize the likelihood of watermark detection and minimize the distortion of generated text. Our analysis informs the design of two new watermarks: HeavyWater and SimplexWater. Both watermarks are tunable, gracefully trading off detection accuracy against text distortion. They can also be applied to any LLM and are agnostic to how the side information is generated. We examine the performance of HeavyWater and SimplexWater through several benchmarks, demonstrating that they can achieve high watermark detection accuracy with minimal compromise of text generation quality, particularly in the low-entropy regime. Our theoretical analysis also reveals surprising new connections between LLM watermarking and coding theory. The code implementation can be found at https://github.com/DorTsur/HeavyWater_SimplexWater
Submitted 6 June, 2025;
originally announced June 2025.
-
Neural Estimation for Scaling Entropic Multimarginal Optimal Transport
Authors:
Dor Tsur,
Ziv Goldfeld,
Kristjan Greenewald,
Haim Permuter
Abstract:
Multimarginal optimal transport (MOT) is a powerful framework for modeling interactions between multiple distributions, yet its applicability is bottlenecked by a high computational overhead. Entropic regularization provides computational speedups via the multimarginal Sinkhorn algorithm, whose time complexity, for a dataset size $n$ and $k$ marginals, generally scales as $O(n^k)$. However, this dependence on the dataset size $n$ is computationally prohibitive for many machine learning problems. In this work, we propose a new computational framework for entropic MOT, dubbed Neural Entropic MOT (NEMOT), that enjoys significantly improved scalability. NEMOT employs neural networks trained using mini-batches, which transfers the computational complexity from the dataset size to the size of the mini-batch, leading to substantial gains. We provide formal guarantees on the accuracy of NEMOT via non-asymptotic error bounds. We supplement these with numerical results that demonstrate the performance gains of NEMOT over Sinkhorn's algorithm, as well as extensions to neural computation of multimarginal entropic Gromov-Wasserstein alignment. In particular, orders-of-magnitude speedups are observed relative to the state-of-the-art, with a notable increase in the feasible number of samples and marginals. NEMOT seamlessly integrates as a module in large-scale machine learning pipelines, and can serve to expand the practical applicability of entropic MOT for tasks involving multimarginal data.
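The sketch below makes the $O(n^k)$ cost concrete. It is a minimal pure-Python version of the classical multimarginal Sinkhorn iteration, which the abstract uses as its baseline; it is not NEMOT. The cost function, `eps`, and the iteration count are illustrative assumptions. Each marginal update sums over all $n^k$ cells of the coupling, and this is the dependence on $n$ that NEMOT replaces with mini-batch training.

```python
import itertools
import math


def multimarginal_sinkhorn(marginals, cost, eps=1.0, iters=500):
    """Entropic multimarginal OT between k discrete marginals of size n.

    marginals: list of k probability vectors, each of length n (entries > 0)
    cost: function mapping a k-tuple of indices to a real-valued cost
    Returns the entropic coupling as a dict {index tuple: mass}.
    """
    k = len(marginals)
    n = len(marginals[0])
    cells = list(itertools.product(range(n), repeat=k))  # all n**k cells
    kernel = {t: math.exp(-cost(t) / eps) for t in cells}
    scale = [[1.0] * n for _ in range(k)]  # one scaling vector per marginal

    def mass(t):
        # Coupling entry: kernel times the product of the k scaling factors
        val = kernel[t]
        for m in range(k):
            val *= scale[m][t[m]]
        return val

    for _ in range(iters):
        for m in range(k):
            # m-th marginal of the current coupling: one pass over all n**k cells
            marg = [0.0] * n
            for t in cells:
                marg[t[m]] += mass(t)
            # Rescale so the m-th marginal matches its target exactly
            for i in range(n):
                scale[m][i] *= marginals[m][i] / marg[i]
    return {t: mass(t) for t in cells}
```

With three marginals on three points each, the coupling has 27 cells. Doubling $n$ multiplies the work per sweep by $2^k = 8$.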
Submitted 31 May, 2025;
originally announced June 2025.
-
Optimized Couplings for Watermarking Large Language Models
Authors:
Dor Tsur,
Carol Xuan Long,
Claudio Mayrink Verdun,
Hsiang Hsu,
Haim Permuter,
Flavio P. Calmon
Abstract:
Large language models (LLMs) are now able to produce text that is, in many cases, seemingly indistinguishable from human-generated content. This has fueled the development of watermarks that imprint a "signal" in LLM-generated text with minimal perturbation of an LLM's output. This paper provides an analysis of text watermarking in a one-shot setting. Through the lens of hypothesis testing with side information, we formulate and analyze the fundamental trade-off between watermark detection power and distortion in the quality of generated text. We argue that a key component in watermark design is generating a coupling between the side information shared with the watermark detector and a random partition of the LLM vocabulary. Our analysis identifies the optimal coupling and randomization strategy under the worst-case LLM next-token distribution that satisfies a min-entropy constraint. We provide a closed-form expression for the resulting detection rate under the proposed scheme and quantify the cost in a max-min sense. Finally, we provide an array of numerical results, comparing the proposed scheme with the theoretical optimum and existing schemes, on both synthetic data and LLM watermarking. Our code is available at https://github.com/Carol-Long/CC_Watermark
Submitted 13 May, 2025;
originally announced May 2025.
-
Efficient Time Series Forecasting via Hyper-Complex Models and Frequency Aggregation
Authors:
Eyal Yakir,
Dor Tsur,
Haim Permuter
Abstract:
Time series forecasting is a long-standing problem in statistics and machine learning. One of the key challenges is processing sequences with long-range dependencies. To that end, a recent line of work applied the short-time Fourier transform (STFT), which partitions the sequence into multiple subsequences and applies a Fourier transform to each separately. We propose the Frequency Information Aggregation (FIA)-Net, which is based on a novel complex-valued MLP architecture that aggregates adjacent window information in the frequency domain. To further increase the receptive field of the FIA-Net, we treat the set of windows as hyper-complex (HC) valued vectors and employ HC algebra to efficiently combine information from all STFT windows. Using the HC-MLP backbone allows for improved handling of sequences with long-term dependence. Furthermore, due to the nature of HC operations, the HC-MLP uses up to three times fewer parameters than the equivalent standard window aggregation method. We evaluate the FIA-Net on various time-series benchmarks and show that the proposed methodologies outperform existing state-of-the-art methods in terms of both accuracy and efficiency. Our code is publicly available at https://anonymous.4open.science/r/research-1803/.
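The sketch below illustrates the STFT preprocessing that FIA-Net builds on, not FIA-Net itself. It cuts a sequence into windows and takes a naive DFT of each one. The rectangular window, the `window_len` and `hop` parameters, and the $O(L^2)$ DFT are simplifying assumptions. The per-window complex spectra it returns are the kind of frequency-domain representation that the HC-MLP then aggregates across windows.

```python
import cmath


def stft_windows(x, window_len, hop):
    """Split x into windows of length window_len (stride hop) and DFT each one."""
    spectra = []
    for start in range(0, len(x) - window_len + 1, hop):
        seg = x[start:start + window_len]
        # Naive O(window_len^2) DFT with a rectangular window (no tapering)
        spec = [
            sum(seg[t] * cmath.exp(-2j * cmath.pi * f * t / window_len)
                for t in range(window_len))
            for f in range(window_len)
        ]
        spectra.append(spec)
    return spectra
```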
Submitted 27 February, 2025;
originally announced February 2025.
-
Faster parameterized algorithm for 3-Hitting Set
Authors:
Dekel Tsur
Abstract:
In the 3-Hitting Set problem, the input is a hypergraph $G$ in which every hyperedge has size at most 3, and an integer $k$, and the goal is to decide whether there is a set $S$ of at most $k$ vertices such that every hyperedge of $G$ contains at least one vertex from $S$. In this paper we give an $O^*(2.0409^k)$-time algorithm for 3-Hitting Set.
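For comparison with the $O^*(2.0409^k)$ bound, the textbook bounded search tree for 3-Hitting Set is sketched below. It is a baseline, not the paper's algorithm. Any hyperedge not yet hit must contain a vertex of $S$. Branching on its at most 3 vertices, each time with budget $k-1$, gives $O^*(3^k)$ time. Hyperedges are given as tuples of vertices, which is an assumed input format.

```python
def hitting_set(edges, k):
    """Return a set of <= k vertices meeting every hyperedge (size <= 3), or None.

    Baseline bounded search tree: an unhit hyperedge must contain a solution
    vertex, so branch on each of its (at most 3) vertices: O*(3^k) overall.
    """
    if not edges:
        return set()  # every hyperedge is already hit
    if k == 0:
        return None  # hyperedges remain but the budget is exhausted
    for v in edges[0]:
        # Choose v: drop every hyperedge that v hits, recurse with budget k - 1
        rest = [e for e in edges if v not in e]
        sol = hitting_set(rest, k - 1)
        if sol is not None:
            return sol | {v}
    return None
```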
Submitted 11 January, 2025;
originally announced January 2025.
-
InfoMat: A Tool for the Analysis and Visualization Sequential Information Transfer
Authors:
Dor Tsur,
Haim Permuter
Abstract:
Despite the popularity of information measures in the analysis of probabilistic systems, proper tools for their visualization are uncommon. This work develops a simple matrix representation of information transfer in sequential systems, termed the information matrix (InfoMat). The simplicity of the InfoMat provides a new visual perspective on existing decomposition formulas of mutual information, and enables us to prove new relations between sequential information-theoretic measures. We study various estimation schemes for the InfoMat, facilitating the visualization of information transfer in sequential datasets. By drawing a connection between visual patterns in the InfoMat and various dependence structures, we observe how information transfer evolves in the dataset. We then leverage this tool to visualize the effect of capacity-achieving coding schemes on the underlying exchange of information. We believe the InfoMat is applicable to any time-series task, providing a better understanding of the data at hand.
Submitted 26 May, 2024;
originally announced May 2024.
-
TREET: TRansfer Entropy Estimation via Transformers
Authors:
Omer Luxembourg,
Dor Tsur,
Haim Permuter
Abstract:
Transfer entropy (TE) is an information-theoretic measure that reveals the directional flow of information between processes, providing valuable insights for a wide range of real-world applications. This work proposes Transfer Entropy Estimation via Transformers (TREET), a novel attention-based approach for estimating TE for stationary processes. The proposed approach employs the Donsker-Varadhan representation of TE and leverages the attention mechanism for the task of neural estimation. We present a detailed theoretical and empirical study of TREET, comparing it to existing methods on a dedicated estimation benchmark. To increase its applicability, we design an estimated-TE optimization scheme motivated by the functional representation lemma, and use it to estimate the capacity of communication channels with memory, a canonical optimization problem in information theory. We further demonstrate how an optimized TREET can be used to estimate underlying densities, providing experimental results. Finally, we apply TREET to feature analysis of patients with apnea, demonstrating its applicability to real-world physiological data. Our work, combined with state-of-the-art deep learning methods, opens a new door to communication problems that are yet to be solved.
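For reference, the block below gives the Donsker-Varadhan (DV) variational representation of the KL divergence, with the supremum taken over bounded measurable functions $T$. It also gives one common definition of transfer entropy as a conditional mutual information with history length $l$. The symbol $l$ and this memory convention are our assumptions, not taken from the abstract. A conditional mutual information is itself a KL divergence. Restricting $T$ to a parametric family, such as a transformer, and maximizing the DV objective on samples therefore yields a lower-bound estimate. This is the general neural-estimation recipe; TREET's exact objective may differ.

```latex
D_{\mathrm{KL}}(P \,\|\, Q) = \sup_{T} \Big( \mathbb{E}_{P}[T] - \log \mathbb{E}_{Q}\big[e^{T}\big] \Big)

\mathrm{TE}_{X \to Y} = I\big(X_{t-l}^{t-1};\, Y_t \,\big|\, Y_{t-l}^{t-1}\big)
```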
Submitted 14 May, 2025; v1 submitted 10 February, 2024;
originally announced February 2024.
-
Max-Sliced Mutual Information
Authors:
Dor Tsur,
Ziv Goldfeld,
Kristjan Greenewald
Abstract:
Quantifying the dependence between high-dimensional random variables is central to statistical learning and inference. Two classical methods are canonical correlation analysis (CCA), which identifies maximally correlated projected versions of the original variables, and Shannon's mutual information, which is a universal dependence measure that also captures high-order dependencies. However, CCA only accounts for linear dependence, which may be insufficient for certain applications, while mutual information is often infeasible to compute/estimate in high dimensions. This work proposes a middle ground in the form of a scalable information-theoretic generalization of CCA, termed max-sliced mutual information (mSMI). mSMI equals the maximal mutual information between low-dimensional projections of the high-dimensional variables, which reduces back to CCA in the Gaussian case. It enjoys the best of both worlds: capturing intricate dependencies in the data while being amenable to fast computation and scalable estimation from samples. We show that mSMI retains favorable structural properties of Shannon's mutual information, like variational forms and identification of independence. We then study statistical estimation of mSMI, propose an efficiently computable neural estimator, and couple it with formal non-asymptotic error bounds. We present experiments that demonstrate the utility of mSMI for several tasks, encompassing independence testing, multi-view representation learning, algorithmic fairness, and generative modeling. We observe that mSMI consistently outperforms competing methods with little-to-no computational overhead.
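One way to write the definition in symbols is sketched below, under assumed notation. $k$ is the projection dimension, and $\mathrm{St}(k,d)$ is the set of $d \times k$ matrices with orthonormal columns. The second line shows why the Gaussian case reduces to CCA. For jointly Gaussian scalars, mutual information equals $-\tfrac12\log(1-\rho^2)$, which increases with $|\rho|$. Maximizing over one-dimensional projections therefore selects the top canonical correlation $\rho_1$.

```latex
\mathrm{mSMI}_k(X;Y) = \sup_{A \in \mathrm{St}(k, d_x),\; B \in \mathrm{St}(k, d_y)} I\big(A^{\top} X;\, B^{\top} Y\big)

\text{(jointly Gaussian, } k = 1\text{)}\qquad \mathrm{mSMI}_1(X;Y) = -\tfrac{1}{2}\log\big(1 - \rho_1^{2}\big)
```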
Submitted 28 September, 2023;
originally announced September 2023.
-
Data-Driven Optimization of Directed Information over Discrete Alphabets
Authors:
Dor Tsur,
Ziv Aharoni,
Ziv Goldfeld,
Haim Permuter
Abstract:
Directed information (DI) is a fundamental measure for the study and analysis of sequential stochastic models. In particular, when optimized over input distributions it characterizes the capacity of general communication channels. However, analytic computation of DI is typically intractable and existing optimization techniques over discrete input alphabets require knowledge of the channel model, which renders them inapplicable when only samples are available. To overcome these limitations, we propose a novel estimation-optimization framework for DI over discrete input spaces. We formulate DI optimization as a Markov decision process and leverage reinforcement learning techniques to optimize a deep generative model of the input process probability mass function (PMF). Combining this optimizer with the recently developed DI neural estimator, we obtain an end-to-end estimation-optimization algorithm which is applied to estimating the (feedforward and feedback) capacity of various discrete channels with memory. Furthermore, we demonstrate how to use the optimized PMF model to (i) obtain theoretical bounds on the feedback capacity of unifilar finite-state channels; and (ii) perform probabilistic shaping of constellations in the peak power-constrained additive white Gaussian noise channel.
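For reference, the block below gives Massey's definition of directed information between sequences $X^n$ and $Y^n$, together with the corresponding rate (when the limit exists). The only change from the chain-rule expansion of $I(X^n;Y^n)$ is in the conditioning. The $i$-th term uses the causal prefix $X^i$ rather than all of $X^n$, and this is what makes DI the natural objective for channels with memory and feedback.

```latex
I(X^n \to Y^n) = \sum_{i=1}^{n} I\big(X^{i};\, Y_i \,\big|\, Y^{i-1}\big),
\qquad
\mathsf{I}(\mathbf{X} \to \mathbf{Y}) = \lim_{n \to \infty} \frac{1}{n}\, I(X^n \to Y^n)
```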
Submitted 2 January, 2023;
originally announced January 2023.
-
Neural Estimation and Optimization of Directed Information over Continuous Spaces
Authors:
Dor Tsur,
Ziv Aharoni,
Ziv Goldfeld,
Haim Permuter
Abstract:
This work develops a new method for estimating and optimizing the directed information rate between two jointly stationary and ergodic stochastic processes. Building upon recent advances in machine learning, we propose a recurrent neural network (RNN)-based estimator which is optimized via gradient ascent over the RNN parameters. The estimator does not require prior knowledge of the underlying joint and marginal distributions. The estimator is also readily optimized over continuous input processes realized by a deep generative model. We prove consistency of the proposed estimation and optimization methods and combine them to obtain end-to-end performance guarantees. Applications for channel capacity estimation of continuous channels with memory are explored, and empirical results demonstrating the scalability and accuracy of our method are provided. When the channel is memoryless, we investigate the mapping learned by the optimized input generator.
Submitted 28 March, 2022;
originally announced March 2022.
-
Capacity of Continuous Channels with Memory via Directed Information Neural Estimator
Authors:
Ziv Aharoni,
Dor Tsur,
Ziv Goldfeld,
Haim Henry Permuter
Abstract:
Calculating the capacity (with or without feedback) of channels with memory and continuous alphabets is a challenging task. It requires optimizing the directed information (DI) rate over all channel input distributions. The objective is a multi-letter expression, whose analytic solution is only known for a few specific cases. When no analytic solution is present or the channel model is unknown, there is no unified framework for calculating or even approximating capacity. This work proposes a novel capacity estimation algorithm that treats the channel as a "black box", both with and without feedback. The algorithm has two main ingredients: (i) a neural distribution transformer (NDT) model that shapes a noise variable into the channel input distribution, which we are able to sample, and (ii) the DI neural estimator (DINE) that estimates the communication rate of the current NDT model. These models are trained by an alternating maximization procedure to both estimate the channel capacity and obtain an NDT for the optimal input distribution. The method is demonstrated on the moving average additive Gaussian noise channel, where it is shown that both the capacity and feedback capacity are estimated without knowledge of the channel transition kernel. The proposed estimation framework opens the door to a myriad of capacity approximation results for continuous alphabet channels that were inaccessible until now.
Submitted 16 May, 2020; v1 submitted 9 March, 2020;
originally announced March 2020.
-
Faster parameterized algorithm for Bicluter Editing
Authors:
Dekel Tsur
Abstract:
In the Bicluster Editing problem, the input is a graph $G$ and an integer $k$, and the goal is to decide whether $G$ can be transformed into a bicluster graph by adding and removing at most $k$ edges. In this paper we give an algorithm for Bicluster Editing whose running time is $O^*(3.116^k)$.
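A bicluster graph is a disjoint union of complete bipartite graphs (bicliques). The sketch below checks this property, which is the target condition of Bicluster Editing. It is a verifier only, not the paper's algorithm. The graph is assumed to be a dict mapping each vertex to the set of its neighbors, with symmetric adjacency and no self-loops. Each component is 2-colored by BFS, and then every vertex must be adjacent to the entire opposite side.

```python
from collections import deque


def is_bicluster_graph(adj):
    """True if every connected component of adj is a complete bipartite graph."""
    color = {}
    for s in adj:
        if s in color:
            continue
        color[s] = 0
        comp = [s]
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in color:
                    color[w] = 1 - color[u]
                    comp.append(w)
                    queue.append(w)
                elif color[w] == color[u]:
                    return False  # odd cycle: component is not bipartite
        left = [v for v in comp if color[v] == 0]
        right = [v for v in comp if color[v] == 1]
        # Completeness: each vertex must see every vertex of the other side
        if any(len(adj[v]) != len(right) for v in left):
            return False
        if any(len(adj[v]) != len(left) for v in right):
            return False
    return True
```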
Submitted 17 October, 2019;
originally announced October 2019.
-
An algorithm for destroying claws and diamonds
Authors:
Dekel Tsur
Abstract:
In the {Claw,Diamond}-Free Edge Deletion problem, the input is a graph $G$ and an integer $k$, and the goal is to decide whether there is a set of at most $k$ edges whose removal from $G$ results in a graph that contains no induced claw or diamond. In this paper we give an algorithm for this problem whose running time is $O^*(3.562^k)$.
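The target property can be checked by brute force over all 4-vertex subsets. An induced claw is a 4-vertex induced subgraph with exactly 3 edges in which one vertex is adjacent to the other three. A diamond is $K_4$ minus one edge, which is any 4-vertex graph with exactly 5 edges. The $O(n^4)$ verifier below is only an illustration, not the paper's algorithm, and the adjacency-dict input format is an assumption.

```python
from itertools import combinations


def has_induced_claw_or_diamond(adj):
    """Brute force over all 4-vertex subsets (O(n^4) subsets)."""
    for quad in combinations(adj, 4):
        n_edges = sum(1 for u, v in combinations(quad, 2) if v in adj[u])
        if n_edges == 5:
            return True  # K4 minus one edge: an induced diamond
        if n_edges == 3 and any(
            all(w in adj[c] for w in quad if w != c) for c in quad
        ):
            return True  # a center adjacent to 3 pairwise non-adjacent vertices: a claw
    return False
```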
Submitted 20 August, 2019;
originally announced August 2019.
-
Kernel for Kt-free edge deletion
Authors:
Dekel Tsur
Abstract:
In the $K_t$-free edge deletion problem, the input is a graph $G$ and an integer $k$, and the goal is to decide whether there is a set of at most $k$ edges of $G$ whose removal results in a graph with no clique of size $t$. In this paper we give a kernel for this problem with $O(k^{t-1})$ vertices and edges.
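The sketch below is a brute-force test for a clique of size $t$; the graph is $K_t$-free exactly when it returns `False`. It examines all $\binom{n}{t}$ vertex subsets, so it is only practical for small inputs. It serves as a reference checker, not as part of the kernelization. The function name and the adjacency-dict format are illustrative assumptions.

```python
from itertools import combinations


def has_clique(adj, t):
    """True if some t vertices are pairwise adjacent, i.e. the graph is NOT K_t-free."""
    return any(
        all(v in adj[u] for u, v in combinations(group, 2))
        for group in combinations(adj, t)
    )
```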
Submitted 9 August, 2019;
originally announced August 2019.
-
Faster algorithms for cograph edge modification problems
Authors:
Dekel Tsur
Abstract:
In the Cograph Deletion (resp., Cograph Editing) problem the input is a graph $G$ and an integer $k$, and the goal is to decide whether there is a set of edges of size at most $k$ whose removal from $G$ (resp., removal and addition to $G$) results in a graph that does not contain an induced path with four vertices. In this paper we give algorithms for Cograph Deletion and Cograph Editing whose running times are $O^*(2.303^k)$ and $O^*(4.329^k)$, respectively.
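Cographs are exactly the graphs with no induced path on four vertices ($P_4$). A 4-vertex induced subgraph is a $P_4$ exactly when its degree sequence is $(1,1,2,2)$. This gives the brute-force $O(n^4)$ verifier below for the target property of both Cograph Deletion and Cograph Editing. It is not the paper's algorithm, and the adjacency-dict input is an assumption.

```python
from itertools import combinations


def is_cograph(adj):
    """True if no 4 vertices induce a path P4 (degree sequence 1, 1, 2, 2)."""
    for quad in combinations(adj, 4):
        degrees = sorted(sum(1 for w in quad if w in adj[v]) for v in quad)
        if degrees == [1, 1, 2, 2]:
            return False
    return True
```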
Submitted 30 December, 2019; v1 submitted 3 August, 2019;
originally announced August 2019.
-
An FPT algorithm for orthogonal buttons and scissors
Authors:
Dekel Tsur
Abstract:
We study the puzzle game Buttons and Scissors in which the goal is to remove all buttons from an $n\times m$ grid by a series of horizontal and vertical cuts. We show that the corresponding parameterized problem has an algorithm with time complexity $2^{O(k^2 \log k)} (n+m)^{O(1)}$, where $k$ is an upper bound on the number of cuts.
Submitted 24 July, 2019;
originally announced July 2019.
-
Cluster deletion revisited
Authors:
Dekel Tsur
Abstract:
In the Cluster Deletion problem, the input is a graph $G$ and an integer $k$, and the goal is to decide whether there is a set of at most $k$ edges whose removal from $G$ results in a graph in which every connected component is a clique. In this paper we give an algorithm for Cluster Deletion whose running time is $O^*(1.404^k)$.
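A graph is a disjoint union of cliques exactly when it has no induced path $u$-$v$-$w$ on three vertices (edges $uv$ and $vw$, non-edge $uw$). Only deletions are allowed, so one of the two edges of such a path must be removed. This yields the classical $O^*(2^k)$ search tree sketched below, as a baseline for the $O^*(1.404^k)$ result; it is not the paper's algorithm. The function temporarily mutates the adjacency dict during the search and restores it before returning. The helper names are hypothetical.

```python
from itertools import combinations


def find_induced_p3(adj):
    """Return (u, v, w) with edges uv, vw and non-edge uw, or None."""
    for v in adj:
        for u, w in combinations(sorted(adj[v]), 2):
            if w not in adj[u]:
                return u, v, w
    return None


def cluster_deletion(adj, k):
    """Decide whether deleting at most k edges turns adj into a union of cliques."""
    p3 = find_induced_p3(adj)
    if p3 is None:
        return True  # no induced P3: every component is already a clique
    if k == 0:
        return False
    u, v, w = p3
    for a, b in ((u, v), (v, w)):
        # Branch: delete edge ab, recurse, then restore it
        adj[a].discard(b)
        adj[b].discard(a)
        found = cluster_deletion(adj, k - 1)
        adj[a].add(b)
        adj[b].add(a)
        if found:
            return True
    return False
```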
Submitted 19 July, 2019;
originally announced July 2019.
-
l-path vertex cover is easier than l-hitting set for small l
Authors:
Dekel Tsur
Abstract:
In the $l$-path vertex cover problem the input is an undirected graph $G$ and an integer $k$. The goal is to decide whether there is a set of vertices $S$ of size at most $k$ such that $G-S$ does not contain a path with $l$ vertices. In this paper we give parameterized algorithms for $l$-path vertex cover for $l = 5,6,7$, whose time complexities are $O^*(3.945^k)$, $O^*(4.947^k)$, and $O^*(5.951^k)$, respectively.
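For small instances, both the path condition and the problem itself can be checked by brute force, as sketched below. `has_path_with` searches for a simple path on exactly $l$ vertices by depth-first extension. `l_path_vertex_cover` tries all vertex sets of size at most $k$, smallest first. Both run in exponential time and serve only as a reference against which faster algorithms could be tested. The function names and the adjacency-dict format are hypothetical.

```python
from itertools import combinations


def has_path_with(adj, l):
    """True if the undirected graph contains a simple path on exactly l >= 1 vertices."""
    def extend(v, visited):
        # visited holds the vertices of the current path, which ends at v
        if len(visited) == l:
            return True
        return any(extend(w, visited | {w}) for w in adj[v] if w not in visited)

    return any(extend(v, {v}) for v in adj)


def l_path_vertex_cover(adj, l, k):
    """Return a smallest set S with |S| <= k such that G - S has no l-vertex path, or None."""
    for size in range(k + 1):
        for S in combinations(adj, size):
            removed = set(S)
            rest = {v: adj[v] - removed for v in adj if v not in removed}
            if not has_path_with(rest, l):
                return removed
    return None
```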
Submitted 22 June, 2019;
originally announced June 2019.
-
Algorithms for deletion problems on split graphs
Authors:
Dekel Tsur
Abstract:
In the Split to Block Vertex Deletion and Split to Threshold Vertex Deletion problems the input is a split graph $G$ and an integer $k$, and the goal is to decide whether there is a set $S$ of at most $k$ vertices such that $G-S$ is a block graph and $G-S$ is a threshold graph, respectively. In this paper we give algorithms for these problems whose running times are $O^*(2.076^k)$ and $O^*(2.733^k)$, respectively.
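A standard characterization of threshold graphs is that repeatedly deleting an isolated or a dominating vertex reduces them to the empty graph. This greedy peeling is safe because threshold graphs are closed under taking induced subgraphs. The sketch below uses this characterization to test whether $G-S$ is a threshold graph, which is the target condition of Split to Threshold Vertex Deletion. It is a checker only, with an assumed adjacency-dict input.

```python
def is_threshold(adj):
    """Peel isolated or dominating vertices; the graph is threshold iff all get peeled."""
    alive = set(adj)
    while alive:
        # Find a vertex with no live neighbors or adjacent to every other live vertex
        v = next(
            (u for u in alive
             if len(adj[u] & alive) in (0, len(alive) - 1)),
            None,
        )
        if v is None:
            return False
        alive.remove(v)
    return True
```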
Submitted 25 July, 2019; v1 submitted 24 June, 2019;
originally announced June 2019.
-
Faster parameterized algorithm for Cluster Vertex Deletion
Authors:
Dekel Tsur
Abstract:
In the Cluster Vertex Deletion problem, the input is a graph $G$ and an integer $k$. The goal is to decide whether there is a set of vertices $S$ of size at most $k$ such that deleting the vertices of $S$ from $G$ results in a graph in which every connected component is a clique. We give an algorithm for Cluster Vertex Deletion whose running time is $O^*(1.811^k)$.
Submitted 22 January, 2019;
originally announced January 2019.
-
Faster parameterized algorithm for pumpkin vertex deletion set
Authors:
Dekel Tsur
Abstract:
A directed graph $G$ is called a pumpkin if $G$ is a union of induced paths with a common start vertex $s$ and a common end vertex $t$, and the internal vertices of every two paths are disjoint. We give an algorithm that, given a directed graph $G$ and an integer $k$, decides whether a pumpkin can be obtained from $G$ by deleting at most $k$ vertices. The algorithm runs in $O^*(2^k)$ time.
Submitted 8 January, 2019;
originally announced January 2019.
-
Above guarantee parameterization for vertex cover on graphs with maximum degree 4
Authors:
Dekel Tsur
Abstract:
In the vertex cover problem, the input is a graph $G$ and an integer $k$, and the goal is to decide whether there is a set of vertices $S$ of size at most $k$ such that every edge of $G$ is incident on at least one vertex in $S$. We study the vertex cover problem on graphs with maximum degree 4 and minimum degree at least 2, parameterized by $r = k-n/3$. We give an algorithm for this problem whose running time is $O^*(1.6253^r)$. As a corollary, we obtain an $O^*(1.2403^k)$-time algorithm for vertex cover on graphs with maximum degree 4.
Submitted 27 December, 2018;
originally announced December 2018.
-
An O^*(2.619^k) algorithm for 4-path vertex cover
Authors:
Dekel Tsur
Abstract:
In the 4-path vertex cover problem, the input is an undirected graph $G$ and an integer $k$. The goal is to decide whether there is a set of vertices $S$ of size at most $k$ such that every path with 4 vertices in $G$ contains at least one vertex of $S$. In this paper we give a parameterized algorithm for 4-path vertex cover whose time complexity is $O^*(2.619^k)$.
Submitted 5 January, 2019; v1 submitted 8 November, 2018;
originally announced November 2018.
-
Weighted vertex cover on graphs with maximum degree 3
Authors:
Dekel Tsur
Abstract:
We give a parameterized algorithm for weighted vertex cover on graphs with maximum degree 3 whose time complexity is $O^*(1.402^t)$, where $t$ is the minimum size of a vertex cover of the input graph.
Submitted 30 October, 2018;
originally announced October 2018.
-
Parameterized algorithm for 3-path vertex cover
Authors:
Dekel Tsur
Abstract:
In the 3-path vertex cover problem, the input is an undirected graph $G$ and an integer $k$. The goal is to decide whether there is a set of vertices $S$ of size at most $k$ such that every path with 3 vertices in $G$ contains at least one vertex of $S$. In this paper we give a parameterized algorithm for 3-path vertex cover whose time complexity is $O^*(1.713^k)$. Our algorithm is faster than previous algorithms for this problem.
Submitted 7 September, 2018;
originally announced September 2018.
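For contrast with the $O^*(1.713^k)$ bound, here is the simplest bounded-search-tree algorithm for the problem, sketched in Python with hypothetical function names. Any solution must contain one of the three vertices of a given 3-vertex path, so the algorithm branches three ways. That gives at most $3^k$ leaves. It is a generic baseline, not the paper's algorithm.

```python
def find_p3(adj, removed):
    """Return some path (a, b, c) on 3 vertices avoiding `removed`, or None.

    In a simple graph such a path exists iff some vertex b has two surviving neighbours.
    """
    for b in adj:
        if b in removed:
            continue
        nbrs = [v for v in adj[b] if v not in removed]
        if len(nbrs) >= 2:
            return nbrs[0], b, nbrs[1]
    return None

def has_3_path_cover(adj, k, removed=frozenset()):
    """Decide whether deleting at most k more vertices destroys every 3-vertex path."""
    p = find_p3(adj, removed)
    if p is None:
        return True                  # no 3-vertex path remains
    if k == 0:
        return False                 # a path remains but the budget is spent
    # Some vertex of p must be in the solution: branch on each of the three.
    return any(has_3_path_cover(adj, k - 1, removed | {v}) for v in p)

triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
print(has_3_path_cover(triangle, 0), has_3_path_cover(triangle, 1))  # False True
```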
-
Faster deterministic parameterized algorithm for k-Path
Authors:
Dekel Tsur
Abstract:
In the k-Path problem, the input is a directed graph $G$ and an integer $k\geq 1$, and the goal is to decide whether there is a simple directed path in $G$ with exactly $k$ vertices. We give a deterministic algorithm for k-Path with time complexity $O^*(2.554^k)$. This improves the previously best deterministic algorithm for this problem of Zehavi [ESA 2015] whose time complexity is $O^*(2.597^k)$. The technique used by our algorithm can also be used to obtain faster deterministic algorithms for k-Tree, r-Dimensional k-Matching, Graph Motif, and Partial Cover.
Submitted 24 January, 2019; v1 submitted 13 August, 2018;
originally announced August 2018.
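The following naive Python sketch pins down the k-Path definition: a simple directed path with exactly k vertices, where no vertex is repeated. The function name is hypothetical. It runs a depth-first search over simple paths, which is exponential in the worst case. It is not the deterministic $O^*(2.554^k)$ algorithm of the paper. The adjacency dict is assumed to contain every vertex as a key.

```python
def has_k_path(adj, k):
    """Decide whether directed graph `adj` has a simple path with exactly k vertices.

    adj: dict mapping every vertex to an iterable of its out-neighbours.
    Explores at most about n * d^(k-1) partial paths for maximum out-degree d.
    """
    def extend(v, length, on_path):
        if length == k:
            return True
        for w in adj.get(v, ()):
            if w not in on_path:         # keep the path simple
                on_path.add(w)
                if extend(w, length + 1, on_path):
                    return True
                on_path.remove(w)        # backtrack
        return False

    return k >= 1 and any(extend(v, 1, {v}) for v in adj)

# 0 -> 1 -> 2 -> 3 -> 1: the cycle cannot be reused, so the longest simple path has 4 vertices.
g = {0: [1], 1: [2], 2: [3], 3: [1]}
print(has_k_path(g, 4), has_k_path(g, 5))  # True False
```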
-
The effective entropy of next/previous larger/smaller value queries
Authors:
Dekel Tsur
Abstract:
We study the minimum number of bits that must be stored in order to answer next/previous larger/smaller value queries on an array $A$ of $n$ numbers, without storing $A$. We show that these queries can be answered by storing at most $3.701 n$ bits. Our result improves the result of Jo and Satti [TCS 2016] that gives an upper bound of $4.088n$ bits for this problem.
Submitted 10 August, 2018;
originally announced August 2018.
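To fix what the four queries return, the following Python sketch computes all answers directly from $A$ with monotone stacks in O(n) time. The function name is hypothetical, and we assume strict comparisons: next larger means the smallest j > i with A[j] > A[i]. The paper's contribution is an encoding of at most $3.701n$ bits that answers these queries *without* $A$. This sketch does not attempt that.

```python
def next_prev_values(A):
    """Return (nlv, nsv, plv, psv): for each i, the index of the next larger,
    next smaller, previous larger and previous smaller value (None if absent).

    nlv[i] = smallest j > i with A[j] > A[i]; nsv uses A[j] < A[i];
    plv/psv use the largest j < i instead. Each index is pushed and popped
    at most once per stack, so the total time is O(n).
    """
    n = len(A)
    nlv, nsv, plv, psv = [None] * n, [None] * n, [None] * n, [None] * n
    larger, smaller = [], []   # indices still waiting for an answer
    for j, x in enumerate(A):                  # left-to-right pass: next values
        while larger and A[larger[-1]] < x:
            nlv[larger.pop()] = j
        while smaller and A[smaller[-1]] > x:
            nsv[smaller.pop()] = j
        larger.append(j)
        smaller.append(j)
    larger, smaller = [], []
    for j in range(n - 1, -1, -1):             # right-to-left pass: previous values
        x = A[j]
        while larger and A[larger[-1]] < x:
            plv[larger.pop()] = j
        while smaller and A[smaller[-1]] > x:
            psv[smaller.pop()] = j
        larger.append(j)
        smaller.append(j)
    return nlv, nsv, plv, psv

print(next_prev_values([3, 1, 4, 1, 5]))
# ([2, 2, 4, 4, None], [1, None, 3, None, None], [None, 0, None, 2, None], [None, None, 1, None, 3])
```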
-
Dynamic all scores matrices for LCS score
Authors:
Amir Carmel,
Dekel Tsur,
Michal Ziv-Ukelson
Abstract:
The problem of aligning two strings A,B in order to determine their similarity is fundamental in the field of pattern matching. An important concept in this domain is the "all scores matrix" that encodes the local alignment comparison of two strings. Namely, let K denote the all scores matrix containing the alignment score of every substring of B with A, and let J denote the all scores matrix containing the alignment score of every suffix of B with every prefix of A.
In this paper we consider the problem of maintaining an all scores matrix where the scoring function is the LCS score, while supporting single character prepend and append operations to A and B. Our algorithms exploit the sparsity parameters L=LCS(A,B) and Delta = |B|-L. For the matrix K we propose an algorithm that supports incremental operations to both ends of A in O(Delta) time. For the matrix J we propose an algorithm that supports a single type of incremental operation, either a prepend operation to A or an append operation to B, in O(L) time. This structure can also be extended to support both operations simultaneously in O(L log log L) time.
Submitted 10 August, 2018;
originally announced August 2018.
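The following Python sketch spells out what K and J contain by computing both from scratch. It runs one standard LCS table per starting position of B, for O(|A| |B|^2) time overall. The function name and the index conventions (half-open slices) are our assumptions. The paper's point is to *maintain* these matrices under single-character updates in O(Delta) or O(L) time, which this static sketch does not do.

```python
def lcs_all_scores(A, B):
    """Naive computation of both all-scores matrices for the LCS score.

    K[i][j] = LCS(A, B[i:j])      for 0 <= i <= j <= len(B)  (substrings of B vs. A)
    J[i][p] = LCS(A[:p], B[i:])   for 0 <= i <= len(B), 0 <= p <= len(A)
    Entries K[i][j] with j < i are unused and left at 0.
    """
    m, n = len(A), len(B)
    K = [[0] * (n + 1) for _ in range(n + 1)]
    J = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        suffix = B[i:]
        # T[p][q] = LCS(A[:p], suffix[:q]) via the classic dynamic program.
        T = [[0] * (len(suffix) + 1) for _ in range(m + 1)]
        for p in range(1, m + 1):
            for q in range(1, len(suffix) + 1):
                if A[p - 1] == suffix[q - 1]:
                    T[p][q] = T[p - 1][q - 1] + 1
                else:
                    T[p][q] = max(T[p - 1][q], T[p][q - 1])
        for q in range(len(suffix) + 1):       # last row: all of A vs. B[i:i+q]
            K[i][i + q] = T[m][q]
        for p in range(m + 1):                 # last column: A[:p] vs. all of B[i:]
            J[i][p] = T[p][len(suffix)]
    return K, J

K, J = lcs_all_scores("abc", "acb")
L = K[0][3]                    # L = LCS(A, B)
print(L, len("acb") - L)       # 2 1   (L and Delta = |B| - L)
```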
-
Representation of ordered trees with a given degree distribution
Authors:
Dekel Tsur
Abstract:
The degree distribution of an ordered tree $T$ with $n$ nodes is $\vec{n} = (n_0,\ldots,n_{n-1})$, where $n_i$ is the number of nodes in $T$ with $i$ children. Let $\mathcal{N}(\vec{n})$ be the number of trees with degree distribution $\vec{n}$. We give a data structure that stores an ordered tree $T$ with $n$ nodes and degree distribution $\vec{n}$ using $\log \mathcal{N}(\vec{n})+O(n/\log^t n)$ bits for every constant $t$. The data structure answers tree queries in constant time. This improves the current data structures with lowest space for ordered trees: The structure of Jansson et al.\ [JCSS 2012] that uses $\log\mathcal{N}(\vec{n})+O(n\log\log n/\log n)$ bits, and the structure of Navarro and Sadakane [TALG 2014] that uses $2n+O(n/\log^t n)$ bits for every constant $t$.
Submitted 1 July, 2018;
originally announced July 2018.
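The space bound $\log \mathcal{N}(\vec{n})$ can be computed with the classical counting formula for ordered trees with a prescribed degree distribution: $\mathcal{N}(\vec{n}) = \frac{1}{n}\binom{n}{n_0,\ldots,n_{n-1}}$, valid when $\sum_i n_i = n$ and $\sum_i i\,n_i = n-1$. The following Python sketch applies it. The function names are hypothetical, and the sketch only evaluates the lower bound; it does not build the data structure.

```python
from math import factorial, log2

def degree_distribution(children):
    """children: dict node -> list of children. Returns [n_0, ..., n_{n-1}]."""
    n = len(children)
    dist = [0] * n
    for kids in children.values():
        dist[len(kids)] += 1
    return dist

def count_trees(dist):
    """Number of ordered trees with degree distribution `dist`.

    N = (1/n) * n! / (n_0! n_1! ... n_{n-1}!) when sum(dist) = n and
    sum(i * n_i) = n - 1 (one edge per non-root node); otherwise 0.
    """
    n = sum(dist)
    if n == 0 or sum(i * c for i, c in enumerate(dist)) != n - 1:
        return 0
    multinomial = factorial(n)
    for c in dist:
        multinomial //= factorial(c)   # each partial quotient is an integer
    return multinomial // n

# Root r with children A and B; A has one child C.
children = {'r': ['A', 'B'], 'A': ['C'], 'B': [], 'C': []}
dist = degree_distribution(children)
print(dist, count_trees(dist), round(log2(count_trees(dist)), 3))  # [2, 1, 1, 0] 3 1.585
```

For this 4-node tree, about 1.585 bits suffice in principle. A 2n-bit encoding that ignores the degree distribution would use 8 bits.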
-
Succinct data structure for dynamic trees with faster queries
Authors:
Dekel Tsur
Abstract:
Navarro and Sadakane [TALG 2014] gave a dynamic succinct data structure for storing an ordinal tree. The structure supports tree queries in either $O(\log n/\log\log n)$ or $O(\log n)$ time, and insertion or deletion of a single node in $O(\log n)$ time. In this paper we improve the result of Navarro and Sadakane by reducing the time complexities of some queries (e.g.\ degree and level\_ancestor) from $O(\log n)$ to $O(\log n/\log\log n)$.
Submitted 29 May, 2018;
originally announced May 2018.
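The following Python sketch gives the semantics of the degree and level\_ancestor queries on a balanced-parentheses (BP) encoding. BP is a common succinct tree encoding, and, as we understand it, the one the Navarro and Sadakane structure builds on. Each node is identified by the position of its '('. The naive O(n) scans below stand in for the fast excess-based searches of real succinct structures. All function names are hypothetical.

```python
def excess_array(bp):
    """excess[i] = #'(' - #')' in bp[0..i]; at an opening parenthesis it is the node depth."""
    out, e = [], 0
    for ch in bp:
        e += 1 if ch == '(' else -1
        out.append(e)
    return out

def degree(bp, exc, i):
    """Number of children of the node whose '(' is at position i (naive scan)."""
    d, count = exc[i], 0
    for j in range(i + 1, len(bp)):
        if exc[j] == d - 1:              # reached the matching ')' of node i
            break
        if bp[j] == '(' and exc[j] == d + 1:
            count += 1                   # an opening parenthesis one level deeper is a child
    return count

def level_ancestor(bp, exc, i, k):
    """Ancestor k levels above the node at position i (naive scan), or None."""
    if k == 0:
        return i
    target = exc[i] - k
    if target < 1:
        return None
    # The nearest '(' to the left at the target depth is the ancestor.
    for j in range(i - 1, -1, -1):
        if bp[j] == '(' and exc[j] == target:
            return j
    return None

bp = "((())())"   # root (pos 0) with children A (pos 1, which has child C at pos 2) and B (pos 5)
exc = excess_array(bp)
print(degree(bp, exc, 0), level_ancestor(bp, exc, 2, 2))  # 2 0
```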
-
Succinct data-structure for nearest colored node in a tree
Authors:
Dekel Tsur
Abstract:
We give a succinct data-structure that stores a tree with colors on the nodes. Given a node x and a color alpha, the structure finds the nearest node to x with color alpha. This result improves the $O(n\log n)$-bits structure of Gawrychowski et al.~[CPM 2016].
Submitted 18 February, 2017; v1 submitted 6 September, 2016;
originally announced September 2016.
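The following Python sketch states the nearest-colored-node query operationally, assuming distance is the number of edges on the tree path. It answers a query by breadth-first search from x in O(n) time and working space per query. The function name is hypothetical. The paper's structure answers the same query from a succinct representation; the sketch only fixes the semantics. Ties between equally near nodes are broken by BFS order.

```python
from collections import deque

def nearest_colored(adj, color, x, alpha):
    """Return a node nearest to x (in edges) whose color is alpha, or None.

    BFS dequeues nodes in nondecreasing distance from x, so the first node
    with color alpha that it reaches is a nearest one.
    """
    seen = {x}
    queue = deque([x])
    while queue:
        v = queue.popleft()
        if color[v] == alpha:
            return v
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return None

# Tree with edges 0-1, 1-2, 2-3, 1-4.
adj = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}
color = {0: 'red', 1: 'blue', 2: 'green', 3: 'red', 4: 'green'}
print(nearest_colored(adj, color, 4, 'red'))  # 0  (distance 2, versus 3 for node 3)
```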
-
Succinct representation of labeled trees
Authors:
Dekel Tsur
Abstract:
We give a representation for labeled ordered trees that supports labeled queries such as finding the i-th ancestor of a node with a given label. Our representation is succinct, namely the redundancy is small-o of the optimal space for storing the tree. This improves the representation of He et al., which is succinct unless the entropy of the labels is small.
Submitted 20 December, 2013;
originally announced December 2013.
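The following Python sketch illustrates a labeled ancestor query with a naive walk toward the root, taking O(depth) time. It assumes that "the i-th ancestor of a node with a given label" means the i-th nearest *proper* ancestor carrying that label, with i >= 1. Whether the node itself counts is our assumption. The function name and the parent-array representation are hypothetical; the paper's succinct representation answers such queries without this walk.

```python
def labeled_ancestor(parent, label, v, a, i):
    """Return the i-th nearest proper ancestor of v whose label is a, or None.

    parent: dict node -> parent node (None for the root); label: dict node -> label.
    """
    u = parent[v]
    while u is not None:
        if label[u] == a:
            i -= 1
            if i == 0:
                return u
        u = parent[u]
    return None

# Chain r(x) - a(y) - b(x) - c(y) - d(x), root first.
parent = {'r': None, 'a': 'r', 'b': 'a', 'c': 'b', 'd': 'c'}
label = {'r': 'x', 'a': 'y', 'b': 'x', 'c': 'y', 'd': 'x'}
print(labeled_ancestor(parent, label, 'd', 'x', 2))  # r
```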
-
Approximate String Matching using a Bidirectional Index
Authors:
Gregory Kucherov,
Kamil Salikhov,
Dekel Tsur
Abstract:
We study strategies of approximate pattern matching that exploit bidirectional text indexes, extending and generalizing ideas of Lam et al. We introduce a formalism, called search schemes, to specify search strategies of this type, then develop a probabilistic measure for the efficiency of a search scheme, prove several combinatorial results on efficient search schemes, and finally, provide experimental computations supporting the superiority of our strategies.
Submitted 6 September, 2015; v1 submitted 5 October, 2013;
originally announced October 2013.
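The following Python sketch is a simplified model of a search scheme, under our own assumptions; the paper's exact definitions may differ. The pattern is split into p parts, and a search is a triple (pi, L, U):

- pi is the order in which the parts are matched.
- After the first s parts, the cumulative number of errors must lie between L[s] and U[s].

A scheme is complete for k errors if every distribution of at most k errors over the parts is accepted by some search. With a bidirectional index, the matched block can only grow to the left or right, so every prefix of pi should be a contiguous range of parts. All names are hypothetical, and the two-search scheme below is a toy example for one error.

```python
from itertools import product

def covers(search, errors):
    """Does search (pi, L, U) accept the error distribution `errors`?

    errors[j] = number of errors in part j (0-based).
    """
    pi, L, U = search
    total = 0
    for s, part in enumerate(pi):
        total += errors[part]
        if not (L[s] <= total <= U[s]):
            return False
    return True

def is_connected(pi):
    """Every prefix of pi must be a contiguous range of part indices."""
    return all(max(pi[:s]) - min(pi[:s]) == s - 1 for s in range(1, len(pi) + 1))

def scheme_is_complete(scheme, parts, k):
    """True if every distribution of at most k errors over `parts` parts
    is accepted by at least one search of the scheme."""
    for errors in product(range(k + 1), repeat=parts):
        if sum(errors) <= k and not any(covers(s, errors) for s in scheme):
            return False
    return True

# Toy scheme for k = 1 and two parts: match part 0 exactly then part 1,
# or match part 1 exactly then part 0; each allows 1 error overall.
scheme = [((0, 1), (0, 0), (0, 1)), ((1, 0), (0, 0), (0, 1))]
print(scheme_is_complete(scheme, 2, 1), scheme_is_complete(scheme[:1], 2, 1))  # True False
```

Dropping the second search leaves the distribution "one error in part 0" uncovered, which is why the single forward search is not complete.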