
Granite-speech: open-source speech-aware LLMs with strong English ASR capabilities

Authors: George Saon, Avihu Dekel, Alexander Brooks, Tohru Nagano, Abraham Daniels, Aharon Satt, Ashish Mittal, Brian Kingsbury, David Haws, Edmilson Morais, Gakuto Kurata, Hagai Aronowitz, Ibrahim Ibrahim, Jeff Kuo, Kate Soule, Luis Lastras, Masayuki Suzuki, Ron Hoory, Samuel Thomas, Sashi Novitasari, Takashi Fukuda, Vishal Sunder, Xiaodong Cui, Zvi Kons

Abstract: Granite-speech LLMs are compact and efficient speech language models specifically designed for English ASR and automatic speech translation (AST). The models were trained by modality-aligning the 2B and 8B parameter variants of granite-3.3-instruct to speech on publicly available open-source corpora containing audio inputs and text targets consisting of either human transcripts for ASR or automatically generated translations for AST. Comprehensive benchmarking shows that on English ASR, which was our primary focus, they outperform several competitors' models that were trained on orders of magnitude more proprietary data, and they keep pace on English-to-X AST for major European languages, Japanese, and Chinese. The speech-specific components are: a conformer acoustic encoder using block attention and self-conditioning, trained with connectionist temporal classification (CTC); a windowed query-transformer speech modality adapter that temporally downsamples the acoustic embeddings and maps them to the LLM text embedding space; and LoRA adapters that further fine-tune the text LLM. Granite-speech-3.3 operates in two modes: in speech mode, it performs ASR and AST by activating the encoder, projector, and LoRA adapters; in text mode, it calls the underlying granite-3.3-instruct model directly (without LoRA), essentially preserving all the text LLM capabilities and safety.
Both models are freely available on HuggingFace (https://huggingface.co/ibm-granite/granite-speech-3.3-2b and https://huggingface.co/ibm-granite/granite-speech-3.3-8b) and can be used for both research and commercial purposes under a permissive Apache 2.0 license.
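The acoustic encoder above is trained with connectionist temporal classification. As a reminder of how CTC frame outputs map to a label sequence, here is a minimal sketch of CTC best-path (greedy) decoding. It assumes per-frame argmax label ids are already available and that id 0 is the blank; `ctc_greedy_decode` is a hypothetical helper, not part of Granite-speech, whose final text is produced by the LLM rather than by this collapse.

```python
from typing import List, Sequence

BLANK = 0  # assumption: label id 0 is the CTC blank symbol


def ctc_greedy_decode(frame_label_ids: Sequence[int], blank: int = BLANK) -> List[int]:
    """Collapse a frame-level best path into a label sequence (CTC convention).

    Consecutive repeats are merged and blanks are dropped; a blank between two
    identical labels therefore keeps them as two separate output tokens.
    """
    output: List[int] = []
    previous = None
    for label in frame_label_ids:
        if label != previous and label != blank:
            output.append(label)
        previous = label
    return output
```

For example, the frame path `[0, 3, 3, 0, 3, 5, 5, 0]` collapses to `[3, 3, 5]`: the repeated 3s merge, but the blank between them and the next 3 keeps a second 3.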

Submitted 13 May, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

Comments: 7 pages, 9 figures

arXiv:2410.16048

Continuous Speech Synthesis using per-token Latent Diffusion

Authors: Arnon Turetzky, Nimrod Shabtay, Slava Shechtman, Hagai Aronowitz, David Haws, Ron Hoory, Avihu Dekel

Abstract: The success of autoregressive transformer models with discrete tokens has inspired quantization-based approaches for continuous modalities, though these often limit reconstruction quality. We therefore introduce SALAD, a per-token latent diffusion model for zero-shot text-to-speech that operates on continuous representations. SALAD builds upon the recently proposed expressive diffusion head for image generation and extends it to generate variable-length outputs. Our approach uses semantic tokens to provide contextual information and to determine the stopping condition. We propose three continuous variants of our method, extending popular discrete speech synthesis techniques. Additionally, we implement discrete baselines for each variant and conduct a comparative analysis of discrete versus continuous speech modeling techniques. Our results demonstrate that both continuous and discrete approaches are highly competent, and that SALAD achieves a superior intelligibility score while obtaining speech quality and speaker similarity on par with the ground-truth audio.

Submitted 21 October, 2024; originally announced October 2024.

Comments: Preprint, Under review

arXiv:2309.11210

Speak While You Think: Streaming Speech Synthesis During Text Generation

Authors: Avihu Dekel, Slava Shechtman, Raul Fernandez, David Haws, Zvi Kons, Ron Hoory

Abstract: Large Language Models (LLMs) demonstrate impressive capabilities, yet interaction with these models is mostly facilitated through text. Using Text-To-Speech to synthesize LLM outputs typically results in notable latency, which is impractical for fluent voice conversations. We propose LLM2Speech, an architecture that synthesizes speech while text is being generated by an LLM, yielding a significant latency reduction. LLM2Speech mimics the predictions of a non-streaming teacher model while limiting the exposure to future context in order to enable streaming. It exploits the hidden embeddings of the LLM, a by-product of text generation that contains informative semantic context. Experimental results show that LLM2Speech maintains the teacher's quality while reducing the latency to enable natural conversations.

Submitted 20 September, 2023; originally announced September 2023.

Comments: Under review for ICASSP 2024

arXiv:2208.01818

VQ-T: RNN Transducers using Vector-Quantized Prediction Network States

Authors: Jiatong Shi, George Saon, David Haws, Shinji Watanabe, Brian Kingsbury

Abstract: Beam search, which is the dominant ASR decoding algorithm for end-to-end models, generates tree-structured hypotheses. However, recent studies have shown that decoding with hypothesis merging can achieve a more efficient search with comparable or better performance. Yet the full context of recurrent networks is incompatible with hypothesis merging. We propose to use vector-quantized long short-term memory units (VQ-LSTM) in the prediction network of RNN transducers. By training the discrete representation jointly with the ASR network, hypotheses can be actively merged for lattice generation. Our experiments on the Switchboard corpus show that the proposed VQ RNN transducers improve ASR performance over transducers with regular prediction networks while also producing denser lattices with a very low oracle word error rate (WER) for the same beam size. Additional language model rescoring experiments also demonstrate the effectiveness of the proposed lattice generation scheme.
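To make the merging idea concrete, the following is a minimal sketch, assuming a fixed codebook and a rule that two hypotheses are merged whenever their prediction-network states quantize to the same code vector. Their scores are combined with log-sum-exp and the best label sequence is kept as the representative. The helper names, the (labels, state, log_prob) tuple layout, and the "keep the best history" rule are illustrative assumptions; in the paper the merged histories feed lattice generation, and in a real RNN-T beam search merging would only happen among hypotheses at the same decoding step.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Hypothesis = Tuple[Tuple[int, ...], Vector, float]  # (labels, state, log_prob)


def nearest_code(state: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codebook vector closest to `state` in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((s - c) ** 2 for s, c in zip(state, codebook[k])))


def merge_hypotheses(hyps: List[Hypothesis],
                     codebook: Sequence[Vector]) -> Dict[int, Tuple[Tuple[int, ...], float]]:
    """Merge hypotheses whose quantized prediction-network states coincide.

    Returns {code index: (best label sequence, merged log-probability)}.
    """
    merged: Dict[int, Tuple[Tuple[int, ...], float, float]] = {}
    for labels, state, log_prob in hyps:
        code = nearest_code(state, codebook)
        if code not in merged:
            merged[code] = (labels, log_prob, log_prob)
            continue
        best_labels, best_lp, total = merged[code]
        # Numerically stable log(exp(total) + exp(log_prob)).
        top = max(total, log_prob)
        total = top + math.log(math.exp(total - top) + math.exp(log_prob - top))
        if log_prob > best_lp:
            best_labels, best_lp = labels, log_prob
        merged[code] = (best_labels, best_lp, total)
    return {code: (labels, total) for code, (labels, _, total) in merged.items()}
```

With codebook `[[0, 0], [1, 1]]`, two hypotheses whose states lie near the origin with probabilities 0.5 and 0.25 collapse into one entry of probability 0.75, while a hypothesis near `[1, 1]` stays separate.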

Submitted 2 August, 2022; originally announced August 2022.

Comments: Interspeech 2022 accepted paper

arXiv:2207.12262

Transplantation of Conversational Speaking Style with Interjections in Sequence-to-Sequence Speech Synthesis

Authors: Raul Fernandez, David Haws, Guy Lorberbom, Slava Shechtman, Alexander Sorin

Abstract: Sequence-to-Sequence Text-to-Speech architectures that directly generate low-level acoustic features from phonetic sequences are known to produce natural and expressive speech when provided with adequate amounts of training data. Such systems can learn and transfer desired speaking styles from one seen speaker to another (in multi-style multi-speaker settings), which is highly desirable for creating scalable and customizable Human-Computer Interaction systems. In this work we explore one-to-many style transfer from a dedicated single-speaker conversational corpus with style nuances and interjections. We elaborate on the corpus design and explore the feasibility of such style transfer when assisted with Voice-Conversion-based data augmentation. In a set of subjective listening experiments, this approach resulted in high-fidelity style transfer with no quality degradation. However, a certain voice persona shift was observed, requiring further improvements in voice conversion.

Submitted 25 July, 2022; originally announced July 2022.

Comments: Accepted for presentation at Interspeech 2022

arXiv:2108.10803

Reducing Exposure Bias in Training Recurrent Neural Network Transducers

Authors: Xiaodong Cui, Brian Kingsbury, George Saon, David Haws, Zoltan Tuske

Abstract: When recurrent neural network transducers (RNNTs) are trained using the typical maximum likelihood criterion, the prediction network is trained only on ground-truth label sequences. This leads to a mismatch during inference, known as exposure bias, when the model must deal with label sequences containing errors. In this paper we investigate approaches to reducing exposure bias in training to improve the generalization of RNNT models for automatic speech recognition (ASR). A label-preserving input perturbation to the prediction network is introduced. The input token sequences are perturbed using SwitchOut and scheduled sampling based on an additional token language model. Experiments conducted on the 300-hour Switchboard dataset demonstrate their effectiveness. By reducing the exposure bias, we show that we can further improve the accuracy of a high-performance RNNT ASR model and obtain state-of-the-art results on the 300-hour Switchboard dataset.
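"Label-preserving" here means that only the tokens fed to the prediction network are corrupted, while the training targets stay the ground truth. The sketch below illustrates that split with a simplified, independent per-token replacement: each input token is swapped, with probability `rate`, for a different token drawn uniformly from the vocabulary. This is an assumption for illustration only; SwitchOut proper and the LM-based scheduled sampling used in the paper choose replacements differently, and `perturb_prediction_inputs` is a hypothetical name.

```python
import random
from typing import List, Sequence, Tuple


def perturb_prediction_inputs(labels: Sequence[int], vocab_size: int, rate: float,
                              rng: random.Random) -> Tuple[List[int], List[int]]:
    """Return (prediction-network inputs, training targets).

    Assumes 0 <= token < vocab_size and vocab_size >= 2. Targets are a copy of
    the clean labels, which is what makes the perturbation label-preserving.
    """
    inputs: List[int] = []
    for token in labels:
        if rng.random() < rate:
            # Draw from the vocab_size - 1 other tokens, skipping `token` itself.
            replacement = rng.randrange(vocab_size - 1)
            inputs.append(replacement if replacement < token else replacement + 1)
        else:
            inputs.append(token)
    return inputs, list(labels)
```

Passing a seeded `random.Random` keeps the corruption reproducible across training runs; with `rate=0.0` the inputs equal the targets, which recovers standard maximum likelihood training.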

Submitted 24 August, 2021; originally announced August 2021.

Comments: accepted to Interspeech 2021

arXiv:2101.09940

doi 10.1109/SLT48900.2021.9383591

Supervised and Unsupervised Approaches for Controlling Narrow Lexical Focus in Sequence-to-Sequence Speech Synthesis

Authors: Slava Shechtman, Raul Fernandez, David Haws

Abstract: Although Sequence-to-Sequence (S2S) architectures have become state-of-the-art in speech synthesis, capable of generating outputs that approach the perceptual quality of natural samples, they are limited by a lack of flexibility when it comes to controlling the output. In this work we present a framework capable of controlling the prosodic output via a set of concise, interpretable, disentangled parameters. We apply this framework to the realization of emphatic lexical focus, proposing a variety of architectures designed to exploit different levels of supervision based on the availability of labeled resources. We evaluate these approaches via listening tests that demonstrate we are able to successfully realize controllable focus while maintaining the same or higher naturalness than an established baseline, and we explore how the different approaches compare when synthesizing in a target voice with or without labeled data.

Submitted 25 January, 2021; originally announced January 2021.

Comments: IEEE Spoken Language Technology Workshop (SLT), 2021

arXiv:1503.00829

doi 10.1007/s10107-016-1087-2

Polyhedral aspects of score equivalence in Bayesian network structure learning

Authors: James Cussens, David Haws, Milan Studeny

Abstract: This paper deals with faces and facets of the family-variable polytope and the characteristic-imset polytope, which are special polytopes used in integer linear programming approaches to statistically learn Bayesian network structure. A common form of linear objectives to be maximized in this area leads to the concept of score equivalence (SE), both for linear objectives and for faces of the family-variable polytope. We characterize the linear space of SE objectives and establish a one-to-one correspondence between SE faces of the family-variable polytope, the faces of the characteristic-imset polytope, and standardized supermodular functions. The characterization of SE facets in terms of extremality of the corresponding supermodular function gives an elegant method to verify whether an inequality is SE-facet-defining for the family-variable polytope. We also show that when maximizing an SE objective one can eliminate linear constraints of the family-variable polytope that correspond to non-SE facets. However, a counter-example shows that considering only SE facets is not enough; one has to consider the linear inequality constraints that correspond to facets of the characteristic-imset polytope despite the fact that they may not define facets in the family-variable mode.

Submitted 10 April, 2015; v1 submitted 3 March, 2015; originally announced March 2015.

Comments: 37 pages

MSC Class: 90-02 ACM Class: G.1.6

Journal ref: Mathematical Programming A 164 (2017) n. 1-2, 285-324

arXiv:1310.1659

MINT: Mutual Information based Transductive Feature Selection for Genetic Trait Prediction

Authors: Dan He, Irina Rish, David Haws, Simon Teyssedre, Zivan Karaman, Laxmi Parida

Abstract: Whole genome prediction of complex phenotypic traits using high-density genotyping arrays has attracted a great deal of attention, as it is relevant to the fields of plant and animal breeding and genetic epidemiology. As the number of genotypes is generally much larger than the number of samples, predictive models suffer from the curse of dimensionality. The curse-of-dimensionality problem not only affects the computational efficiency of a particular genomic selection method, but can also lead to poor performance, mainly due to correlation among markers. In this work we propose the first transductive feature selection method based on the MRMR (Max-Relevance and Min-Redundancy) criterion, which we call MINT. We applied MINT to genetic trait prediction problems and showed that, in general, MINT is a better feature selection method than the state-of-the-art inductive method mRMR.
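For reference, the MRMR criterion that MINT builds on can be sketched as a greedy loop: pick the feature with the highest mutual information with the trait, then repeatedly pick the feature maximizing relevance minus its mean mutual information with the features already chosen (the "difference" form). This is a minimal sketch of the inductive mRMR baseline only, assuming discrete marker values and plug-in MI estimates from counts; it does not reproduce MINT's transductive use of the test genotypes, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def mutual_information(x: Sequence, y: Sequence) -> float:
    """Plug-in estimate of I(X; Y) in nats for two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def mrmr_select(features: List[Sequence], target: Sequence, k: int) -> List[int]:
    """Greedy mRMR: maximize relevance minus mean redundancy with selected features."""
    relevance = [mutual_information(f, target) for f in features]
    selected: List[int] = []
    remaining = set(range(len(features)))
    while remaining and len(selected) < k:
        def score(j: int) -> float:
            if not selected:
                return relevance[j]
            redundancy = sum(mutual_information(features[j], features[s]) for s in selected)
            return relevance[j] - redundancy / len(selected)
        best = max(sorted(remaining), key=score)  # ties go to the lowest index
        selected.append(best)
        remaining.remove(best)
    return selected
```

The redundancy term is what keeps a duplicated marker from being picked twice: once a marker is selected, an exact copy of it scores roughly zero, so a less relevant but complementary marker wins instead.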

Submitted 6 October, 2013; originally announced October 2013.

arXiv:1310.1649

QuickLexSort: An efficient algorithm for lexicographically sorting nested restrictions of a database

Authors: David Haws

Abstract: Lexicographical sorting is a fundamental problem with applications to contingency tables, databases, Bayesian networks, and more. A standard method to lexicographically sort general data is to iteratively use a stable sort, i.e. a sort which preserves existing orders. Here we present a new method of lexicographical sorting called QuickLexSort. Whereas a stable-sort-based lexicographical sorting algorithm operates from the least important to the most important features, QuickLexSort sorts from the most important to the least important features, refining the sort as it goes. QuickLexSort first requires a modest one-time pre-processing step in which each feature of the data set is sorted independently. When lexicographically sorting a database, QuickLexSort (including pre-processing) has running time comparable to a stable-sort-based approach. For a database with $m$ rows and $n$ columns, and a sorting algorithm running in time $O(m\log m)$, a stable-sort-based lexicographical sort and QuickLexSort both take time $O(nm\log m)$. However, in many applications one needs to lexicographically sort nested data, e.g. all possible sub-matrices up to a certain cardinality of columns. In such cases we show that QuickLexSort improves on a standard stable-sort-based approach by a factor logarithmic in the database length (the number of rows in the matrix). For example, to sort all sub-matrices up to cardinality $k$, QuickLexSort has running time $O(mn^k)$, whereas a stable-sort-based lexicographical sort takes time $O(mn^k\log m)$.
After the pre-processing step, which is run only once for the entire matrix, QuickLexSort has a running time linear in the number of nested sub-matrices to sort. We conclude with an application to Bayesian network scoring to detect epistasis using SNP marker data.

Submitted 6 October, 2013; originally announced October 2013.

Comments: 17 pages, 1 figure

MSC Class: 68Q25; 68P10; 62H17 ACM Class: F.2.2; G.3

arXiv:1204.3070

Markov degree of the three-state toric homogeneous Markov chain model

Authors: David Haws, Abraham Martín del Campo, Akimichi Takemura, Ruriko Yoshida

Abstract: We consider the three-state toric homogeneous Markov chain model (THMC) without loops and initial parameters. At time $T$, the size of the design matrix is $6 \times 3\cdot 2^{T-1}$ and the convex hull of its columns is the model polytope. We study the behavior of this polytope for $T\geq 3$ and show that it is defined by 24 facets for all $T\ge 5$. Moreover, we give a complete description of these facets. From this, we deduce that the toric ideal associated with the design matrix is generated by binomials of degree at most 6. Our proof is based on a result due to Sturmfels, who gave a bound on the degree of the generators of a toric ideal, provided the normality of the corresponding toric variety. In our setting, we establish the normality of the toric variety associated to the THMC model by studying the geometric properties of the model polytope.
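The dimensions $6 \times 3\cdot 2^{T-1}$ can be checked directly under one reading of the model, which we assume here: columns are indexed by state paths of length $T$ with no self-loops (3 choices for the first state, then 2 for each later one), rows by the 6 transitions $i \to j$ with $i \ne j$, and each entry counts how often that transition occurs along the path. `design_matrix` is a hypothetical helper for illustration.

```python
from itertools import product
from typing import List, Tuple

STATES = (0, 1, 2)
# Rows of the design matrix: the six transitions i -> j with i != j (no loops).
TRANSITIONS: List[Tuple[int, int]] = [(i, j) for i in STATES for j in STATES if i != j]


def design_matrix(T: int) -> List[List[int]]:
    """6 x 3*2^(T-1) matrix: one column of transition counts per loop-free path."""
    paths = [p for p in product(STATES, repeat=T)
             if all(a != b for a, b in zip(p, p[1:]))]
    columns = [[sum(1 for step in zip(p, p[1:]) if step == t) for t in TRANSITIONS]
               for p in paths]
    return [list(row) for row in zip(*columns)]  # transpose to a list of rows
```

Every column sums to $T-1$, the number of transitions in a path of $T$ states; distinct paths can share a column, which is why the model polytope is described via the convex hull of the columns rather than by the paths themselves.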

Submitted 17 September, 2013; v1 submitted 13 April, 2012; originally announced April 2012.

Comments: 26 pages, 1 figure

arXiv:1111.6518

Semigroups and sequential importance sampling for multiway tables

Authors: Ruriko Yoshida, Jing Xi, Shaoceng Wei, Feng Zhou, David Haws

Abstract: When an interval of integers between the lower bound $l_i$ and the upper bound $u_i$ is the support of the marginal distribution $n_i|(n_{i-1}, \ldots, n_1)$, Chen et al. 2005 noticed that sampling from the interval at each step, for $n_i$ during a sequential importance sampling (SIS) procedure, always produces a table which satisfies the marginal constraints. However, in general, the interval may not be equal to the support of the marginal distribution. In this case, the SIS procedure may produce tables which do not satisfy the marginal constraints, leading to rejection [Chen et al. 2006]. In this paper we consider the uniform distribution as the target distribution. First we show that if we fix the number of rows and columns of the design matrix of the model for contingency tables, then there exists a polynomial-time algorithm in terms of the input size to sample a table from the set of all tables satisfying all marginals defined by the given model via the SIS procedure without rejection. We then show experimentally that in general the SIS procedure may have large rejection rates even with small tables. Further, we show that in general the classical SIS procedure in Chen et al. 2005 can have a large rejection rate whose limit is one. When estimating the number of tables in our simulation study, we used the univariate and bivariate logistic regression models, since under these models the SIS procedure seems to have higher rejection rates even with small tables.

Submitted 18 January, 2018; v1 submitted 28 November, 2011; originally announced November 2011.

Comments: There are some theoretical mistakes. Thus, we would like to withdraw the paper.

arXiv:1109.4453

Volumes and Tangent Cones of Matroid Polytopes

Authors: David C. Haws

Abstract: De Loera et al. 2009 showed that when the rank is fixed, the Ehrhart polynomial of a matroid polytope can be computed in polynomial time when the number of elements varies. A key to proving this is the fact that the number of simplicial cones in any triangulation of a tangent cone is bounded polynomially in the number of elements when the rank is fixed. The authors speculated whether or not the Ehrhart polynomial could be computed in polynomial time in terms of the number of bases, where the number of elements and the rank are allowed to vary. We show here that for the uniform matroid of rank $r$ on $n$ elements, the number of simplicial cones in any triangulation of a tangent cone is $\binom{n-2}{r-1}$. Therefore, if the rank is allowed to vary, the number of simplicial cones grows exponentially in $n$. Thus, it is unlikely that a Brion-Lawrence type of approach, such as Barvinok's Algorithm, can compute the Ehrhart polynomial efficiently when the rank varies with the number of elements. To prove this result, we provide a triangulation in which the maximal simplices are in bijection with the spanning thrackles of the complete bipartite graph $K_{r,n-r}$.
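The contrast between fixed and varying rank can be seen by evaluating the count $\binom{n-2}{r-1}$ directly. The small sketch below (the helper name is ours) compares a fixed rank $r = 3$, where the count is the quadratic $(n-2)(n-3)/2$, with the rank $r = \lfloor n/2 \rfloor$ growing alongside $n$, where the central binomial coefficient grows exponentially.

```python
from math import comb


def simplicial_cones(n: int, r: int) -> int:
    """Simplicial cones in a triangulation of a tangent cone of the uniform matroid U(r, n)."""
    return comb(n - 2, r - 1)


for n in (10, 20, 40):
    fixed_rank = simplicial_cones(n, 3)       # polynomial in n
    half_rank = simplicial_cones(n, n // 2)   # exponential in n
    print(n, fixed_rank, half_rank)
```

For $n = 10$ the two counts are 28 and 70; by $n = 20$ they are 153 and 48620, and the gap keeps widening.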

Submitted 20 September, 2011; originally announced September 2011.

Comments: 10 pages, 5 figures

MSC Class: 05; 52B

arXiv:1108.5939

Estimating the number of zero-one multi-way tables via sequential importance sampling

Authors: Jing Xi, Ruriko Yoshida, David Haws

Abstract: In 2005, Chen et al. introduced a sequential importance sampling (SIS) procedure to analyze zero-one two-way tables with given fixed marginal sums (row and column sums) via the conditional Poisson (CP) distribution. They showed that, compared with Markov chain Monte Carlo (MCMC)-based approaches, their importance sampling method is more efficient in terms of running time and also provides an easy and accurate estimate of the total number of contingency tables with fixed marginal sums. In this paper we extend their result to zero-one multi-way ($d$-way, $d \geq 2$) contingency tables under the no $d$-way interaction model, i.e., with fixed $d - 1$ marginal sums. We also show by simulations that the SIS procedure with the CP distribution works very well for estimating the number of zero-one three-way tables with given marginal sums under the no three-way interaction model, even with some rejections. We also apply our method to Samson's monks' data set. We end with further questions on the SIS procedure for zero-one multi-way tables.
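The counting estimator behind SIS can be sketched for the two-way case: fill the table column by column, sample each column's ones from a proposal, and average $1/q(\text{table})$ over samples, assigning weight 0 to rejected samples. The sketch below swaps in a deliberately simple proposal (each column chooses its ones uniformly among rows that still need ones) instead of the conditional Poisson distribution used by Chen et al.; it remains unbiased because every valid table is reachable along exactly one sampling path, but its variance will generally be worse. `sis_count_01_tables` is a hypothetical name.

```python
import random
from math import comb
from typing import Sequence


def sis_count_01_tables(row_sums: Sequence[int], col_sums: Sequence[int],
                        n_samples: int, rng: random.Random) -> float:
    """Estimate the number of 0-1 two-way tables with the given margins by SIS.

    Proposal (an assumption, not the CP distribution): column j places its
    col_sums[j] ones uniformly among the rows whose row sum is not yet met.
    """
    total = 0.0
    for _ in range(n_samples):
        remaining = list(row_sums)
        weight = 1.0  # running value of 1 / q(table)
        for c in col_sums:
            open_rows = [i for i, r in enumerate(remaining) if r > 0]
            if c > len(open_rows):
                weight = 0.0  # rejection: this column cannot be filled
                break
            weight *= comb(len(open_rows), c)
            for i in rng.sample(open_rows, c):
                remaining[i] -= 1
        if any(remaining):
            weight = 0.0  # rejection: some row margin was not met
        total += weight
    return total / n_samples
```

With all margins equal to 1 on a $3 \times 3$ table, every sample has weight $3 \cdot 2 \cdot 1 = 6$, matching the 6 permutation matrices exactly; for margins such as rows $(2, 1)$ and columns $(1, 1, 1)$ the weights vary between samples and only their mean approaches the true count of 3.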

Submitted 28 November, 2011; v1 submitted 30 August, 2011; originally announced August 2011.

Comments: 16 pages, 1 figure

arXiv:1108.2311

Semigroups and sequential importance sampling for multiway tables and beyond

Authors: Jing Xi, Shaoceng Wei, Feng Zhou, Ruriko Yoshida, David Haws

Abstract: When an interval of integers between the lower bound $l_i$ and the upper bound $u_i$ is the support of the marginal distribution $n_i|(n_{i-1}, \ldots, n_1)$, Chen et al. 2005 noticed that sampling from the interval at each step, for $n_i$ during the sequential importance sampling (SIS) procedure, always produces a table which satisfies the marginal constraints. However, in general, the interval may not be equal to the support of the marginal distribution. In this case, the SIS procedure may produce tables which do not satisfy the marginal constraints, leading to rejection [Chen et al. 2006]. Rejecting tables is computationally expensive, and incorrect proposal distributions result in biased estimators for the number of tables given its marginal sums. This paper has three focuses: (1) we propose a correction coefficient which corrects an interval of integers between the lower bound $l_i$ and the upper bound $u_i$ to the support of the marginal distribution asymptotically, even with rejections and with the same time complexity as the original SIS procedure; (2) using univariate and bivariate logistic regression models, we present extensive experiments on simulated data sets for estimating the number of tables; and (3) we apply the volume test proposed by Diaconis and Efron 1985 to $2\times 2\times 6$ randomly generated tables to compare the performance of SIS versus MCMC.
When estimating the number of tables in our simulation study, we used univariate and bivariate logistic regression models, since under these models the SIS procedure seems to have higher rejection rates even with small tables. We also apply our correction coefficients to data sets on coronary heart disease and occurrence of esophageal cancer.

Submitted 15 November, 2011; v1 submitted 10 August, 2011; originally announced August 2011.

Comments: Jing Xi, Shaoceng Wei and Feng Zhou are joint first authors. Withdrawn for theoretical revisions

MSC Class: 62H17

arXiv:1108.0481

Degree Bounds for a Minimal Markov Basis for the Three-State Toric Homogeneous Markov Chain Model

Authors: David Haws, Abraham Martin Del Campo, Ruriko Yoshida

Abstract: We study the three-state toric homogeneous Markov chain model and three special cases of it, namely: (i) when the initial state parameters are constant, (ii) without self-loops, and (iii) when both cases are satisfied at the same time. Using as a key tool a directed multigraph associated to the model, the state-graph, we give a bound on the number of vertices of the polytope associated to the model which does not depend on the time. Based on our computations, we also conjecture the stabilization of the f-vector of the polytope, analyze the normality of the semigroup, and give conjectural bounds on the degree of the Markov bases.

Submitted 3 August, 2011; v1 submitted 2 August, 2011; originally announced August 2011.

MSC Class: 05Cxx; 60J10; 52B20

arXiv:1107.4708

On polyhedral approximations of polytopes for learning Bayes nets

Authors: Milan Studeny, David Haws

Abstract: We review three vector encodings of Bayesian network structures. The first one has recently been applied by Jaakkola et al. [Jaakkola 2010]; the other two use special integral vectors introduced earlier, called imsets [Studeny 2005, Studeny 2010]. The central topic is the comparison of outer polyhedral approximations of the corresponding polytopes. We show how to transform the inequalities suggested by Jaakkola et al. to the framework of imsets. The result of our comparison is the observation that the implicit polyhedral approximation of the standard imset polytope suggested in [Studeny 2011] gives a closer approximation than the (transformed) explicit polyhedral approximation from [Jaakkola 2010]. Finally, we confirm a conjecture from [Studeny 2011] that the above-mentioned implicit polyhedral approximation of the standard imset polytope is an LP relaxation of the polytope.

Submitted 3 August, 2011; v1 submitted 23 July, 2011; originally announced July 2011.

MSC Class: 62H17

ACM Class: G.3

Journal ref: 2013, Journal of Algebraic Statistics, 4:1, 59-92
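As a reminder of what one of the reviewed encodings looks like: Studený's standard imset of a DAG G over a variable set N is u_G = δ_N - δ_∅ + Σ_{i ∈ N} (δ_{pa(i)} - δ_{{i} ∪ pa(i)}), where δ_A is the indicator vector of the subset A. The following is a minimal sketch of that formula only; the parent-dictionary input format and the name `standard_imset` are our own choices, and the polyhedral approximations compared in the paper are not implemented.

```python
from collections import defaultdict


def standard_imset(parents):
    """Standard imset of a DAG given as {node: set_of_parents}.

    Returns {frozenset(A): coefficient of delta_A}, with zero coefficients dropped.
    """
    nodes = frozenset(parents)
    u = defaultdict(int)
    u[nodes] += 1          # + delta_N
    u[frozenset()] -= 1    # - delta_emptyset
    for node, pa in parents.items():
        pa = frozenset(pa)
        u[pa] += 1             # + delta_{pa(i)}
        u[pa | {node}] -= 1    # - delta_{{i} union pa(i)}
    return {A: c for A, c in u.items() if c != 0}
```

Markov-equivalent DAGs such as a → b → c and a ← b ← c yield the same imset, the collider a → b ← c yields a different one, and a complete DAG yields the zero imset.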

arXiv:1004.2101

Statistical Phylogenetic Tree Analysis Using Differences of Means

Authors: Elissaveta Arnaoudova, David Haws, Peter Huggins, Jerzy W. Jaromczyk, Neil Moore, Chris Schardl, Ruriko Yoshida

Abstract: We propose a statistical method to test whether two phylogenetic trees with given alignments are significantly incongruent. Our method compares the two distributions of phylogenetic trees given by the input alignments, instead of comparing point estimates of trees. This statistical approach can be applied to gene tree analysis, for example to detect unusual events in genome evolution such as horizontal gene transfer and reshuffling. Our method uses the difference of means to compare two distributions of trees, after embedding the trees in a vector space. Bootstrapping alignment columns can then be applied to obtain p-values. To compute distances between means, we employ a "kernel trick", which speeds up distance calculations when trees are embedded in a high-dimensional feature space, e.g., the splits or quartets feature space. In this pilot study, we first test our statistical method's ability to distinguish between sets of gene trees generated under coalescent models with species trees of varying dissimilarity. We then apply the method to various data sets: gophers and lice, grasses and their endophytes, and different fungal genes from the same genome. A companion toolkit, Phylotree, is provided to facilitate computational experiments.

Submitted 12 April, 2010; originally announced April 2010.

Comments: 17 pages, 6 figures
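The kernel trick in the abstract above rests on expanding the squared distance between two sample means, ||μ_X - μ_Y||² = mean k(x, x') + mean k(y, y') - 2 mean k(x, y), so only kernel values between trees are needed. A minimal sketch, assuming trees are given as sets of nontrivial splits and the kernel is the number of shared splits (the dot product of split-indicator vectors); the helper names and the four-taxon encoding are hypothetical, and the bootstrap over alignment columns is not shown.

```python
from itertools import product


def split_kernel(t1, t2):
    """Number of shared splits, i.e. the dot product of 0/1 split-indicator vectors."""
    return len(t1 & t2)


def mean_kernel(A, B):
    """Average kernel value over all pairs (a, b) with a in A and b in B."""
    return sum(split_kernel(a, b) for a, b in product(A, B)) / (len(A) * len(B))


def squared_mean_distance(X, Y):
    """||mean phi(X) - mean phi(Y)||^2 computed from kernel evaluations only.

    Trivial splits, shared by every tree, add the same constant to each
    mean_kernel term and cancel, so they may be omitted.
    """
    return mean_kernel(X, X) + mean_kernel(Y, Y) - 2 * mean_kernel(X, Y)


# Four-taxon trees, each the frozenset of its single nontrivial split,
# written as the side containing taxon "a" (hypothetical encoding).
AB_CD = frozenset({frozenset({"a", "b"})})
AC_BD = frozenset({frozenset({"a", "c"})})
```

Two samples that both contain only ab|cd versus two that contain only ac|bd are at squared distance 2, while identical mixtures are at distance 0.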

arXiv:1004.2073

Optimality of the Neighbor Joining Algorithm and Faces of the Balanced Minimum Evolution Polytope

Authors: David C. Haws, Terrell Hodge, Ruriko Yoshida

Abstract: Balanced minimum evolution (BME) is a statistically consistent distance-based method for reconstructing a phylogenetic tree from an alignment of molecular data. In 2000, Pauplin showed that the BME method is equivalent to optimizing a linear functional over the BME polytope, the convex hull of the BME vectors obtained by applying Pauplin's formula to all binary trees. The BME method is related to the Neighbor Joining (NJ) algorithm, which is now known to be a greedy optimization of the BME principle. The NJ and BME algorithms have also been studied previously to understand when the NJ Algorithm returns a BME tree for small numbers of taxa. In this paper we aim to elucidate the structure of the BME polytope and to strengthen knowledge of the connection between the BME method and the NJ Algorithm. We first prove that any subtree-prune-regraft move from a binary tree to another binary tree corresponds to an edge of the BME polytope. Moreover, we describe an entire family of faces parametrized by disjoint clades. We show that these clade-faces are themselves lower-dimensional BME polytopes. Finally, we show that for any order of joining nodes to form a tree, there exists an associated distance matrix (i.e., dissimilarity map) for which the NJ Algorithm returns the BME tree. More strongly, we show that the BME cone and every NJ cone associated with a tree $T$ have an intersection of positive measure.

Submitted 3 February, 2011; v1 submitted 12 April, 2010; originally announced April 2010.

Comments: 24 pages, 4 figures

MSC Class: 52B11; 92D15
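Pauplin's formula writes the BME length of a binary tree T as L(T) = Σ_{i<j} 2^{1-τ_ij} d_ij, where τ_ij is the number of edges on the path between leaves i and j in T; up to scaling, these coefficients are the coordinates of the BME vector of T mentioned in the abstract above. The sketch below computes them for a tree given as an adjacency dictionary, which is a hypothetical input format. Some authors rescale the coefficients, for example by a power of 2, to obtain integer vectors; we do not assume a particular scaling.

```python
from collections import deque


def topological_distances(adj, source):
    """Number of edges from source to every node (BFS on an unrooted tree)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def pauplin_weights(adj):
    """Coefficient 2**(1 - tau_ij) for every leaf pair i < j."""
    leaves = sorted(v for v, nbrs in adj.items() if len(nbrs) == 1)
    weights = {}
    for i in leaves:
        dist = topological_distances(adj, i)
        for j in leaves:
            if i < j:
                weights[(i, j)] = 2.0 ** (1 - dist[j])
    return weights


def bme_length(adj, d):
    """BME tree length: sum over leaf pairs of 2**(1 - tau_ij) * d[i][j]."""
    return sum(w * d[i][j] for (i, j), w in pauplin_weights(adj).items())


# Quartet ((A,B),(C,D)) with internal nodes u and v (hypothetical labels).
QUARTET = {
    "A": ["u"], "B": ["u"], "C": ["v"], "D": ["v"],
    "u": ["A", "B", "v"], "v": ["u", "C", "D"],
}
# Path-length distances for unit edge lengths (only i < j entries are needed).
UNIT_EDGE_DISTANCES = {"A": {"B": 2, "C": 3, "D": 3}, "B": {"C": 3, "D": 3}, "C": {"D": 2}}
```

On this quartet with unit edge lengths the formula returns 5, the total edge length, as expected for an additive metric.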

arXiv:0911.0645

Bayes estimators for phylogenetic reconstruction

Authors: Peter Huggins, Wenbin Li, David Haws, Thomas Friedrich, Jinze Liu, Ruriko Yoshida

Abstract: Tree reconstruction methods are often judged by their accuracy, measured by how close they get to the true tree. Yet most reconstruction methods, such as maximum likelihood (ML), do not explicitly maximize this accuracy. To address this problem, we propose a Bayesian solution. Given tree samples, we propose finding the tree estimate that is closest on average to the samples. This "median" tree is known as the Bayes estimator (BE). The BE literally maximizes posterior expected accuracy, measured in terms of closeness (distance) to the true tree. We discuss a unified framework of BE trees, focusing especially on tree distances that are expressible as squared Euclidean distances. Notable examples include the Robinson-Foulds distance, the quartet distance, and the squared path difference. Using simulated data, we show that Bayes estimators can be efficiently computed in practice by hill climbing. We also show that Bayes estimators achieve higher accuracy than maximum likelihood and neighbor joining.

Submitted 21 November, 2009; v1 submitted 3 November, 2009; originally announced November 2009.

Comments: 31 pages, 4 figures, and 3 tables
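Because the unnormalized Robinson-Foulds distance between two trees equals the squared Euclidean distance between their 0/1 split-indicator vectors, the Bayes estimator described above under this loss is the tree minimizing the average RF distance to the posterior samples. A minimal sketch, assuming trees are given as sets of nontrivial splits and that a supplied candidate list is scanned exhaustively in place of the hill-climbing search the abstract describes; all names and the four-taxon encoding are hypothetical.

```python
def rf_distance(t1, t2):
    """Unnormalized Robinson-Foulds distance: size of the symmetric difference of split sets.

    For 0/1 split-indicator vectors this equals their squared Euclidean distance.
    """
    return len(t1 ^ t2)


def expected_loss(candidate, samples):
    """Average RF distance from a candidate tree to the posterior samples."""
    return sum(rf_distance(candidate, s) for s in samples) / len(samples)


def bayes_estimator(candidates, samples):
    """Candidate with the smallest average RF distance to the samples.

    Assumption: an exhaustive scan over a supplied candidate list stands in
    for the hill-climbing search described in the abstract.
    """
    return min(candidates, key=lambda t: expected_loss(t, samples))


# The three unrooted four-taxon topologies, each the frozenset of its
# nontrivial split (hypothetical encoding: the side containing taxon "a").
AB_CD = frozenset({frozenset({"a", "b"})})
AC_BD = frozenset({frozenset({"a", "c"})})
AD_BC = frozenset({frozenset({"a", "d"})})
```

With samples [AB_CD, AB_CD, AC_BD], the scan returns AB_CD, whose average distance to the samples is 2/3.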

arXiv:0905.4405

Matroid Polytopes: Algorithms, Theory, and Applications

Authors: David C. Haws

Abstract: This dissertation presents new results on three different themes, all related to matroid polytopes. First, we investigate properties of Ehrhart polynomials of matroid polytopes, independence matroid polytopes, and polymatroids. We prove that for fixed rank their Ehrhart polynomials are computable in polynomial time. The proof relies on the geometry of these polytopes as well as a new refined analysis of the evaluation of Todd polynomials. Second, we discuss theoretical results regarding the algebraic combinatorics of matroid polytopes. We discuss two conjectures about the $h^*$-vector and the coefficients of Ehrhart polynomials of matroid polytopes and provide theoretical and computational evidence for their validity. We also explore a variant of White's conjecture: that every matroid polytope has a regular unimodular triangulation. We provide extensive computational evidence supporting this new conjecture and propose a combinatorial condition on simplices sufficient for unimodularity. Finally, motivated by recent work on algorithmic theory for non-linear and multicriteria matroid optimization, we have developed algorithms and heuristics aimed at practical solutions of large instances of these difficult problems. Our methods primarily use the local adjacency structure inherent in matroid polytopes to pivot to feasible solutions, which may or may not be optimal. We also present a modified breadth-first-search heuristic that uses adjacency to enumerate a subset of feasible solutions. We present other heuristics and provide computational evidence supporting these new techniques. We implemented all of our algorithms in the software package MOCHA (Matroids Optimization Combinatorial Heuristics and Algorithms).

Submitted 27 May, 2009; originally announced May 2009.

MSC Class: 52B40; 90C30; 05B35; 90C27

arXiv:0710.4346

DOI: 10.1007/s00454-008-9080-z

Ehrhart polynomials of matroid polytopes and polymatroids

Authors: Jesús A. De Loera, David C. Haws, Matthias Köppe

Abstract: We investigate properties of Ehrhart polynomials for matroid polytopes, independence matroid polytopes, and polymatroids. In the first half of the paper we prove that for fixed rank their Ehrhart polynomials are computable in polynomial time. The proof relies on the geometry of these polytopes as well as a new refined analysis of the evaluation of Todd polynomials. In the second half we discuss two conjectures about the $h^*$-vector and the coefficients of Ehrhart polynomials of matroid polytopes; we provide theoretical and computational evidence for their validity.

Submitted 23 October, 2007; originally announced October 2007.

Comments: 28 pages, 6 figures, submitted to Discrete and Computational Geometry

MSC Class: 05; 52B

Journal ref: Discrete Comput. Geom. 42 (2009), no. 4, 670-702
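The Ehrhart polynomial L_P(t) counts lattice points in the dilate tP, and for a tiny matroid polytope it can be obtained by brute force. The base polytope of the uniform matroid U_{k,n} is the hypersimplex Δ(k,n) = {x ∈ [0,1]^n : Σ x_i = k}, so tΔ(k,n) ∩ Z^n consists of the integer vectors with 0 ≤ x_i ≤ t and Σ x_i = kt. The sketch below counts these points for t = 0, 1, ..., n-1 (for 0 < k < n the polytope has dimension n-1) and interpolates. It is exponential in n and is not the polynomial-time algorithm of the paper above; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def hypersimplex_points(k, n, t):
    """Lattice points of t * Delta(k, n): integer x with 0 <= x_i <= t and sum(x) == k * t."""
    return sum(1 for x in product(range(t + 1), repeat=n) if sum(x) == k * t)


def interpolate(values):
    """Coefficients (constant term first) of the degree-d polynomial through (t, values[t]), t = 0..d."""
    d = len(values) - 1
    coeffs = [Fraction(0)] * (d + 1)
    for i, y in enumerate(values):
        basis = [Fraction(1)]  # Lagrange basis polynomial for node i, built factor by factor
        denom = 1
        for j in range(d + 1):
            if j == i:
                continue
            basis = [Fraction(0)] + basis  # multiply by t
            for m in range(len(basis) - 1):
                basis[m] -= j * basis[m + 1]  # subtract j times the previous polynomial
            denom *= i - j
        for m, b in enumerate(basis):
            coeffs[m] += y * b / denom
    return coeffs


def ehrhart_uniform_matroid(k, n):
    """Ehrhart polynomial of the U_{k,n} base polytope (dimension n - 1), for 0 < k < n."""
    return interpolate([hypersimplex_points(k, n, t) for t in range(n)])
```

For U_{2,4} the counts 1, 6, 19, 44 give 1 + (7/3)t + 2t² + (2/3)t³; as for any Ehrhart polynomial, the leading coefficient is the relative volume of the polytope.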

arXiv:math/0307350

Short Rational Functions for Toric Algebra and Applications

Authors: Jesus De Loera, David Haws, Raymond Hemmecke, Peter Huggins, Bernd Sturmfels, Ruriko Yoshida

Abstract: We encode the binomials belonging to the toric ideal $I_A$ associated with an integral $d \times n$ matrix $A$ using a short sum of rational functions as introduced by Barvinok [bar, newbar]. Under the assumption that $d, n$ are fixed, this representation allows us to compute the Graver basis and the reduced Gröbner basis of the ideal $I_A$, with respect to any term order, in time polynomial in the size of the input. We also derive a polynomial-time algorithm for normal form computation which, in this new encoding, replaces the usual reductions of the division algorithm. We describe other applications, such as the computation of Hilbert series of normal semigroup rings, and we indicate further connections to integer programming and statistics.

Submitted 26 July, 2003; originally announced July 2003.

Comments: 13 pages, using elsart.sty and elsart.cls

MSC Class: 05A15 (primary); 13P10 (secondary)

Showing 1–23 of 23 results for author: Haws, D