Showing 1–37 of 37 results for author: Vitanyi, P M

Searching in archive cs.

  1. arXiv:2201.01222  [pdf, other]

    cs.LG cs.CV

    The cluster structure function

    Authors: Andrew R. Cohen, Paul M. B. Vitányi

    Abstract: For each partition of a data set into a given number of parts there is a partition such that every part is as much as possible a good model (an "algorithmic sufficient statistic") for the data in that part. Since this can be done for every number between one and the number of data, the result is a function, the cluster structure function. It maps the number of parts of a partition to values relate…

    Submitted 14 October, 2022; v1 submitted 4 January, 2022; originally announced January 2022.

  2. Logical depth for reversible Turing machines with an application to the rate of decrease in logical depth for general Turing machines

    Authors: Paul MB Vitanyi

    Abstract: The logical depth of a reversible Turing machine equals the shortest running time of a shortest program for it. This is applied to show that the result in L.F. Antunes, A. Souto, and P.M.B. Vitányi, On the Rate of Decrease in Logical Depth, Theor. Comput. Sci., 702(2017), 60--64 is valid notwithstanding the error noted in Corrigendum P.M.B. Vitányi, Corrigendum to "On the rate of decrease in…

    Submitted 28 August, 2019; originally announced August 2019.

    Comments: Latex 4 pages

    Journal ref: Theor. Comput. Sci., 778(2019), 78-80

  3. Identification of Probabilities

    Authors: Paul M. B. Vitanyi, Nick Chater

    Abstract: Within psychology, neuroscience and artificial intelligence, there has been increasing interest in the proposal that the brain builds probabilistic models of sensory and linguistic input: that is, to infer a probabilistic model from a sample. The practical problems of such inference are substantial: the brain has limited data and restricted computational resources. But there is a more fundamental…

    Submitted 4 August, 2017; originally announced August 2017.

    Comments: 31 pages LaTeX. arXiv admin note: substantial text overlap with arXiv:1311.7385

    Journal ref: Journal of Mathematical Psychology 51, 135-163 (2007)

  4. Web Similarity in Sets of Search Terms using Database Queries

    Authors: Andrew R. Cohen, Paul M. B. Vitanyi

    Abstract: Normalized web distance (NWD) is a similarity or normalized semantic distance based on the World Wide Web or another large electronic database, for instance Wikipedia, and a search engine that returns reliable aggregate page counts. For sets of search terms the NWD gives a common similarity (common semantics) on a scale from 0 (identical) to 1 (completely different). The NWD approximates the simil…

    Submitted 23 July, 2020; v1 submitted 20 February, 2015; originally announced February 2015.

    Comments: LaTeX 18 pages, 3 tables. A precursor is arXiv:1308.3177

    Journal ref: SN COMPUT. SCI. 1, 161(2020)

  5. arXiv:1501.06461  [pdf, ps, other]

    cs.DS cs.CC

    On The Average-Case Complexity of Shellsort

    Authors: Paul M. B. Vitanyi

    Abstract: We prove a lower bound expressed in the increment sequence on the average-case complexity of the number of inversions of Shellsort. This lower bound is sharp in every case where it could be checked. A special case of this lower bound yields the general Jiang-Li-Vitányi lower bound. We obtain new results e.g. determining the average-case complexity precisely in the Yao-Janson-Knuth 3-pass case.

    Submitted 8 February, 2017; v1 submitted 26 January, 2015; originally announced January 2015.

    Comments: 13 pages LaTeX

    Journal ref: Random Structures and Algorithms, 52:2(2018), 354-363

  6. arXiv:1410.7328  [pdf, ps, other]

    cs.IT cs.CC cs.CV cs.DM

    Exact Expression For Information Distance

    Authors: P. M. B. Vitanyi

    Abstract: Information distance can be defined not only between two strings but also in a finite multiset of strings of cardinality greater than two. We give an elementary proof for expressing the information distance in terms of plain Kolmogorov complexity. It is exact since for each cardinality of the multiset the lower bound for some multiset equals the upper bound for all multisets up to a constant addit…

    Submitted 11 July, 2017; v1 submitted 27 October, 2014; originally announced October 2014.

    Comments: 6 pages LaTeX. added material and corrected it

    Journal ref: IEEE Trans. Inform. Theory, 63:8(2017), 4725-4728

  7. arXiv:1409.4276  [pdf, ps, other]

    cs.LG cs.CE cs.DS

    A Fast Quartet Tree Heuristic for Hierarchical Clustering

    Authors: Rudi L. Cilibrasi, Paul M. B. Vitanyi

    Abstract: The Minimum Quartet Tree Cost problem is to construct an optimal weight tree from the $3{n \choose 4}$ weighted quartet topologies on $n$ objects, where optimality means that the summed weight of the embedded quartet topologies is optimal (so it can be the case that the optimal tree embeds all quartets as nonoptimal topologies). We present a Monte Carlo heuristic, based on randomized hill climbing…

    Submitted 12 September, 2014; originally announced September 2014.

    Comments: LaTeX, 40 pages, 11 figures; this paper has substantial overlap with arXiv:cs/0606048 in cs.DS

    Journal ref: Pattern Recognition, 44 (2011) 662-677

  8. arXiv:1311.7385  [pdf, ps, other]

    cs.LG

    Algorithmic Identification of Probabilities

    Authors: Paul M. B. Vitanyi, Nick Chater

    Abstract: The problem is to identify a probability associated with a set of natural numbers, given an infinite data sequence of elements from the set. If the given sequence is drawn i.i.d. and the probability mass function involved (the target) belongs to a computably enumerable (c.e.) or co-computably enumerable (co-c.e.) set of computable probability mass functions, then there is an algorithm to almost s…

    Submitted 11 July, 2014; v1 submitted 28 November, 2013; originally announced November 2013.

    Comments: 19 pages LaTeX. Corrected errors and rewrote the entire paper. arXiv admin note: text overlap with arXiv:1208.5003

  9. arXiv:1310.6976  [pdf, ps, other]

    cs.CC

    On Logical Depth and the Running Time of Shortest Programs

    Authors: L. Antunes, A. Souto, P. M. B. Vitanyi

    Abstract: The logical depth with significance $b$ of a finite binary string $x$ is the shortest running time of a binary program for $x$ that can be compressed by at most $b$ bits. There is another definition of logical depth. We give two theorems about the quantitative relation between these versions: the first theorem concerns a variation of a known fact with a new proof, the second theorem and its proof…

    Submitted 25 October, 2013; originally announced October 2013.

    Comments: 12 pages LaTeX (this supersedes arXiv:1301.4451)

  10. arXiv:1308.3177  [pdf, other]

    cs.IR cs.LG

    Normalized Google Distance of Multisets with Applications

    Authors: Andrew R. Cohen, P. M. B. Vitanyi

    Abstract: Normalized Google distance (NGD) is a relative semantic distance based on the World Wide Web (or any other large electronic database, for instance Wikipedia) and a search engine that returns aggregate page counts. The earlier NGD between pairs of search terms (including phrases) is not sufficient for all applications. We propose an NGD of finite multisets of search terms that is better for many ap…

    Submitted 14 August, 2013; originally announced August 2013.

    Comments: 25 pages, LaTeX, 3 figures/tables

  11. arXiv:1301.4451  [pdf, ps, other]

    cs.CC

    On the logical depth function

    Authors: L. Antunes, A. Souto, A. Teixeira, P. M. B. Vitanyi

    Abstract: For a finite binary string $x$ its logical depth $d$ for significance $b$ is the shortest running time of a program for $x$ of length $K(x)+b$. There is another definition of logical depth. We give a new proof that the two versions are close. There is an infinite sequence of strings of consecutive lengths such that for every string there is a $b$ such that incrementing $b$ by 1 makes the associate…

    Submitted 5 July, 2013; v1 submitted 18 January, 2013; originally announced January 2013.

    Comments: 11 pages LaTeX; previous version was incorrect, this is a new version with almost the same results

  12. arXiv:1301.4432  [pdf]

    cs.CL

    Language learning from positive evidence, reconsidered: A simplicity-based approach

    Authors: Anne S. Hsu, Nick Chater, Paul M. B. Vitányi

    Abstract: Children learn their native language by exposure to their linguistic and communicative environment, but apparently without requiring that their mistakes are corrected. Such learning from positive evidence has been viewed as raising logical problems for language acquisition. In particular, without correction, how is the child to recover from conjecturing an over-general grammar, which will be consi…

    Submitted 18 January, 2013; originally announced January 2013.

    Comments: 39 pages, pdf, 1 figure

    Journal ref: A.S. Hsu, N. Chater, P.M.B. Vitanyi, Language learning from positive evidence, reconsidered: A simplicity-based approach. Topics in Cognitive Science, 5:1(2013), 35-55

  13. arXiv:1212.5711  [pdf, other]

    cs.CV cs.IT physics.data-an

    Normalized Compression Distance of Multisets with Applications

    Authors: Andrew R. Cohen, Paul M. B. Vitanyi

    Abstract: Normalized compression distance (NCD) is a parameter-free, feature-free, alignment-free, similarity measure between a pair of finite objects based on compression. However, it is not sufficient for all applications. We propose an NCD of finite multisets (a.k.a. multiples) of finite objects that is also a metric. Previously, attempts to obtain such an NCD failed. We cover the entire trajectory from…

    Submitted 29 March, 2013; v1 submitted 22 December, 2012; originally announced December 2012.

    Comments: LaTeX 28 pages, 3 figures. This version is changed from the preliminary version to the final version. Updates of the theory. How to compute it, special recipes for classification, more applications and better results (see abstract and especially the detailed results in the paper). The title was changed to reflect this. In v4 corrected the proof of Theorem III-7

    ACM Class: I.5.3; H.3.3; E.4; J.3

    Journal ref: IEEE Trans. Pattern Analysis and Machine Intelligence, 37:8(2015), 1602-1614

  14. arXiv:1208.5003   

    cs.LG math.PR

    Identification of Probabilities of Languages

    Authors: Paul M. B. Vitanyi, Nick Chater

    Abstract: We consider the problem of inferring the probability distribution associated with a language, given data consisting of an infinite sequence of elements of the language. We do this under two assumptions on the algorithms concerned: (i) like a real-life algorithm it has round-off errors, and (ii) it has no round-off errors. Assuming (i) we (a) consider a probability mass function of the elements of t…

    Submitted 15 July, 2014; v1 submitted 24 August, 2012; originally announced August 2012.

    Comments: 23 pages LaTeX, no pictures. This paper has been withdrawn by the author due to crucial errors. The same subject is attacked more successfully with reduced claims in arXiv:1311.7385

    MSC Class: 68

  15. arXiv:1206.0983  [pdf, ps, other]

    cs.IT

    Conditional Kolmogorov Complexity and Universal Probability

    Authors: Paul M. B. Vitanyi

    Abstract: The Coding Theorem of L.A. Levin connects unconditional prefix Kolmogorov complexity with the discrete universal distribution. There are conditional versions referred to in several publications but as yet there exist no written proofs in English. Here we provide those proofs. They use a different definition than the standard one for the conditional version of the discrete universal distribution. U…

    Submitted 22 January, 2013; v1 submitted 5 June, 2012; originally announced June 2012.

    Comments: 17 pages (LaTeX); Corrected previous version. arXiv admin note: text overlap with arXiv:cs/0204037

    MSC Class: 68Q30; 03D32

  16. arXiv:1201.1223  [pdf, ps, other]

    cs.CC

    Turing Machines and Understanding Computational Complexity

    Authors: P. M. B. Vitanyi

    Abstract: We describe the Turing Machine, list some of its many influences on the theory of computation and complexity of computations, and illustrate its importance.

    Submitted 5 January, 2012; originally announced January 2012.

    Comments: 9 pages, 1 figure, LaTeX. To appear in: Alan Turing - His Work and Impact, Elsevier

    Journal ref: In: S. Barry Cooper, Jan van Leeuwen (eds.), "Alan Turing: His Work and Impact", Elsevier, Amsterdam, London, New York, Tokyo, 2013, pp.57-63

  17. arXiv:1201.1221  [pdf, ps, other]

    cs.CV cs.IT physics.data-an

    Information Distance: New Developments

    Authors: P. M. B. Vitanyi

    Abstract: In pattern recognition, learning, and data mining one obtains information from information-carrying objects. This involves an objective definition of the information in a single object, the information to go from one object to another object in a pair of objects, the information to go from one object to any other object in a multiple of objects, and the shared information between objects. This is…

    Submitted 5 January, 2012; originally announced January 2012.

    Comments: 4 pages, Latex; Series of Publications C, Report C-2011-45, Department of Computer Science, University of Helsinki, pp. 71-74

    Journal ref: Proc. 4th Workshop on Information Theoretic Methods in Science and Engineering (WITSME 2011), 2011, pp. 71-74

  18. arXiv:1110.4544  [pdf, ps, other]

    cs.IT

    Compression-based Similarity

    Authors: Paul M. B. Vitanyi

    Abstract: First we consider pair-wise distances for literal objects consisting of finite binary files. These files are taken to contain all of their meaning, like genomes or books. The distances are based on compression of the objects concerned, normalized, and can be viewed as similarity distances. Second, we consider pair-wise distances between names of objects, like "red" or "christianity." In this case…

    Submitted 20 October, 2011; originally announced October 2011.

    Comments: LaTeX, 8 pages, 2 figures, in Proc. IEEE 1st Int. Conf. Data Compression, Communication and Processing, Palinuro, Italy, June 21-24, 2011, 111--118

  19. arXiv:1103.5985  [pdf, ps, other]

    cs.IT cs.LG

    On Empirical Entropy

    Authors: Paul M. B. Vitányi

    Abstract: We propose a compression-based version of the empirical entropy of a finite string over a finite alphabet. Whereas previously one considers the naked entropy of (possibly higher order) Markov processes, we consider the sum of the description of the random variable involved plus the entropy it induces. We assume only that the distribution involved is computable. To test the new notion we compare th…

    Submitted 30 March, 2011; originally announced March 2011.

    Comments: 14 pages, LaTeX

    MSC Class: 68; 94

    ACM Class: H.1; F.1; J.1

  20. arXiv:1006.3520  [pdf, ps, other]

    cs.IT math.PR physics.data-an

    Information Distance

    Authors: Charles H. Bennett, Peter Gacs, Ming Li, Paul M. B. Vitanyi, Wojciech H. Zurek

    Abstract: While Kolmogorov complexity is the accepted absolute measure of information content in an individual finite object, a similarly absolute notion is needed for the information distance between two individual objects, for example, two pictures. We give several natural definitions of a universal information metric, based on length of shortest programs for either ordinary computations or reversible (di…

    Submitted 17 June, 2010; originally announced June 2010.

    Comments: 39 pages, LaTeX, 2 Figures/Tables

    MSC Class: 68Q30; 94A15; 94A17

    Journal ref: C.H. Bennett, P. Gács, M. Li, P.M.B. Vitányi, and W. Zurek, Information Distance, IEEE Trans. Information Theory, 44:4(1998) 1407--1423

  21. arXiv:1006.3275  [pdf, ps, other]

    cs.CC cs.CV physics.data-an

    Normalized Information Distance is Not Semicomputable

    Authors: Sebastiaan A. Terwijn, Leen Torenvliet, Paul M. B. Vitanyi

    Abstract: Normalized information distance (NID) uses the theoretical notion of Kolmogorov complexity, which for practical purposes is approximated by the length of the compressed version of the file involved, using a real-world compression program. This practical application is called 'normalized compression distance' and it is trivially computable. It is a parameter-free similarity measure based on compres…

    Submitted 16 June, 2010; originally announced June 2010.

    Comments: 9 pages, LaTeX, No figures, To appear in J. Comput. Syst. Sci

    MSC Class: 03Dxx; 62B10; 68T10; 91C20

  22. arXiv:1006.3271  [pdf]

    cs.CL physics.data-an q-bio.NC

    The probabilistic analysis of language acquisition: Theoretical, computational, and experimental analysis

    Authors: Anne S. Hsu, Nick Chater, Paul M. B. Vitanyi

    Abstract: There is much debate over the degree to which language learning is governed by innate language-specific biases, or acquired through cognition-general principles. Here we examine the probabilistic language acquisition hypothesis on three levels: We outline a novel theoretical result showing that it is possible to learn the exact generative model underlying a wide class of languages, purely from obs…

    Submitted 16 June, 2010; originally announced June 2010.

    Comments: 26 pages, pdf, 4 figures, Submitted to "Cognition"

    MSC Class: 91E10; 97C30; 68T50

  23. arXiv:0910.4353  [pdf, ps, other]

    cs.CC cs.IT

    Nonapproximability of the Normalized Information Distance

    Authors: Sebastiaan A. Terwijn, Leen Torenvliet, Paul M. B. Vitanyi

    Abstract: Normalized information distance (NID) uses the theoretical notion of Kolmogorov complexity, which for practical purposes is approximated by the length of the compressed version of the file involved, using a real-world compression program. This practical application is called 'normalized compression distance' and it is trivially computable. It is a parameter-free similarity measure based on compr…

    Submitted 23 October, 2009; v1 submitted 22 October, 2009; originally announced October 2009.

    Comments: LaTeX 8 pages, Submitted. 2nd version corrected some typos

  24. arXiv:0906.0731  [pdf, ps]

    cs.DC cs.DS

    Distributed elections in an Archimedean ring of processors

    Authors: Paul M. B. Vitanyi

    Abstract: Unlimited asynchronism is intolerable in real physically distributed computer systems. Such systems, synchronous or not, use clocks and timeouts. Therefore the magnitudes of elapsed absolute time in the system need to satisfy the axiom of Archimedes. Under this restriction of asynchronicity logically time-independent solutions can be derived which are nonetheless better (in number of message pas…

    Submitted 27 May, 2009; originally announced June 2009.

    Journal ref: 16th ACM Symposium on Theory of Computing, Washington D.C., 1984, 542 - 547

  25. arXiv:0905.4452  [pdf, ps, other]

    cs.DS cs.CC

    Analysis of Sorting Algorithms by Kolmogorov Complexity (A Survey)

    Authors: Paul M. B. Vitanyi

    Abstract: Recently, many results on the computational complexity of sorting algorithms were obtained using Kolmogorov complexity (the incompressibility method). Especially, the usually hard average-case analysis is amenable to this method. Here we survey such results about Bubblesort, Heapsort, Shellsort, Dobosiewicz-sort, Shakersort, and sorting with stacks and queues in sequential or parallel mode. Esp…

    Submitted 27 May, 2009; originally announced May 2009.

    Comments: 18 Pages, 2 figures, LaTeX

    Journal ref: Pp. 209--232 in: Entropy, Search, Complexity, Bolyai Society Mathematical Studies, 16, I. Csiszar, G.O.H. Katona, G. Tardos, Eds., Springer-Verlag, 2007

  26. arXiv:0905.4039  [pdf, ps, other]

    cs.CL cs.IR

    Normalized Web Distance and Word Similarity

    Authors: Rudi L. Cilibrasi, Paul M. B. Vitanyi

    Abstract: There is a great deal of work in cognitive psychology, linguistics, and computer science, about using word (or phrase) frequencies in context in text corpora to develop measures for word similarity or word association, going back to at least the 1960s. The goal of this chapter is to introduce the normalized web distance, a general way to tap the amorphous low-grade knowledge available for free on the Intern…

    Submitted 25 May, 2009; originally announced May 2009.

    Comments: Latex, 20 pages, 7 figures, to appear in: Handbook of Natural Language Processing, Second Edition, Nitin Indurkhya and Fred J. Damerau Eds., CRC Press, Taylor and Francis Group, Boca Raton, FL, 2010, ISBN 978-1420085921

  27. arXiv:0905.3347  [pdf, ps, other]

    cs.CV cs.LG

    Information Distance in Multiples

    Authors: Paul M. B. Vitanyi

    Abstract: Information distance is a parameter-free similarity measure based on compression, used in pattern recognition, data mining, phylogeny, clustering, and classification. The notion of information distance is extended from pairs to multiples (finite lists). We study maximal overlap, metricity, universality, minimal overlap, additivity, and normalized information distance in multiples. We use the the…

    Submitted 20 May, 2009; originally announced May 2009.

    Comments: LaTeX 14 pages, Submitted to a technical journal

    ACM Class: J.3; E.4

  28. arXiv:0809.2965  [pdf, ps, other]

    cs.CC cs.IT

    On Time-Bounded Incompressibility of Compressible Strings and Sequences

    Authors: E. G. Daylight, W. M. Koolen, P. M. B. Vitanyi

    Abstract: For every total recursive time bound $t$, a constant fraction of all compressible (low Kolmogorov complexity) strings is $t$-bounded incompressible (high time-bounded Kolmogorov complexity); there are uncountably many infinite sequences of which every initial segment of length $n$ is compressible to $\log n$ yet $t$-bounded incompressible below ${1/4}n - \log n$; and there are countable infinite…

    Submitted 11 August, 2009; v1 submitted 17 September, 2008; originally announced September 2008.

    Comments: 9 pages, LaTeX, no figures, submitted to Information Processing Letters. Changed and added a Barzdins-like lemma for infinite sequences with different quantification order, a fixed constant, and uncountably many sequences

  29. arXiv:0809.2754  [pdf, ps, other]

    cs.IT cs.LG math.ST

    Algorithmic information theory

    Authors: Peter D. Grunwald, Paul M. B. Vitanyi

    Abstract: We introduce algorithmic information theory, also known as the theory of Kolmogorov complexity. We explain the main concepts of this quantitative approach to defining 'information'. We discuss the extent to which Kolmogorov's and Shannon's information theory have a common purpose, and where they are fundamentally different. We indicate how recent developments within the theory allow one to forma…

    Submitted 17 September, 2008; v1 submitted 16 September, 2008; originally announced September 2008.

    Comments: 37 pages, 2 figures, pdf, in: Philosophy of Information, P. Adriaans and J. van Benthem, Eds., A volume in Handbook of the philosophy of science, D. Gabbay, P. Thagard, and J. Woods, Eds., Elsevier, 2008. In version 1 of September 16 the refs are missing. Corrected in version 2 of September 17

  30. arXiv:0809.2553  [pdf]

    cs.IR cs.AI

    Normalized Information Distance

    Authors: Paul M. B. Vitanyi, Frank J. Balbach, Rudi L. Cilibrasi, Ming Li

    Abstract: The normalized information distance is a universal distance measure for objects of all kinds. It is based on Kolmogorov complexity and thus uncomputable, but there are ways to utilize it. First, compression algorithms can be used to approximate the Kolmogorov complexity if the objects have a string representation. Second, for names and abstract concepts, page count statistics from the World Wide…

    Submitted 15 September, 2008; originally announced September 2008.

    Comments: 33 pages, 12 figures, pdf, in: Normalized information distance, in: Information Theory and Statistical Learning, Eds. M. Dehmer, F. Emmert-Streib, Springer-Verlag, New-York, To appear

  31. arXiv:cs/0612133  [pdf, ps, other]

    cs.IT cs.CC

    Tales of Huffman

    Authors: Paul M. B. Vitanyi, Zvi Lotker

    Abstract: We study the new problem of Huffman-like codes subject to individual restrictions on the code-word lengths of a subset of the source words. These are prefix codes with minimal expected code-word length for a random source where additionally the code-word lengths of a subset of the source words are prescribed, possibly differently for every such source word. Based on a structural analysis of prope…

    Submitted 25 December, 2006; originally announced December 2006.

    Comments: LaTeX 8 pages

  32. arXiv:cs/0612025  [pdf, ps, other]

    cs.DC

    Registers

    Authors: Paul M. B. Vitanyi

    Abstract: Entry in: Encyclopedia of Algorithms, Ming-Yang Kao, Ed., Springer, To appear. Synonyms: Wait-free registers, wait-free shared variables, asynchronous communication hardware. Problem Definition: Consider a system of asynchronous processes that communicate among themselves by only executing read and write operations on a set of shared variables (also known as shared registers). The system has n…

    Submitted 5 December, 2006; originally announced December 2006.

    Comments: 5 pages, LaTeX, Entry in: Encyclopedia of Algorithms, Ming-Yang Kao, Ed., Springer, To appear

  33. arXiv:cs/0606048  [pdf, ps, other]

    cs.DS cs.CV cs.DM math.ST physics.data-an q-bio.QM

    A New Quartet Tree Heuristic for Hierarchical Clustering

    Authors: Rudi Cilibrasi, Paul M. B. Vitanyi

    Abstract: We consider the problem of constructing an optimal-weight tree from the 3*(n choose 4) weighted quartet topologies on n objects, where optimality means that the summed weight of the embedded quartet topologies is optimal (so it can be the case that the optimal tree embeds all quartets as non-optimal topologies). We present a heuristic for reconstructing the optimal-weight tree, and a canonical…

    Submitted 11 June, 2006; originally announced June 2006.

    Comments: 22 pages, 14 figures

    ACM Class: F.2.2; G.1.6

  34. arXiv:cs/0412098  [pdf, ps, other]

    cs.CL cs.AI cs.DB cs.IR cs.LG

    The Google Similarity Distance

    Authors: Rudi Cilibrasi, Paul M. B. Vitanyi

    Abstract: Words and phrases acquire meaning from the way they are used in society, from their relative semantics to other words and phrases. For computers the equivalent of 'society' is 'database,' and the equivalent of 'use' is 'way to search the database.' We present a new theory of similarity between words and phrases based on information distance and Kolmogorov complexity. To fix thoughts we use the w…

    Submitted 30 May, 2007; v1 submitted 21 December, 2004; originally announced December 2004.

    Comments: 15 pages, 10 figures; changed some text/figures/notation/part of theorem. Incorporated referees comments. This is the final published version up to some minor changes in the galley proofs

    ACM Class: I.2.4; I.2.7

    Journal ref: R.L. Cilibrasi, P.M.B. Vitanyi, The Google Similarity Distance, IEEE Trans. Knowledge and Data Engineering, 19:3(2007), 370-383

  35. arXiv:cs/0411014  [pdf, ps, other]

    cs.IT

    Rate Distortion and Denoising of Individual Data Using Kolmogorov complexity

    Authors: Nikolai K. Vereshchagin, Paul M. B. Vitanyi

    Abstract: We examine the structure of families of distortion balls from the perspective of Kolmogorov complexity. Special attention is paid to the canonical rate-distortion function of a source word which returns the minimal Kolmogorov complexity of all distortion balls containing that word subject to a bound on their cardinality. This canonical rate-distortion function is related to the more standard alg…

    Submitted 26 November, 2009; v1 submitted 6 November, 2004; originally announced November 2004.

    Comments: LaTeX, 31 pages, 2 figures. The new version is again completely rewritten, newly titled, and adds new results

    ACM Class: E.4; H.1.1

  36. arXiv:math/0110086  [pdf, ps, other]

    math.PR cs.CR math.ST physics.data-an

    Randomness

    Authors: Paul M. B. Vitanyi

    Abstract: Here we present in a single essay a combination and completion of the several aspects of the problem of randomness of individual objects which of necessity occur scattered in our textbook "An Introduction to Kolmogorov Complexity and Its Applications" (M. Li and P. Vitanyi), 2nd Ed., Springer-Verlag, 1997.

    Submitted 10 October, 2001; v1 submitted 8 October, 2001; originally announced October 2001.

    Comments: LaTeX source, 48 pages, Section contributed to 'Matematica, Logica, Informatica' Volume 12 of the "Storia del XX Secolo", published by the "Instituto della Enciclopedia Italiana" (small addition in new version)

    MSC Class: 60-02; 60A05; 62-02; 62A01

  37. arXiv:quant-ph/0102108  [pdf, ps, other]

    quant-ph cs.CC cs.IT math.LO

    Quantum Kolmogorov Complexity Based on Classical Descriptions

    Authors: Paul M. B. Vitanyi

    Abstract: We develop a theory of the algorithmic information in bits contained in an individual pure quantum state. This extends classical Kolmogorov complexity to the quantum domain retaining classical descriptions. Quantum Kolmogorov complexity coincides with the classical Kolmogorov complexity on the classical domain. Quantum Kolmogorov complexity is upper bounded and can be effectively approximated fr…

    Submitted 9 October, 2001; v1 submitted 21 February, 2001; originally announced February 2001.

    Comments: 17 pages, LaTeX, final and extended version of quant-ph/9907035, with corrections to the published journal version (the two displayed equations in the right-hand column on page 2466 had the left-hand sides of the displayed formulas erroneously interchanged)

    Journal ref: IEEE Transactions on Information Theory, Vol. 47, No. 6, September 2001, 2464-2479