-
Contemporary AI foundation models increase biological weapons risk
Abstract: The rapid advancement of artificial intelligence has raised concerns about its potential to facilitate biological weapons development. We argue existing safety assessments of contemporary foundation AI models underestimate this risk, largely due to flawed assumptions and inadequate evaluation methods. First, assessments mistakenly assume biological weapons development requires tacit knowledge, or…
Submitted 12 June, 2025; originally announced June 2025.
Comments: 58 pages, 10 figures, 4 tables
Report number: WR-A3853-1
-
ChemSpaceAL: An Efficient Active Learning Methodology Applied to Protein-Specific Molecular Generation
Abstract: The incredible capabilities of generative artificial intelligence models have inevitably led to their application in the domain of drug discovery. Within this domain, the vastness of chemical space motivates the development of more efficient methods for identifying regions with molecules that exhibit desired characteristics. In this work, we present a computationally efficient active learning meth…
Submitted 3 December, 2023; v1 submitted 11 September, 2023; originally announced September 2023.
-
HAC-Net: A Hybrid Attention-Based Convolutional Neural Network for Highly Accurate Protein-Ligand Binding Affinity Prediction
Abstract: Applying deep learning concepts from image detection and graph theory has greatly advanced protein-ligand binding affinity prediction, a challenge with enormous ramifications for both drug discovery and protein engineering. We build upon these advances by designing a novel deep learning architecture consisting of a 3-dimensional convolutional neural network utilizing channel-wise attention and two…
Submitted 28 March, 2023; v1 submitted 23 December, 2022; originally announced December 2022.
-
Computation of Maximal Determinants of Binary Circulant Matrices
Abstract: We describe algorithms for computing maximal determinants of binary circulant matrices of small orders. Here "binary matrix" means a matrix whose elements are drawn from $\{0,1\}$ or $\{-1,1\}$. We describe efficient parallel algorithms for the search, using Duval's algorithm for generation of necklaces and the well-known representation of the determinant of a circulant in terms of roots of unity.…
Submitted 19 February, 2021; v1 submitted 1 January, 2018; originally announced January 2018.
Comments: 22 pages, 4 tables. Improved analysis of quantile estimation in v4, Table 2 updated in v5, Table 4 updated in v6
MSC Class: 05A15 (Primary) 05A19; 65T50 (Secondary)
Journal ref: Journal of Integer Sequences 21 (2018), article 18.5.6, 19 pp
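The roots-of-unity representation mentioned in this abstract can be illustrated with a minimal sketch. For a circulant matrix with first row c_0, ..., c_{n-1}, the determinant equals the product over m of the sums c_0 + c_1 w^m + ... + c_{n-1} w^{(n-1)m}, where w = exp(2*pi*i/n). The function name `circulant_det` and the use of floating-point arithmetic with rounding are assumptions made for illustration; the paper's parallel search and necklace generation are not reproduced here.

```python
import cmath


def circulant_det(first_row):
    """Determinant of the circulant matrix with the given integer first row.

    Uses the product of the eigenvalues, each of which is the first row
    evaluated as a polynomial at an n-th root of unity. Floating-point
    rounding limits this sketch to small orders.
    """
    n = len(first_row)
    omega = cmath.exp(2j * cmath.pi / n)  # primitive n-th root of unity
    det = complex(1.0, 0.0)
    for m in range(n):
        eigenvalue = sum(c * omega ** (m * k) for k, c in enumerate(first_row))
        det *= eigenvalue
    # The determinant of an integer matrix is an integer, so round the real part.
    return round(det.real)
```

For example, `circulant_det([1, 1, -1])` evaluates the determinant of a 3 x 3 matrix over {-1, 1} using only three complex sums.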
-
arXiv:1303.2772 [pdf, ps, other]
Further analysis of the binary Euclidean algorithm
Abstract: The binary Euclidean algorithm is a variant of the classical Euclidean algorithm. It avoids multiplications and divisions, except by powers of two, so is potentially faster than the classical algorithm on a binary machine. We describe the binary algorithm and consider its average case behaviour. In particular, we correct some errors in the literature, discuss some results of Vallée, and describe a…
Submitted 11 March, 2013; originally announced March 2013.
Comments: An old Technical Report which no longer seems to be available from the Oxford University website
Report number: PRG TR-7-99 MSC Class: 68Q25 (Primary) 68W40; 65Y20 (Secondary) ACM Class: F.2.1
Journal ref: Technical Report PRG TR-7-99, Oxford University Computing Laboratory, November 1999, 18 pp
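As an illustration of the algorithm this abstract describes, here is a minimal sketch of the textbook binary GCD, which uses only shifts, comparisons and subtractions. The function name `binary_gcd` is an assumption, and this is the standard formulation rather than necessarily the exact variant analysed in the report.

```python
def binary_gcd(a: int, b: int) -> int:
    """Greatest common divisor using only shifts, comparisons and subtraction."""
    if a < 0 or b < 0:
        raise ValueError("arguments must be non-negative")
    if a == 0:
        return b
    if b == 0:
        return a
    # Factor out the power of two common to both arguments.
    shift = 0
    while ((a | b) & 1) == 0:
        a >>= 1
        b >>= 1
        shift += 1
    # Make a odd; it stays odd for the rest of the loop.
    while (a & 1) == 0:
        a >>= 1
    while b != 0:
        # Remaining factors of two in b cannot be common, so drop them.
        while (b & 1) == 0:
            b >>= 1
        if a > b:
            a, b = b, a
        b -= a  # both odd, so the difference is even
    return a << shift
```

Division by two is a right shift here, which is why the method suits binary hardware.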
-
arXiv:1212.1958 [pdf, ps, other]
Root optimization of polynomials in the number field sieve
Abstract: The general number field sieve (GNFS) is the most efficient algorithm known for factoring large integers. It consists of several stages, the first one being polynomial selection. The quality of the chosen polynomials in polynomial selection can be modelled in terms of size and root properties. In this paper, we describe some algorithms for selecting polynomials with very good root properties.
Submitted 9 December, 2012; originally announced December 2012.
Comments: 16 pages, 18 references
MSC Class: 11A51 (Primary) 11R09 (Secondary)
Journal ref: Mathematics of Computation 84 (2015), 2447-2457
-
Finding D-optimal designs by randomised decomposition and switching
Abstract: The Hadamard maximal determinant (maxdet) problem is to find the maximum determinant D(n) of a square {+1, -1} matrix of given order n. Such a matrix with maximum determinant is called a saturated D-optimal design. We consider some cases where n > 2 is not divisible by 4, so the Hadamard bound is not attainable, but bounds due to Barba or Ehlich and Wojtas may be attainable. If R is a matrix with…
Submitted 11 August, 2012; v1 submitted 20 December, 2011; originally announced December 2011.
Comments: 18 pages, 3 figures, 5 tables (figures corrected in v4). v5 added a reference and made minor improvements. Presented at the International Workshop on Hadamard Matrices held in honour of Kathy Horadam's 60th birthday, Melbourne, Nov. 2011. Data files are available at http://maths.anu.edu.au/~brent/maxdet/
MSC Class: 05B20 (Primary) 15B34; 68R05 (Secondary) ACM Class: F.2.1
Journal ref: Australasian Journal of Combinatorics 55 (2013), 15-30. Erratum http://maths-people.anu.edu.au/~brent/pub/pub245_errata.html
-
arXiv:1108.0486 [pdf, ps, other]
High-Performance Pseudo-Random Number Generation on Graphics Processing Units
Abstract: This work considers the deployment of pseudo-random number generators (PRNGs) on graphics processing units (GPUs), developing an approach based on the xorgens generator to rapidly produce pseudo-random numbers of high statistical quality. The chosen algorithm has configurable state size and period, making it ideal for tuning to the GPU architecture. We present a comparison of both speed and statis…
Submitted 2 August, 2011; originally announced August 2011.
Comments: 10 pages, submitted to PPAM 2011 (Torun, Poland, 11-14 Sept. 2011). For further information, see http://maths.anu.edu.au/~brent/pub/pub241.html
MSC Class: 11K45 (Primary) 65C10; 65Y05; 65Y10 (Secondary) ACM Class: D.1.3; G.3; G.4; I.6.8
Journal ref: Lecture Notes in Computer Science, Vol. 7203 (2012), 609-618
-
arXiv:1108.0286 [pdf, ps, other]
Fast computation of Bernoulli, Tangent and Secant numbers
Abstract: We consider the computation of Bernoulli, Tangent (zag), and Secant (zig or Euler) numbers. In particular, we give asymptotically fast algorithms for computing the first n such numbers in O(n^2.(log n)^(2+o(1))) bit-operations. We also give very short in-place algorithms for computing the first n Tangent or Secant numbers in O(n^2) integer operations. These algorithms are extremely simple, and fas…
Submitted 5 September, 2011; v1 submitted 1 August, 2011; originally announced August 2011.
Comments: 16 pages. To appear in Computational and Analytical Mathematics (associated with the May 2011 workshop in honour of Jonathan Borwein's 60th birthday). For further information, see http://maths.anu.edu.au/~brent/pub/pub242.html
MSC Class: 05A15 (Primary); 11B68; 11B83; 11-04; 11Y55; 11Y60; 65-04; 68R05 (Secondary) ACM Class: F.2.1
Journal ref: Springer Proceedings in Mathematics and Statistics, Vol. 50, 2013, 127-142
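To make the idea of a short in-place O(n^2) computation concrete, here is a sketch of one in-place recurrence that produces the first n Tangent numbers (1, 2, 16, 272, ...). The abstract does not spell out the algorithm, so whether this matches the paper's version line for line is an assumption, as is the function name `tangent_numbers`.

```python
def tangent_numbers(n: int) -> list:
    """Return [T_1, ..., T_n] using a single array updated in place.

    The double loop performs O(n^2) integer operations.
    """
    if n < 1:
        return []
    t = [0] * (n + 1)  # index 0 is unused so that t[k] corresponds to T_k
    t[1] = 1
    # Initialise t[k] = (k - 1)!
    for k in range(2, n + 1):
        t[k] = (k - 1) * t[k - 1]
    # Sweep the array n - 1 times; after sweep k, t[k] holds its final value.
    for k in range(2, n + 1):
        for j in range(k, n + 1):
            t[j] = (j - k) * t[j - 1] + (j - k + 2) * t[j]
    return t[1:]
```

Python integers are arbitrary precision, so the values do not overflow as n grows.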
-
arXiv:1005.2314 [pdf, ps, other]
Some comments on C. S. Wallace's random number generators
Abstract: We outline some of Chris Wallace's contributions to pseudo-random number generation. In particular, we consider his idea for generating normally distributed variates without relying on a source of uniform random numbers, and compare it with more conventional methods for generating normal random numbers. Implementations of Wallace's idea can be very fast (approximately as fast as good uniform gener…
Submitted 13 May, 2010; originally announced May 2010.
Comments: 13 pages. For further information, see http://wwwmaths.anu.edu.au/~brent/pub/pub213.html
MSC Class: 11K45 (Primary) 65-03; 65C10 (Secondary) ACM Class: G.3; G.4; K.2
Journal ref: The Computer Journal 51, 5 (Sept. 2008), 579-584
-
arXiv:1005.1967 [pdf, ps, other]
The great trinomial hunt
Abstract: We describe a search for primitive trinomials of high degree and its interaction with the Great Internet Mersenne prime search (GIMPS). The search is complete for trinomials whose degree is the exponent of a Mersenne prime, for all 47 currently known Mersenne primes.
Submitted 11 May, 2010; originally announced May 2010.
Comments: 16 pages. For further details see http://wwwmaths.anu.edu.au/~brent/pub/pub235.html
MSC Class: 11-04 (Primary); 11B83 (Secondary) ACM Class: G.2.1; G.4
Journal ref: Notices of the American Mathematical Society 58, 2 (2011), 233-239
-
arXiv:1005.1320 [pdf, ps, other]
The myth of equidistribution for high-dimensional simulation
Abstract: A pseudo-random number generator (RNG) might be used to generate w-bit random samples in d dimensions if the number of state bits is at least dw. Some RNGs perform better than others and the concept of equidistribution has been introduced in the literature in order to rank different RNGs. We define what it means for a RNG to be (d,w)-equidistributed, and then argue that (d,w)-equidistribution is n…
Submitted 8 May, 2010; originally announced May 2010.
Comments: 8 pages. Based on material presented at a Workshop on High Dimensional Approximation held at the Australian National University, Canberra, 19 February 2007. For further details, see http://wwwmaths.anu.edu.au/~brent/pub/pub240.html
MSC Class: 65C10 (Primary) 11K36; 11K38; 11K45 (Secondary) ACM Class: G.3; G.4
-
arXiv:1005.1206 [pdf, ps, other]
A Simple Approach to Error Reconciliation in Quantum Key Distribution
Abstract: We discuss the error reconciliation phase in quantum key distribution (QKD) and analyse a simple scheme in which blocks with bad parity (that is, blocks containing an odd number of errors) are discarded. We predict the performance of this scheme and show, using a simulation, that the prediction is accurate.
Submitted 7 May, 2010; originally announced May 2010.
Comments: 19 pages. Presented at the 53rd Annual Meeting of the Australian Mathematical Society, Adelaide, Oct 1, 2009. See also http://wwwmaths.anu.edu.au/~brent/pub/pub239.html
MSC Class: 81P94 (Primary); 94A60 (Secondary) ACM Class: E.3; F.2.2
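The parity-discard idea in this abstract can be sketched as a small simulation. This is a simplified illustration under stated assumptions: independent bit errors with probability p, fixed block length k, and hypothetical function names (`prob_even_errors`, `filter_blocks`, `simulate`). It does not reproduce the paper's analysis, and it ignores the information leaked by announcing parities.

```python
import random


def prob_even_errors(k: int, p: float) -> float:
    """Probability that a k-bit block has an even number of independent errors."""
    return (1 + (1 - 2 * p) ** k) / 2


def filter_blocks(alice, bob, k):
    """Keep only the k-bit blocks whose parities agree between the two parties."""
    kept = []
    for start in range(0, len(alice) - k + 1, k):
        a = alice[start:start + k]
        b = bob[start:start + k]
        if sum(a) % 2 == sum(b) % 2:  # odd-error blocks have mismatched parity
            kept.append((a, b))
    return kept


def simulate(num_blocks: int, k: int, p: float, seed: int = 0):
    """Return (fraction of blocks kept, list of kept block pairs)."""
    rng = random.Random(seed)
    alice = [rng.randint(0, 1) for _ in range(num_blocks * k)]
    # Bob's copy flips each bit independently with probability p.
    bob = [bit ^ int(rng.random() < p) for bit in alice]
    kept = filter_blocks(alice, bob, k)
    return len(kept) / num_blocks, kept
```

Comparing the simulated fraction from `simulate` with `prob_even_errors(k, p)` mirrors, in a very reduced form, the prediction-versus-simulation check the abstract mentions.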
-
arXiv:1004.5466 [pdf, ps, other]
On computing factors of cyclotomic polynomials
Abstract: For odd square-free n > 1 the n-th cyclotomic polynomial satisfies an identity of Gauss. There are similar identities of Aurifeuille, Le Lasseur and Lucas. These identities all involve certain polynomials with integer coefficients. We show how these coefficients can be computed by simple algorithms which require O(n^2) arithmetic operations and work over the integers. We also give explicit formulae…
Submitted 30 April, 2010; originally announced April 2010.
Comments: 21 pages. An old Technical Report, submitted for archival purposes. For further details, see http://wwwmaths.anu.edu.au/~brent/pub/pub135.html
Report number: Technical Report TR-CS-92-13, Department of Computer Science, Australian National University, September 1992, 21 pages. MSC Class: 11-04 (Primary) 05A15; 11T06; 11T22; 11T24; 11Y05; 11Y16; 12-04; 12E10; 12Y05 (Secondary) ACM Class: G.1.0; G.2.1
Journal ref: Mathematics of Computation 61 (1993), 131-149.
-
arXiv:1004.5439 [pdf, ps, other]
On the periods of generalized Fibonacci recurrences
Abstract: We give a simple condition for a linear recurrence (mod 2^w) of degree r to have the maximal possible period 2^(w-1).(2^r-1). It follows that the period is maximal in the cases of interest for pseudo-random number generation, i.e. for 3-term linear recurrences defined by trinomials which are primitive (mod 2) and of degree r > 2. We consider the enumeration of certain exceptional polynomials which…
Submitted 29 April, 2010; originally announced April 2010.
Comments: 13 pages. An old Technical Report, submitted for archival purposes. For further details, see http://wwwmaths.anu.edu.au/~brent/pub/pub133.html
Report number: Technical Report TR-CS-92-03, Computer Science Department, Australian National University, March 1992 (revised March 1993), 13 pages. MSC Class: 11Y55 (Primary) 05A15; 11-04; 12-04; 12E05; 12E10; 65C10; 68R05 (Secondary) ACM Class: F.2.1; G.3
Journal ref: Mathematics of Computation 63 (1994), 389-401.
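The period formula 2^(w-1).(2^r-1) can be checked by brute force for a tiny case. The sketch below assumes the 3-term recurrence x_n = x_{n-2} + x_{n-3} (mod 2^w), whose characteristic polynomial reduces to the primitive trinomial x^3 + x + 1 (mod 2), so r = 3. The choice of recurrence, the starting state and the function name `recurrence_period` are illustrative assumptions, not taken from the paper.

```python
def recurrence_period(w: int, lags=(2, 3), limit: int = 10**6) -> int:
    """Period of x_n = sum of x_{n-lag} over lags, taken mod 2^w.

    The state is the last r values, starting from (0, ..., 0, 1). Because the
    coefficient of the oldest term is 1, the state map is invertible and the
    sequence must return to its starting state.
    """
    r = max(lags)
    mod = 1 << w
    start = tuple([0] * (r - 1) + [1])
    state = start
    for steps in range(1, limit + 1):
        nxt = sum(state[-lag] for lag in lags) % mod
        state = state[1:] + (nxt,)
        if state == start:
            return steps
    raise RuntimeError("period exceeds the search limit")
```

For w = 1, 2, 3 this recurrence gives periods 7, 14 and 28, in line with 2^(w-1).(2^3-1).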
-
arXiv:1004.5437 [pdf, ps, other]
Parallel algorithms in linear algebra
Abstract: This report provides an introduction to algorithms for fundamental linear algebra problems on various parallel computer architectures, with the emphasis on distributed-memory MIMD machines. To illustrate the basic concepts and key issues, we consider the problem of parallel solution of a nonsingular linear system by Gaussian elimination with partial pivoting. This problem has come to be regarded a…
Submitted 29 April, 2010; originally announced April 2010.
Comments: 17 pages. An old Technical Report, submitted for archival purposes. For further details see http://wwwmaths.anu.edu.au/~brent/pub/pub128.html
Report number: Technical Report TR-CS-91-06, Computer Sciences Laboratory, Australian National University, Canberra, August 1991, 17 pages MSC Class: 65-01 (Primary) 65-02; 65F05; 65F15; 68-01; 68-02 (Secondary) ACM Class: C.1.2; C.1.4; D.1.3; G.1.0; G.4
Journal ref: Algorithms and Architectures: Proceedings of the Second NEC Research Symposium (edited by T. Ishiguro), SIAM, Philadelphia, 1993, 54-72
-
arXiv:1004.4710 [pdf, ps, other]
Modern Computer Arithmetic (version 0.5.1)
Abstract: This is a draft of a book about algorithms for performing arithmetic, and their implementation on modern computers. We are concerned with software more than hardware - we do not cover computer architecture or the design of computer hardware. Instead we focus on algorithms for efficiently performing arithmetic operations such as addition, multiplication and division, and their connections to topics…
Submitted 27 April, 2010; originally announced April 2010.
Comments: Preliminary version of a book to be published by Cambridge University Press. xvi+247 pages. Cite as "Modern Computer Arithmetic, Version 0.5.1, 5 March 2010". For further details, updates and errata see http://wwwmaths.anu.edu.au/~brent/pub/pub226.html or http://www.loria.fr/~zimmerma/mca/pub226.html
MSC Class: 68-02 (Primary) 11Y16; 11Y60; 65G50; 65H05; 65Q99; 65Y04; 65Y20 (Secondary) ACM Class: B.2.4; F.2.1; G.1.0; G.1.2; G.1.5; G.2.1; G.4
Journal ref: Cambridge Monographs on Computational and Applied Mathematics (No. 18), Cambridge University Press, November 2010, 236 pages
-
arXiv:1004.3716 [pdf, ps, other]
Some linear-time algorithms for systolic arrays
Abstract: We survey some results on linear-time algorithms for systolic arrays. In particular, we show how the greatest common divisor (GCD) of two polynomials of degree n over a finite field can be computed in time O(n) on a linear systolic array of O(n) cells; similarly for the GCD of two n-bit binary numbers. We show how n * n Toeplitz systems of linear equations can be solved in time O(n) on a linear ar…
Submitted 21 April, 2010; originally announced April 2010.
Comments: Corrected version of an old (1983) paper. 23 pages. For further details, see http://wwwmaths.anu.edu.au/~brent/pub/pub079.html
Report number: Report TR-CS-82-15, DCS, Australian National University, December 1982 MSC Class: 65Y05 (Primary) 37B15; 68Q10; 68Q80 (Secondary) ACM Class: G.1.3; B.6.1; C.1.3
Journal ref: Information Processing 83 (edited by R.E.A. Mason), North-Holland, Amsterdam, 1983, 865-876
-
arXiv:1004.3608 [pdf, ps, other]
The complexity of multiple-precision arithmetic
Abstract: In studying the complexity of iterative processes it is usually assumed that the arithmetic operations of addition, multiplication, and division can be performed in certain constant times. This assumption is invalid if the precision required increases as the computation proceeds. We give upper and lower bounds on the number of single-precision operations required to perform various multiple-precis…
Submitted 19 March, 2021; v1 submitted 20 April, 2010; originally announced April 2010.
Comments: An old (1976) paper with a postscript (1999) describing more recent developments. 30 pages. For further details, see http://wwwmaths.anu.edu.au/~brent/pub/pub032.html. Typos corrected in v2
MSC Class: 03D15 (Primary) 68Q17; 68Q25 (Secondary) ACM Class: F.2.1; G.1.0
Journal ref: The Complexity of Computational Problem Solving (edited by R. S. Anderssen and R. P. Brent), University of Queensland Press, Brisbane, 1976, 126-165
-
arXiv:1004.3412 [pdf, ps, other]
Multiple-precision zero-finding methods and the complexity of elementary function evaluation
Abstract: We consider methods for finding high-precision approximations to simple zeros of smooth functions. As an application, we give fast methods for evaluating the elementary functions log(x), exp(x), sin(x) etc. to high precision. For example, if x is a positive floating-point number with an n-bit fraction, then (under rather weak assumptions) an n-bit approximation to log(x) or exp(x) may be computed…
Submitted 29 May, 2010; v1 submitted 20 April, 2010; originally announced April 2010.
Comments: An old (1975) paper with a postscript describing more recent developments. See also http://wwwmaths.anu.edu.au/~brent/pub/pub028.html
Report number: Interim Report ADA014059, Department of Computer Science, Carnegie-Mellon University (July 1975), ii+26 pages MSC Class: 11Y60 (Primary) 65Y20 (Secondary) ACM Class: F.2.1; G.1.0
Journal ref: Analytic Computational Complexity (edited by J. F. Traub), Academic Press, New York, 1975, 151-176
-
arXiv:1004.3366 [pdf, ps, other]
Some integer factorization algorithms using elliptic curves
Abstract: Lenstra's integer factorization algorithm is asymptotically one of the fastest known algorithms, and is ideally suited for parallel computation. We suggest a way in which the algorithm can be speeded up by the addition of a second phase. Under some plausible assumptions, the speedup is of order log(p), where p is the factor which is found. In practice the speedup is significant. We mention some re…
Submitted 20 April, 2010; originally announced April 2010.
Comments: Corrected version of a paper that appeared in Australian Computer Science Communications 8 (1986), with postscript added 1998. For further details see http://wwwmaths.anu.edu.au/~brent/pub/pub102.html
MSC Class: 11A51 (Primary) 11Y16; 68Q25 (Secondary) ACM Class: F.2.1
Journal ref: Australian Computer Science Communications 8 (1986), 149-163
-
MP users guide
Abstract: MP is a package of ANSI Standard Fortran (ANS X3.9-1966) subroutines for performing multiple-precision floating-point arithmetic and evaluating elementary and special functions. The subroutines are machine independent and the precision is arbitrary, subject to storage limitations. The User's Guide describes the routines and their calling sequences, example and test programs, use of the Augment pre…
Submitted 19 April, 2010; v1 submitted 19 April, 2010; originally announced April 2010.
Comments: MP Users Guide (fourth edition), 73 pages. A technical report that was not published elsewhere, submitted for archival purposes. For further information see http://wwwmaths.anu.edu.au/~brent/pub/pub035.html
Report number: TR-CS-81-08, Department of Computer Science, Australian National University (June 1981) MSC Class: 97N80 (Primary); 11-04 (Secondary) ACM Class: G.1.0
-
arXiv:1004.3169 [pdf, ps, other]
Factorizations of Cunningham numbers with bases 13 to 99
Abstract: This Report updates the tables of factorizations of a^n +- 1 for 13 < a < 100, previously published as CWI Report NM-R9212 (June 1992) and updated in CWI Report NM-R9419 (Update 1, September 1994) and CWI Report NM-R9609 (Update 2, March 1996). A total of 951 new entries in the tables are given here. The factorizations are now complete for n < 76, and there are no composite cofactors smaller than…
Submitted 19 April, 2010; v1 submitted 19 April, 2010; originally announced April 2010.
Comments: A Technical Report (December 2000) not published elsewhere, submitted for archival reasons. vi + 463 pages. For further details see http://wwwmaths.anu.edu.au/~brent/pub/pub200.html
Report number: PRG TR-14-00 MSC Class: 11Y05 (Primary); 11-04 (Secondary) ACM Class: F.2.1
-
arXiv:1004.3115 [pdf, ps, other]
Some long-period random number generators using shifts and xors
Abstract: Marsaglia recently introduced a class of xorshift random number generators (RNGs) with periods 2^n-1 for n = 32, 64, etc. Here we give a generalisation of Marsaglia's xorshift generators in order to obtain fast and high-quality RNGs with extremely long periods. RNGs based on primitive trinomials may be unsatisfactory because a trinomial has very small weight. In contrast, our generators can be chos…
Submitted 19 April, 2010; originally announced April 2010.
Comments: 11 pages
MSC Class: 11K45 ACM Class: G.3
Journal ref: ANZIAM Journal 48 (CTAC2006), C188-C202, 2007
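For orientation, here is a minimal sketch of a basic Marsaglia-style 32-bit xorshift step, the starting point that this abstract generalises. The shift triple (13, 17, 5) is a commonly quoted choice for n = 32; the function names are assumptions, and this is not the generalised long-period generator the paper proposes.

```python
MASK32 = 0xFFFFFFFF


def xorshift32(state: int) -> int:
    """Advance a nonzero 32-bit xorshift state by one step."""
    if not 0 < state <= MASK32:
        raise ValueError("state must be a nonzero 32-bit integer")
    state ^= (state << 13) & MASK32  # mask to emulate 32-bit overflow
    state ^= state >> 17
    state ^= (state << 5) & MASK32
    return state


def xorshift32_stream(seed: int, count: int) -> list:
    """Return the next `count` states produced from `seed`."""
    out = []
    state = seed
    for _ in range(count):
        state = xorshift32(state)
        out.append(state)
    return out
```

Each step is a linear map over GF(2) on the 32 state bits, which is why zero is excluded: it is a fixed point of the map.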
-
arXiv:1004.3114 [pdf, ps, other]
A fast vectorised implementation of Wallace's normal random number generator
Abstract: Wallace has proposed a new class of pseudo-random generators for normal variates. These generators do not require a stream of uniform pseudo-random numbers, except for initialisation. The inner loops are essentially matrix-vector multiplications and are very suitable for implementation on vector processors or vector/parallel processors such as the Fujitsu VPP300. In this report we outline Wallace'…
Submitted 19 April, 2010; originally announced April 2010.
Comments: An old Technical Report, not published elsewhere. 9 pages. For further details see http://wwwmaths.anu.edu.au/~brent/pub/pub170.html
Report number: Technical Report TR-CS-97-07, Computer Sciences Laboratory, Australian National University, April 1997 MSC Class: 11K45 ACM Class: G.3
-
arXiv:1004.3108 [pdf, ps, other]
Uses of randomness in computation
Abstract: Random number generators are widely used in practical algorithms. Examples include simulation, number theory (primality testing and integer factorization), fault tolerance, routing, cryptography, optimization by simulated annealing, and perfect hashing. Complexity theory usually considers the worst-case behaviour of deterministic algorithms, but it can also consider average-case behaviour if it is…
Submitted 19 April, 2010; v1 submitted 19 April, 2010; originally announced April 2010.
Comments: An old Technical Report, not published elsewhere. 14 pages. For further details see http://wwwmaths.anu.edu.au/~brent/pub/pub147.html
Report number: Technical Report TR-CS-94-06, Computer Sciences Laboratory, Australian National University, June 1994 MSC Class: 68W20 (Primary) 68Q25 (Secondary) ACM Class: F.2.1; F.2.2
-
arXiv:1004.3105 [pdf, ps, other]
Fast normal random number generators on vector processors
Abstract: We consider pseudo-random number generators suitable for vector processors. In particular, we describe vectorised implementations of the Box-Muller and Polar methods, and show that they give good performance on the Fujitsu VP2200. We also consider some other popular methods, e.g. the Ratio method of Kinderman and Monahan (1977) (as improved by Leva (1992)), and the method of Von Neumann and Forsyt…
Submitted 19 April, 2010; v1 submitted 19 April, 2010; originally announced April 2010.
Comments: An old Technical Report, not published elsewhere. 6 pages. For details see http://wwwmaths.anu.edu.au/~brent/pub/pub141.html
Report number: Technical Report TR-CS-93-04, Computer Sciences Laboratory, Australian National University, March 1993. MSC Class: 11K45 ACM Class: G.3
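As a point of reference for the Polar method named in this abstract, here is a plain scalar sketch. It is not the vectorised implementation the report describes; the function names and the use of Python's `random.Random` as the uniform source are assumptions for illustration.

```python
import math
import random


def polar_pair(rng: random.Random):
    """Return two independent standard normal variates via the Polar method."""
    while True:
        u = 2.0 * rng.random() - 1.0
        v = 2.0 * rng.random() - 1.0
        s = u * u + v * v
        # Accept only points strictly inside the unit disc, excluding the origin.
        if 0.0 < s < 1.0:
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return u * factor, v * factor


def polar_normals(count: int, seed: int = 0) -> list:
    """Generate `count` standard normal variates from a seeded uniform source."""
    rng = random.Random(seed)
    out = []
    while len(out) < count:
        out.extend(polar_pair(rng))
    return out[:count]
```

The rejection loop is the part that complicates vectorisation, since the number of accepted points per batch is not known in advance.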
-
arXiv:1004.2091 [pdf, ps, other]
An O(M(n) log n) algorithm for the Jacobi symbol
Abstract: The best known algorithm to compute the Jacobi symbol of two n-bit integers runs in time O(M(n) log n), using Schönhage's fast continued fraction algorithm combined with an identity due to Gauss. We give a different O(M(n) log n) algorithm based on the binary recursive gcd algorithm of Stehlé and Zimmermann. Our implementation - which to our knowledge is the first to run in time O(M(n) log n) - is…
Submitted 1 June, 2010; v1 submitted 12 April, 2010; originally announced April 2010.
Comments: Submitted to ANTS IX (Nancy, July 2010)
MSC Class: 11Y16
Journal ref: Proc. ANTS-IX (Nancy, 19-23 July 2010), Lecture Notes in Computer Science, Vol. 6197, Springer-Verlag, 2010, 83-95
-
arXiv:0710.4410 [pdf, ps, other]
A Multi-level Blocking Distinct Degree Factorization Algorithm
Abstract: We give a new algorithm for performing the distinct-degree factorization of a polynomial P(x) over GF(2), using a multi-level blocking strategy. The coarsest level of blocking replaces GCD computations by multiplications, as suggested by Pollard (1975), von zur Gathen and Shoup (1992), and others. The novelty of our approach is that a finer level of blocking replaces multiplications by squarings…
Submitted 24 October, 2007; originally announced October 2007.
Report number: INRIA Tech. Report RR-6331, Oct. 2007
Journal ref: Contemporary Mathematics 461 (2008) 47-58
-
An Efficient, Probabilistically Sound Algorithm for Segmentation and Word Discovery
Abstract: This paper presents a model-based, unsupervised algorithm for recovering word boundaries in a natural-language text from which they have been deleted. The algorithm is derived from a probability model of the source that generated the text. The fundamental structure of the model is specified abstractly so that the detailed component models of phonology, word-order, and word frequency can be repla…
Submitted 12 May, 1999; originally announced May 1999.
Comments: 65 double-spaced ms. pages including 3 figures
ACM Class: I.2.0; I.2.6; I.2.7
Journal ref: Brent, M. R. (1999). An efficient, probabilistically sound algorithm for segmentation and word discovery. Machine Learning 34, 71-105
-
Segmenting speech without a lexicon: The roles of phonotactics and speech source
Abstract: Infants face the difficult problem of segmenting continuous speech into words without the benefit of a fully developed lexicon. Several sources of information in speech might help infants solve this problem, including prosody, semantic correlations and phonotactics. Research to date has focused on determining to which of these sources infants might be sensitive, but little work has been done to…
Submitted 15 December, 1994; originally announced December 1994.
Comments: Uses wsuipa font package and latex-acl.sty