-
First numerical observation of the Berezinskii-Kosterlitz-Thouless transition in language models
Authors:
Yuma Toji,
Jun Takahashi,
Vwani Roychowdhury,
Hideyuki Miyahara
Abstract:
Several power-law critical properties involving different statistics in natural languages -- reminiscent of scaling properties of physical systems at or near phase transitions -- have been documented for decades.
The recent rise of large language models (LLMs) has added further evidence and excitement by providing intriguing similarities with notions in physics such as scaling laws and emergent abilities.
However, specific instances of classes of generative language models that exhibit phase transitions, as understood by the statistical physics community, are lacking.
In this work, inspired by the one-dimensional Potts model in statistical physics, we construct a simple probabilistic language model that falls under the class of context-sensitive grammars (CSG), and numerically demonstrate an unambiguous phase transition in the framework of a natural language model.
We explicitly show that a precisely defined order parameter -- that captures symbol frequency biases in the sentences generated by the language model -- changes from strictly 0 to a strictly nonzero value (in the infinite-length limit of sentences), implying a mathematical singularity arising when tuning the parameter of the stochastic language model we consider.
Furthermore, we identify the phase transition as a variant of the Berezinskii-Kosterlitz-Thouless (BKT) transition, which is known to exhibit critical properties not only at the transition point but also in the entire phase.
This finding leads to the possibility that critical properties in natural languages may require neither careful fine-tuning nor self-organized criticality, but may instead be generically explained by the underlying connection between language structures and the BKT phases.
Submitted 2 December, 2024;
originally announced December 2024.
-
Emergent invariance and scaling properties in the collective return dynamics of a stock market
Authors:
Hideyuki Miyahara,
Hai Qian,
Pavan Holur,
Vwani Roychowdhury
Abstract:
Several works have observed heavy-tailed behavior in the distributions of returns in different markets, which are observable indicators of underlying complex dynamics. Such prior works study return distributions that are marginalized across the individual stocks in the market, and do not track statistics about the joint distributions of returns conditioned on different stocks, which would be useful for optimizing inter-stock asset allocation strategies. As a step towards this goal, we study emergent phenomena in the distributions of returns as captured by their pairwise correlations. In particular, we consider the pairwise (between stocks $i,j$) partial correlations of returns with respect to the market mode, $c_{i,j}(τ)$ (thus correcting for the baseline return behavior of the market), over different time horizons ($τ$), and discover two novel emergent phenomena: (i) the standardized distributions of the $c_{i,j}(τ)$'s are observed to be invariant with respect to $τ$ over the range from $1000 \textrm{min}$ (2.5 days) to $30000 \textrm{min}$ (2.5 months); (ii) the scaling of the standard deviation of the $c_{i,j}(τ)$'s with $τ$ admits good fits to simple model classes such as a power law $τ^{-λ}$ or a stretched exponential function $e^{-τ^β}$ ($λ,β> 0$). Moreover, the parameters governing these fits provide a summary view of market health: for instance, in years marked by unprecedented financial crises -- for example $2008$ and $2020$ -- values of $λ$ (the scaling exponent) are substantially lower. Finally, we demonstrate that the observed emergent behavior cannot be adequately supported by existing generative frameworks such as single- and multi-factor models. We introduce a promising agent-based Vicsek model that closes this gap.
Submitted 9 January, 2024; v1 submitted 24 December, 2022;
originally announced December 2022.
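The power-law scaling mentioned in item (ii) of the abstract above can be illustrated with a small sketch. This is not the authors' fitting procedure: it assumes the standard deviation follows $a\,τ^{-λ}$ exactly and recovers $λ$ by ordinary least squares in log-log coordinates. The function name `fit_power_law` and the prefactor `a` are hypothetical, introduced only for illustration.

```python
import math

def fit_power_law(taus, sigmas):
    """Fit sigma = a * tau**(-lam) by least squares on log(sigma) vs log(tau).

    Returns (a, lam). Assumes all taus and sigmas are strictly positive.
    """
    xs = [math.log(t) for t in taus]
    ys = [math.log(s) for s in sigmas]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope and intercept of the best-fit line y = intercept + slope * x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # slope = -lam because sigma decays with tau.
    return math.exp(intercept), -slope
```

A log-log regression is the simplest choice here; a fit of the stretched exponential form would need a nonlinear optimizer and is not sketched.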
-
Quantum Advantage in Variational Bayes Inference
Authors:
Hideyuki Miyahara,
Vwani Roychowdhury
Abstract:
The variational Bayes (VB) inference algorithm is widely used to estimate both the parameters and the unobserved hidden variables in generative statistical models. The algorithm -- inspired by variational methods used in computational physics -- is iterative and can easily get stuck in local minima, even when classical techniques, such as deterministic annealing (DA), are used. We study a VB inference algorithm based on a non-traditional quantum annealing approach -- referred to as quantum annealing variational Bayes (QAVB) inference -- and show that there is indeed a quantum advantage to QAVB over its classical counterparts. In particular, we show that such better performance is rooted in key concepts from quantum mechanics: (i) the ground state of the Hamiltonian of a quantum system -- defined from the given VB problem -- corresponds to an optimal solution for the minimization problem of the variational free energy at very low temperatures; (ii) such a ground state can be achieved by a technique paralleling the quantum annealing process; and (iii) starting from this ground state, the optimal solution to the VB problem can be achieved by increasing the heat bath temperature to unity, thereby avoiding local minima introduced by the spontaneous symmetry breaking observed in classical-physics-based VB algorithms. We also show that the update equations of QAVB can potentially be implemented using $\lceil \log K \rceil$ qubits and $\mathcal{O} (K)$ operations per step. Thus, QAVB can match the time complexity of existing VB algorithms, while delivering higher performance.
Submitted 7 July, 2022;
originally announced July 2022.
-
Estimating the number of serial killers that were never caught
Authors:
M. V. Simkin,
V. P. Roychowdhury
Abstract:
Many serial killers commit tens of murders. At the same time, inter-murder intervals can be decades long. This suggests that some serial killers can die of an accident or a disease, never having been caught. We use the distribution of the killers by the number of murders, the distribution of the length of inter-murder intervals, and USA life tables to estimate the number of uncaught killers. The result is that in the 20th century there were about seven such killers. The most prolific of them likely committed over sixty murders.
Submitted 22 September, 2021;
originally announced September 2021.
-
Modeling Social Readers: Novel Tools for Addressing Reception from Online Book Reviews
Authors:
Pavan Holur,
Shadi Shahsavari,
Ehsan Ebrahimzadeh,
Timothy R. Tangherlini,
Vwani Roychowdhury
Abstract:
Readers' responses to literature have received scant attention in computational literary studies. The rise of social media offers an opportunity to capture a segment of these responses, while data-driven analysis of these responses can provide new critical insight into how people "read". Posts discussing an individual book on Goodreads, a social media platform that hosts user discussions of popular literature, are referred to as "reviews", and consist of plot summaries, opinions, quotes, or some mixture of these. Since these reviews are written by readers, computationally modeling them allows one to discover the overall non-professional discussion space about a work, including an aggregated summary of the work's plot, an implicit ranking of the importance of events, and the readers' impressions of main characters. We develop a pipeline of interlocking computational tools to extract a representation of this reader-generated shared narrative model. Using a corpus of reviews of five popular novels, we discover the readers' distillation of the main storylines in a novel, their understanding of the relative importance of characters, as well as the readers' varying impressions of these characters. In so doing, we make three important contributions to the study of infinite vocabulary networks: (i) an automatically derived narrative network that includes meta-actants; (ii) a new sequencing algorithm, REV2SEQ, that generates a consensus sequence of events based on partial trajectories aggregated from the reviews; and (iii) a new "impressions" algorithm, SENT2IMP, that provides finer, non-trivial and multi-modal insight into readers' opinions of characters.
Submitted 7 May, 2021; v1 submitted 3 May, 2021;
originally announced May 2021.
-
Ansatz-Independent Variational Quantum Classifier
Authors:
Hideyuki Miyahara,
Vwani Roychowdhury
Abstract:
The paradigm of variational quantum classifiers (VQCs) encodes classical information as quantum states, followed by quantum processing and then measurements to generate classical predictions. VQCs are promising candidates for efficient utilization of a near-term quantum device: classifiers involving $M$-dimensional datasets can be implemented with only $\lceil \log_2 M \rceil$ qubits by using an amplitude encoding. A general framework for designing and training VQCs, however, has not been proposed, and their power and their analytical relationships with classical classifiers are not well understood. An encouraging specific embodiment of VQCs, quantum circuit learning (QCL), utilizes an ansatz: it expresses the quantum evolution operator as a circuit with a predetermined topology and parametrized gates; training involves learning the gate parameters through optimization. In this letter, we first address the open questions about VQCs and then show that they, including QCL, fit inside the well-known kernel method. Based on this correspondence, we devise a design framework of efficient ansatz-independent VQCs, which we call the unitary kernel method (UKM): it directly optimizes the unitary evolution operator in a VQC. Thus, we show that the performance of QCL is bounded from above by the UKM. Next, we propose a variational circuit realization (VCR) for designing efficient quantum circuits for a given unitary operator. By combining the UKM with the VCR, we establish an efficient framework for constructing high-performing circuits. We finally benchmark the relatively superior performance of the UKM and the VCR via extensive numerical simulations on multiple datasets.
Submitted 2 February, 2021;
originally announced February 2021.
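The qubit count quoted in the abstract above follows from amplitude encoding, where an $M$-dimensional vector is stored in the amplitudes of a state on $n$ qubits with $2^n \ge M$. The sketch below is a classical illustration only, not the paper's implementation: the helper names `qubits_needed` and `amplitude_encode` are hypothetical, and padding with zeros is an assumed convention.

```python
import math

def qubits_needed(m):
    """Smallest n with 2**n >= m, i.e. ceil(log2(m)) for an integer m >= 1."""
    if m < 1:
        raise ValueError("dimension must be at least 1")
    # Integer arithmetic avoids floating-point rounding in log2.
    return (m - 1).bit_length()

def amplitude_encode(x):
    """Zero-pad x to length 2**n and rescale it to unit L2 norm.

    Returns (n, amplitudes); the squared amplitudes sum to 1.
    """
    n = qubits_needed(len(x))
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0:
        raise ValueError("cannot encode the zero vector")
    padded = list(x) + [0.0] * (2 ** n - len(x))
    return n, [v / norm for v in padded]
```

For example, a 3-dimensional input needs 2 qubits, and a 9-dimensional input needs 4.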
-
Brain-inspired automated visual object discovery and detection
Authors:
Lichao Chen,
Sudhir Singh,
Thomas Kailath,
Vwani Roychowdhury
Abstract:
Despite significant recent progress, machine vision systems lag considerably behind their biological counterparts in performance, scalability, and robustness. A distinctive hallmark of the brain is its ability to automatically discover and model objects, at multiscale resolutions, from repeated exposures to unlabeled contextual data, and then to robustly detect the learned objects under various nonideal circumstances, such as partial occlusion and different view angles. Replication of such capabilities in a machine would require three key ingredients: (i) access to large-scale perceptual data of the kind that humans experience, (ii) flexible representations of objects, and (iii) an efficient unsupervised learning algorithm. The Internet fortunately provides unprecedented access to vast amounts of visual data. This paper leverages the availability of such data to develop a scalable framework for unsupervised learning of object prototypes -- brain-inspired, flexible, scale- and shift-invariant representations of deformable objects (e.g., humans, motorcycles, cars, airplanes) composed of parts, their different configurations and views, and their spatial relationships. Computationally, the object prototypes are represented as geometric associative networks using probabilistic constructs such as Markov random fields. We apply our framework to various datasets and show that our approach is computationally scalable and can construct accurate and operational part-aware object models much more efficiently than in much of the recent computer vision literature. We also present efficient algorithms for detection and localization in new scenes of objects and their partial views.
Submitted 29 September, 2019;
originally announced October 2019.
-
Chess players' fame versus their merit
Authors:
M. V. Simkin,
V. P. Roychowdhury
Abstract:
We investigate a pool of international chess title holders born between 1901 and 1943. Using Elo ratings, we compute for every player the expected score in a game against a randomly selected player from the pool. We use this figure as the player's merit. We measure players' fame as the number of Google hits. The correlation between fame and merit is 0.38. At the same time, the correlation between the logarithm of fame and merit is 0.61. This suggests that fame grows exponentially with merit.
Submitted 30 April, 2015;
originally announced May 2015.
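The merit measure described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' code: it uses the standard Elo expected-score formula, and it assumes that "a randomly selected player from the pool" means a uniform draw over the other players, excluding the player being rated. The function names are hypothetical.

```python
def elo_expected_score(rating_a, rating_b):
    """Standard Elo expected score of player A in a game against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

def merit(ratings):
    """Mean expected score of each player against every other player in the pool.

    Assumes a pool of at least two players and a uniform choice of opponent.
    """
    ratings = list(ratings)
    merits = []
    for i, r in enumerate(ratings):
        others = ratings[:i] + ratings[i + 1:]
        merits.append(sum(elo_expected_score(r, o) for o in others) / len(others))
    return merits
```

With merit computed this way, the reported figures would correspond to the Pearson correlation of merit with fame and with the logarithm of fame, respectively.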
-
A mathematical theory of fame
Authors:
M. V. Simkin,
V. P. Roychowdhury
Abstract:
We study empirically how the fame of WWI fighter-pilot aces, measured in numbers of web pages mentioning them, is related to their achievement, measured in numbers of opponent aircraft destroyed. We find that on average fame grows exponentially with achievement; the correlation coefficient between achievement and the logarithm of fame is 0.72. The number of people with a particular level of achievement decreases exponentially with the level, leading to a power-law distribution of fame. We propose a stochastic model that can explain the exponential growth of fame with achievement. Next, we hypothesize that the same functional relation between achievement and fame that we found for the aces holds for other professions. This allows us to estimate achievement for professions where an unquestionable and universally accepted measure of achievement does not exist. We apply the method to Nobel Prize winners in Physics. For example, we obtain that Paul Dirac, who is a hundred times less famous than Einstein, contributed to physics only two times less. We compare our results with Landau's ranking.
Submitted 12 January, 2013;
originally announced January 2013.
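The step from two exponentials to a power law, stated in the abstract above, can be made explicit with a short change of variables. The symbols $A$ (achievement), $F$ (fame), and the rates $α$, $β$ and constant $C$ are notation assumed here for illustration; they do not appear in the abstract.

```latex
% Assumptions: fame grows exponentially with achievement, and the number of
% people with achievement A decays exponentially with A.
F(A) = C\, e^{\beta A}, \qquad p_A(A) \propto e^{-\alpha A}, \qquad \alpha, \beta > 0 .

% Invert the first relation and change variables:
A(F) = \frac{1}{\beta} \ln\frac{F}{C}, \qquad \frac{dA}{dF} = \frac{1}{\beta F} .

p_F(F) = p_A\bigl(A(F)\bigr)\, \frac{dA}{dF}
       \propto \left(\frac{F}{C}\right)^{-\alpha/\beta} \frac{1}{\beta F}
       \propto F^{-(1 + \alpha/\beta)} .
```

Under these assumptions the fame distribution is a power law with exponent $1 + α/β$.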
-
Theory of citing
Authors:
M. V. Simkin,
V. P. Roychowdhury
Abstract:
We present empirical data on misprints in citations to twelve high-profile papers. The great majority of misprints are identical to misprints in articles that earlier cited the same paper. The distribution of the numbers of misprint repetitions follows a power law. We develop a stochastic model of the citation process, which explains these findings and shows that about 70-90% of scientific citations are copied from the lists of references used in other papers. Citation copying can explain not only why some misprints become popular, but also why some papers become highly cited. We show that a model where a scientist picks a few random papers, cites them, and copies a fraction of their references accounts quantitatively for the empirically observed distribution of citations.
Submitted 10 September, 2011;
originally announced September 2011.
-
Estimating achievement from fame
Authors:
M. V. Simkin,
V. P. Roychowdhury
Abstract:
We report a method for estimating people's achievement based on their fame. Earlier we discovered (cond-mat/0310049) that the fame of fighter-pilot aces (measured as the number of Google hits) grows exponentially with their achievement (number of victories). We hypothesize that the same functional relation between achievement and fame holds for other professions. This allows us to estimate achievement for professions where an unquestionable and universally accepted measure of achievement does not exist. We apply the method to Nobel Prize winners in Physics. For example, we obtain that Paul Dirac, who is a hundred times less famous than Einstein, contributed to physics only two times less. We compare our results with Landau's ranking.
Submitted 15 March, 2011; v1 submitted 18 June, 2009;
originally announced June 2009.