-
Resolution of Simpson's paradox via the common cause principle
Authors:
A. Hovhannisyan,
A. E. Allahverdyan
Abstract:
Simpson's paradox is an obstacle to establishing a probabilistic association between two events $a_1$ and $a_2$, given the third (lurking) random variable $B$. We focus on scenarios when the random variables $A$ (which combines $a_1$, $a_2$, and their complements) and $B$ have a common cause $C$ that need not be observed. Alternatively, we can assume that $C$ screens out $A$ from $B$. For such cases, the correct association between $a_1$ and $a_2$ is to be defined via conditioning over $C$. This setup generalizes the original Simpson's paradox: now its two contradicting options refer to two particular and different causes $C$. We show that if $B$ and $C$ are binary and $A$ is quaternary (the minimal and the most widespread situation for Simpson's paradox), the conditioning over any binary common cause $C$ establishes the same direction of association between $a_1$ and $a_2$ as the conditioning over $B$ in the original formulation of the paradox. Thus, for the minimal common cause, one should choose the option of Simpson's paradox that assumes conditioning over $B$ and not its marginalization. The same conclusion is reached when Simpson's paradox is formulated via 3 continuous Gaussian variables: within the minimal formulation of the paradox (3 scalar continuous variables $A_1$, $A_2$, and $B$), one should choose the option with the conditioning over $B$.
Submitted 21 July, 2024; v1 submitted 1 March, 2024;
originally announced March 2024.
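As a rough illustration of the paradox described in the abstract above, the following Python sketch compares the direction of association obtained by conditioning over a binary $B$ with the one obtained by marginalizing over it. The counts are illustrative numbers chosen for this sketch (they are not taken from the paper), and the names `counts`, `conditional_direction`, and `marginal_direction` are hypothetical. Here "success" plays the role of $a_1$, "treated" the role of $a_2$, and the stratum label the role of $B$.

```python
from fractions import Fraction

# Illustrative (successes, trials) counts per stratum of a binary variable B.
# These numbers are an assumption made for this sketch, not data from the paper.
counts = {
    "b=0": {"treated": (81, 87), "control": (234, 270)},
    "b=1": {"treated": (192, 263), "control": (55, 80)},
}


def rate(successes, trials):
    """Exact success frequency as a fraction."""
    return Fraction(successes, trials)


def conditional_direction(data):
    """For each stratum b: is P(success | treated, b) > P(success | control, b)?"""
    return {
        b: rate(*groups["treated"]) > rate(*groups["control"])
        for b, groups in data.items()
    }


def marginal_direction(data):
    """Same comparison after pooling the strata, i.e. marginalizing over B."""
    pooled = {}
    for group in ("treated", "control"):
        successes = sum(g[group][0] for g in data.values())
        trials = sum(g[group][1] for g in data.values())
        pooled[group] = rate(successes, trials)
    return pooled["treated"] > pooled["control"]


if __name__ == "__main__":
    print("conditioning over B:", conditional_direction(counts))
    print("marginalizing over B:", marginal_direction(counts))
```

With these counts the treated group does better within each stratum but worse after pooling, which is the reversal the abstract refers to. The sketch only exhibits the two conflicting options; it does not implement the common-cause analysis of the paper.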
-
Ultimatum game: regret or fairness?
Authors:
Lida H. Aleksanyan,
Armen E. Allahverdyan,
Vardan G. Bardakhchyan
Abstract:
In the ultimatum game, the challenge is to explain why responders reject non-zero offers, thereby defying classical rationality. Fairness and related notions have been the main explanations so far. We explain this rejection behavior via the following principle: if the responder regrets less about losing the offer than the proposer regrets not offering the best option, the offer is rejected. This principle qualifies as a rational punishing behavior and it replaces the experimentally falsified classical rationality (the subgame perfect Nash equilibrium) that leads to accepting any non-zero offer. The principle is implemented via the transitive regret theory for probabilistic lotteries. The expected utility implementation is a limiting case of this. We show that several experimental results normally ascribed to fairness and intent-recognition can be given an alternative explanation via rational punishment; e.g. the comparison between "fair" and "superfair", the behavior under raising the stakes, etc. Hence we also propose experiments that can distinguish these two scenarios (fairness versus regret-based punishment). They assume different utilities for the proposer and responder. We focus on the mini-ultimatum version of the game and also show how it can emerge from a more general setup of the game.
Submitted 7 November, 2023;
originally announced November 2023.
-
Unsupervised extraction of local and global keywords from a single text
Authors:
Lida Aleksanyan,
Armen E. Allahverdyan
Abstract:
We propose an unsupervised, corpus-independent method to extract keywords from a single text. It is based on the spatial distribution of words and the response of this distribution to a random permutation of words. Compared to existing methods (such as YAKE), our method has three advantages. First, it is significantly more effective at extracting keywords from long texts. Second, it allows inference of two types of keywords: local and global. Third, it uncovers basic themes in texts. Additionally, our method is language-independent and applies to short texts. The results are obtained via human annotators with previous knowledge of texts from our database of classical literary works (the agreement between annotators is from moderate to substantial). Our results are supported via human-independent arguments based on the average length of extracted content words and on the average number of nouns in extracted words. We discuss relations of keywords with higher-order textual features and reveal a connection between keywords and chapter divisions.
Submitted 14 June, 2024; v1 submitted 26 July, 2023;
originally announced July 2023.
-
The most likely common cause
Authors:
A. Hovhannisyan,
A. E. Allahverdyan
Abstract:
The common cause principle for two random variables $A$ and $B$ is examined in the case of causal insufficiency, when their common cause $C$ is known to exist, but only the joint probability of $A$ and $B$ is observed. As a result, $C$ cannot be uniquely identified (the latent confounder problem). We show that the generalized maximum likelihood method can be applied to this situation and allows identification of $C$ that is consistent with the common cause principle. It closely relates to the maximum entropy principle. Investigation of the two binary symmetric variables reveals a non-analytic behavior of conditional probabilities reminiscent of a second-order phase transition. This occurs during the transition from correlation to anti-correlation in the observed probability distribution. The relation between the generalized likelihood approach and alternative methods, such as predictive likelihood and the minimum common cause entropy, is discussed. The consideration of the common cause for three observed variables (and one hidden cause) uncovers causal structures that defy representation through directed acyclic graphs with the Markov condition.
Submitted 24 July, 2024; v1 submitted 30 June, 2023;
originally announced June 2023.
-
Optimal alphabet for single text compression
Authors:
Armen E. Allahverdyan,
Andranik Khachatryan
Abstract:
A text written using symbols from a given alphabet can be compressed using the Huffman code, which minimizes the length of the encoded text. It is necessary, however, to employ a text-specific codebook, i.e. the symbol-codeword dictionary, to decode the original text. Thus, the compression performance should be evaluated by the full code length, i.e. the length of the encoded text plus the length of the codebook. We studied several alphabets for compressing texts: letters, n-grams of letters, syllables, words, and phrases. If only sufficiently short texts are retained, an alphabet of letters or two-grams of letters is optimal. For the majority of Project Gutenberg texts, the best alphabet (the one that minimizes the full code length) is given by syllables or words, depending on the representation of the codebook. Letter 3-grams and 4-grams, having on average comparable length to syllables/words, perform noticeably worse than syllables or words. Word 2-grams also are never the best alphabet, on account of having a very large codebook. We also show that the codebook representation is important: switching from a naive representation to a compact one significantly improves matters for alphabets with a large number of symbols, most notably the words. Thus, meaning-expressing elements of the language (syllables or words) provide the best compression alphabet.
Submitted 31 July, 2022; v1 submitted 13 January, 2022;
originally announced January 2022.
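The notion of full code length in the abstract above (encoded text plus codebook) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the codebook is charged with a naive cost of 8 bits per character of each symbol plus the bits of its codeword, and the names `huffman_code_lengths`, `full_code_length`, and `bits_per_char` are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code_lengths(freqs):
    """Return {symbol: codeword length in bits} for a Huffman code."""
    if len(freqs) == 1:
        # A single-symbol alphabet still needs one bit per occurrence.
        return {next(iter(freqs)): 1}
    # Heap items: (weight, unique tie-breaker, {symbol: depth in the tree}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every contained symbol one level deeper.
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def full_code_length(symbols, bits_per_char=8):
    """Return (encoded bits, codebook bits, total bits) for a symbol sequence.

    Assumption of this sketch: the codebook stores each symbol verbatim at
    bits_per_char bits per character, followed by its codeword.
    """
    freqs = Counter(symbols)
    lengths = huffman_code_lengths(freqs)
    encoded = sum(freqs[s] * lengths[s] for s in freqs)
    codebook = sum(bits_per_char * len(s) + lengths[s] for s in freqs)
    return encoded, codebook, encoded + codebook


if __name__ == "__main__":
    text = "the cat sat on the mat and the cat sat on the hat"
    print("letters:", full_code_length(list(text)))
    print("words:  ", full_code_length(text.split(" ")))
```

Comparing the two printed totals for a given text shows the trade-off the abstract describes: a larger alphabet shortens the encoded text but lengthens the codebook. A more compact codebook representation would change the second term only.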
-
Dissipative search of an unstructured database
Authors:
Armen E. Allahverdyan,
David Petrosyan
Abstract:
The search of an unstructured database amounts to finding one element having a certain property out of $N$ elements. The classical search with an oracle checking one element at a time requires on average $N/2$ steps. The Grover algorithm for the quantum search, and its unitary Hamiltonian evolution analogue, accomplish the search asymptotically optimally in $\mathcal{O} (\sqrt{N})$ time steps. We reformulate the search problem as a dissipative Markov process acting on an $N$-level system weakly coupled to a thermal bath. Assuming that the energy levels of the system represent the database elements, we show that, with a proper choice of the spectrum and physically admissible, long-range transition rates between the energy levels, the system relaxes to the ground state, corresponding to the sought element, in time $\mathcal{O} (\ln N)$.
Submitted 4 April, 2022; v1 submitted 4 June, 2021;
originally announced June 2021.
-
Maximum Entropy competes with Maximum Likelihood
Authors:
A. E. Allahverdyan,
N. H. Martirosyan
Abstract:
The maximum entropy (MAXENT) method has a large number of applications in theoretical and applied machine learning, since it provides a convenient non-parametric tool for estimating unknown probabilities. The method is a major contribution of statistical physics to probabilistic inference. However, a systematic approach towards its validity limits is currently missing. Here we study MAXENT in a Bayesian decision theory set-up, i.e. assuming that there exists a well-defined prior Dirichlet density for unknown probabilities, and that the average Kullback-Leibler (KL) distance can be employed for deciding on the quality and applicability of various estimators. These allow us to evaluate the relevance of various MAXENT constraints, check its general applicability, and compare MAXENT with estimators having various degrees of dependence on the prior, viz. the regularized maximum likelihood (ML) and the Bayesian estimators. We show that MAXENT applies in sparse data regimes, but needs specific types of prior information. In particular, MAXENT can outperform the optimally regularized ML provided that there are prior rank correlations between the estimated random quantity and its probabilities.
Submitted 17 December, 2020;
originally announced December 2020.
-
Two halves of a meaningful text are statistically different
Authors:
Weibing Deng,
R. Xie,
S. Deng,
Armen E. Allahverdyan
Abstract:
Which statistical features distinguish a meaningful text (possibly written in an unknown system) from a meaningless set of symbols? Here we answer this question by comparing features of the first half of a text to its second half. This comparison can uncover hidden effects, because the halves have the same values of many parameters (style, genre, etc.). We found that the first half has more different words and more rare words than the second half. Also, words in the first half are distributed less homogeneously over the text in the sense of the difference between the frequency and the inverse spatial period. These differences hold for the significant majority of several hundred relatively short texts we studied. The statistical significance is confirmed via the Wilcoxon test. Differences disappear after random permutation of words that destroys the linear structure of the text. The differences reveal a temporal asymmetry in meaningful texts, which is confirmed by showing that texts are much better compressible in their natural way (i.e. along the narrative) than in the word-inverted form. We conjecture that these results connect the semantic organization of a text (defined by the flow of its narrative) to its statistical features.
Submitted 9 April, 2020;
originally announced April 2020.
-
Observational nonidentifiability, generalized likelihood and free energy
Authors:
A. E. Allahverdyan
Abstract:
We study the parameter estimation problem in mixture models with observational nonidentifiability: the full model (also containing hidden variables) is identifiable, but the marginal (observed) model is not. Hence global maxima of the marginal likelihood are (infinitely) degenerate and predictions of the marginal likelihood are not unique. We show how to generalize the marginal likelihood by introducing an effective temperature, and making it similar to the free energy. This generalization resolves the observational nonidentifiability, since its maximization leads to unique results that are better than a random selection of one degenerate maximum of the marginal likelihood or the averaging over many such maxima. The generalized likelihood inherits many features from the usual likelihood, e.g. it satisfies the conditionality principle, and its local maximum can be searched for via a suitably modified expectation-maximization method. The maximization of the generalized likelihood relates to entropy optimization.
Submitted 18 February, 2020;
originally announced February 2020.
-
Leadership scenarios in prisoner's dilemma game
Authors:
S. G. Babajanyan,
A. V. Melkikh,
A. E. Allahverdyan
Abstract:
The prisoner's dilemma game is the best-known contribution of game theory to the social sciences. Here we describe new implications of this game for transactional and transformative leadership. While the autocratic (Stackelberg's) leadership is inefficient for this game, we discuss a Pareto-optimal scenario, where the leader L commits to react probabilistically to pure strategies of the follower F, which is free to make the first move. Offering F to resolve the dilemma, L is able to get a larger average pay-off. The exploitation can be stabilized via repeated interaction of L and F, and turns out to be more stable than the egalitarian regime, where the pay-offs of L and F are equal. The total (summary) pay-off of the exploiting regime is never larger than in the egalitarian case. We discuss applications of this solution to a soft method of fighting corruption and to modeling the Machiavellian leadership. Whenever the defection benefit is large, the optimal strategies of F are mixed, while the summary pay-off is maximal. One mechanism for sustaining this solution is that L recognizes intentions of F.
Submitted 20 October, 2019;
originally announced October 2019.
-
Bargaining with entropy and energy
Authors:
S. G. Babajanyan,
K. H. Cheong,
A. E. Allahverdyan
Abstract:
Statistical mechanics is based on the interplay between energy minimization and entropy maximization. Here we formalize this interplay via axioms of cooperative game theory (Nash bargaining) and apply it out of equilibrium. These axioms capture basic notions related to joint maximization of entropy and minus energy, formally represented by utilities of two different players. We predict thermalization of a non-equilibrium statistical system employing the axiom of affine covariance (related to the freedom of changing initial points and dimensions for entropy and energy) together with the contraction invariance of the entropy-energy diagram. Whenever the initial non-equilibrium state is active, this mechanism allows thermalization to negative temperatures. Demanding a symmetry between players fixes the final state to a specific positive-temperature (equilibrium) state. The approach solves an important open problem in the maximum entropy inference principle, viz. generalizes it to the case when the constraint is not known precisely.
Submitted 15 October, 2019;
originally announced October 2019.
-
Active image restoration
Authors:
Rongrong Xie,
Shengfeng Deng,
Weibing Deng,
Armen E. Allahverdyan
Abstract:
We study active restoration of noise-corrupted images generated via the Gibbs probability of an Ising ferromagnet in an external magnetic field. Ferromagnetism accounts for the prior expectation of data smoothness, i.e. a positive correlation between neighbouring pixels (Ising spins), while the magnetic field refers to the bias. The restoration is actively supervised by requesting the true values of certain pixels after a noisy observation. This additional information improves restoration of other pixels. The optimal strategy of active inference is not known for realistic (two-dimensional) images. We determine this strategy for the mean-field version of the model and show that it amounts to supervising the values of spins (pixels) that do not agree with the sign of the average magnetization. The strategy leads to a transparent analytical expression for the minimal Bayesian risk, and shows that there is a maximal number of pixels beyond which the supervision is useless. We show numerically that this strategy applies for two-dimensional images away from the critical regime. Within this regime the strategy is outperformed by its local (adaptive) version, which supervises pixels that do not agree with their Bayesian estimate. We show on transparent examples how active supervising can be essential in recovering noise-corrupted images and advocate for a wider usage of active methods in image restoration.
Submitted 22 September, 2018;
originally announced September 2018.
-
Relating Zipf's law to textual information
Authors:
Weibing Deng,
Armen E. Allahverdyan
Abstract:
Zipf's law is the main regularity of quantitative linguistics. Despite many works devoted to foundations of this law, it is still unclear whether it is only a statistical regularity, or it has deeper relations with information-carrying structures of the text. This question relates to that of distinguishing a meaningful text (written in an unknown system) from a meaningless set of symbols that mimics statistical features of a text. Here we contribute to resolving these questions by comparing features of the first half of a text (from the beginning to the middle) to its second half. This comparison can uncover hidden effects, because the halves have the same values of many parameters (style, genre, author's vocabulary, etc.). In all studied texts we saw that for the first half Zipf's law applies from smaller ranks than in the second half, i.e. the law applies better to the first half. Also, words that hold Zipf's law in the first half are distributed more homogeneously over the text. These features do allow one to distinguish a meaningful text from a random sequence of words. Our findings correlate with a number of textual characteristics that hold in most cases we studied: the first half is lexically richer, has longer and less repetitive words, more and shorter sentences, more punctuation signs and more paragraphs. These differences between the halves point to a higher hierarchic level of text organization that so far went unnoticed in text linguistics. They relate the validity of Zipf's law to textual information. A complete description of this effect requires new models, though one existing model can account for some of its aspects.
Submitted 22 September, 2018;
originally announced September 2018.
-
Adaptive Decision Making via Entropy Minimization
Authors:
Armen E. Allahverdyan,
Aram Galstyan,
Ali E. Abbas,
Zbigniew R. Struzik
Abstract:
An agent choosing between various actions tends to take the one with the lowest cost. But this choice is arguably too rigid (not adaptive) to be useful in complex situations, e.g., where exploration-exploitation trade-off is relevant in creative task solving or when stated preferences differ from revealed ones. Here we study an agent who is willing to sacrifice a fixed amount of expected utility for adaptation. How can/ought our agent choose an optimal (in a technical sense) mixed action? We explore consequences of making this choice via entropy minimization, which is argued to be a specific example of risk-aversion. This recovers the $ε$-greedy probabilities known in reinforcement learning. We show that the entropy minimization leads to rudimentary forms of intelligent behavior: (i) the agent assigns a non-negligible probability to costly events; but (ii) chooses with a sizable probability the action related to less cost (lesser of two evils) when confronted with two actions with comparable costs; (iii) the agent is subject to effects similar to cognitive dissonance and frustration. None of these features is shown by entropy maximization.
Submitted 1 December, 2018; v1 submitted 18 March, 2018;
originally announced March 2018.
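For readers unfamiliar with the $ε$-greedy probabilities mentioned in the abstract above, here is a minimal Python sketch of the standard reinforcement-learning definition: with probability $1-ε$ the lowest-cost action is taken, and with probability $ε$ an action is drawn uniformly from all actions. The function names and the example costs are hypothetical, and the sketch does not reproduce the entropy-minimization derivation of the paper; it only evaluates the entropy and expected cost of the resulting mixed action.

```python
import math


def epsilon_greedy(costs, eps):
    """Standard epsilon-greedy probabilities over actions with the given costs."""
    n = len(costs)
    best = min(range(n), key=costs.__getitem__)  # index of the lowest-cost action
    probs = [eps / n] * n        # uniform exploration share for every action
    probs[best] += 1.0 - eps     # remaining mass goes to the greedy action
    return probs


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def expected_cost(probs, costs):
    """Average cost of the mixed action."""
    return sum(p * c for p, c in zip(probs, costs))


if __name__ == "__main__":
    costs = [1.0, 2.0, 3.0]  # illustrative costs, an assumption of this sketch
    for eps in (0.0, 0.3, 1.0):
        p = epsilon_greedy(costs, eps)
        print(eps, p, entropy(p), expected_cost(p, costs))
```

Increasing `eps` raises the expected cost above that of the pure greedy choice, which corresponds loosely to the fixed amount of expected utility the agent in the abstract is willing to sacrifice.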
-
Memory-induced mechanism for self-sustaining activity in networks
Authors:
A. E. Allahverdyan,
G. Ver Steeg,
A. Galstyan
Abstract:
We study a mechanism of activity sustaining on networks inspired by a well-known model of neuronal dynamics. Our primary focus is the emergence of self-sustaining collective activity patterns, where no single node can stay active by itself, but the activity provided initially is sustained within the collective of interacting agents. In contrast to existing models of self-sustaining activity that are caused by (long) loops present in the network, here we focus on tree-like structures and examine activation mechanisms that are due to temporal memory of the nodes. This approach is motivated by applications in social media, where long network loops are rare or absent. Our results suggest that under a weak behavioral noise, the nodes robustly split into several clusters, with partial synchronization of nodes within each cluster. We also study the randomly-weighted version of the models where the nodes are allowed to change their connection strength (this can model attention redistribution), and show that it does facilitate the self-sustained activity.
Submitted 21 December, 2017;
originally announced December 2017.
-
Emergence of Leadership in Communication
Authors:
Armen E. Allahverdyan,
Aram Galstyan
Abstract:
We study a neuro-inspired model that mimics a discussion (or information dissemination) process in a network of agents. During their interaction, agents redistribute activity and network weights, resulting in emergence of leader(s). The model is able to reproduce the basic scenarios of leadership known in nature and society: laissez-faire (irregular activity, weak leadership, sizable inter-follower interaction, autonomous sub-leaders); participative or democratic (strong leadership, but with feedback from followers); and autocratic (no feedback, one-way influence). Several pertinent aspects of these scenarios are found as well, e.g., hidden leadership (a hidden clique of agents driving the official autocratic leader), and successive leadership (two leaders influence followers by turns). We study how these scenarios emerge from inter-agent dynamics and how they depend on behavior rules of agents, in particular on their inertia against state changes.
Submitted 25 October, 2017;
originally announced October 2017.
-
Non-random structures in universal compression and the Fermi paradox
Authors:
A. V. Gurzadyan,
A. E. Allahverdyan
Abstract:
We study the hypothesis of information panspermia, recently proposed among possible solutions of the Fermi paradox ("where are the aliens?"). It suggests that the expenses of alien signaling can be significantly reduced if their messages contain compressed information. To this end we consider universal compression and decoding mechanisms (e.g. the Lempel-Ziv-Welch algorithm) that can reveal non-random structures in compressed bit strings. The efficiency of the Kolmogorov stochasticity parameter for detection of non-randomness is illustrated, along with Zipf's law. The universality of these methods, i.e. independence from data details, can be essential in searching for intelligent messages.
Submitted 7 February, 2016;
originally announced March 2016.
-
Stochastic model for phonemes uncovers an author-dependency of their usage
Authors:
Weibing Deng,
Armen E. Allahverdyan
Abstract:
We study rank-frequency relations for phonemes, the minimal units that still relate to linguistic meaning. We show that these relations can be described by the Dirichlet distribution, a direct analogue of the ideal-gas model in statistical mechanics. This description allows us to demonstrate that the rank-frequency relations for phonemes of a text do depend on its author. The author-dependency effect is not caused by the author's vocabulary (common words used in different texts), and is confirmed by several alternative means. This suggests that it can be directly related to phonemes. These features contrast with rank-frequency relations for words, which are both author and text independent and are governed by Zipf's law.
Submitted 20 March, 2016; v1 submitted 5 October, 2015;
originally announced October 2015.
-
Opinion Dynamics with Confirmation Bias
Authors:
A. E. Allahverdyan,
Aram Galstyan
Abstract:
Background: Confirmation bias is the tendency to acquire or evaluate new information in a way that is consistent with one's preexisting beliefs. It is omnipresent in psychology, economics, and even scientific practices. Prior theoretical research of this phenomenon has mainly focused on its economic implications, possibly missing its potential connections with broader notions of cognitive science. Methodology/Principal Findings: We formulate a (non-Bayesian) model for revising subjective probabilistic opinion of a confirmationally-biased agent in the light of a persuasive opinion. The revision rule ensures that the agent does not react to persuasion that is either far from his current opinion or coincides with it. We demonstrate that the model accounts for the basic phenomenology of the social judgment theory, and allows one to study various phenomena such as cognitive dissonance and boomerang effect. The model also displays the order of presentation effect: when consecutively exposed to two opinions, the preference is given to the last opinion (recency) or the first opinion (primacy). The model relates recency to confirmation bias. Finally, we study the model in the case of repeated persuasion and analyze its convergence properties. Conclusions: The standard Bayesian approach to probabilistic opinion revision is inadequate for describing the observed phenomenology of persuasion process. The simple non-Bayesian model proposed here does agree with this phenomenology and is capable of reproducing a spectrum of effects observed in psychology: primacy-recency phenomenon, boomerang effect and cognitive dissonance. We point out several limitations of the model that should motivate its future development.
Submitted 16 November, 2014;
originally announced November 2014.
-
Active Inference for Binary Symmetric Hidden Markov Models
Authors:
Armen E. Allahverdyan,
Aram Galstyan
Abstract:
We consider the active maximum a posteriori (MAP) inference problem for Hidden Markov Models (HMM), where, given an initial MAP estimate of the hidden sequence, we choose to label certain states in the sequence to improve the estimation accuracy of the remaining states. We develop an analytical approach to this problem for the case of binary symmetric HMMs, and obtain a closed-form solution that relates the expected error reduction to model parameters under the specified active inference scheme. We then use this solution to determine the optimal active inference scheme in terms of error reduction, and examine the relation of those schemes to heuristic principles of uncertainty reduction and solution unicity.
Submitted 3 November, 2014;
originally announced November 2014.
-
Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs
Authors:
Armen E. Allahverdyan,
Aram Galstyan
Abstract:
We present an asymptotic analysis of Viterbi Training (VT) and contrast it with a more conventional Maximum Likelihood (ML) approach to parameter estimation in Hidden Markov Models. While the ML estimator works by (locally) maximizing the likelihood of the observed data, VT seeks to maximize the probability of the most likely hidden state sequence. We develop an analytical framework based on a generating function formalism and illustrate it on an exactly solvable model of HMM with one unambiguous symbol. For this particular model the ML objective function is continuously degenerate. The VT objective, in contrast, is shown to have only finite degeneracy. Furthermore, VT converges faster and results in sparser (simpler) models, thus realizing an automatic Occam's razor for HMM learning. In more general scenarios VT can perform worse than ML, but it is still capable of correctly recovering most of the parameters.
Submitted 16 December, 2013;
originally announced December 2013.
-
Phase Transitions in Community Detection: A Solvable Toy Model
Authors:
Greg Ver Steeg,
Cristopher Moore,
Aram Galstyan,
Armen E. Allahverdyan
Abstract:
Recently, it was shown that there is a phase transition in the community detection problem. This transition was first computed using the cavity method, and has been proved rigorously in the case of $q=2$ groups. However, analytic calculations using the cavity method are challenging since they require us to understand probability distributions of messages. We study analogous transitions in the so-called "zero-temperature inference" model, where this distribution is supported only on the most likely messages. Furthermore, whenever several messages are equally likely, we break the tie by choosing among them with equal probability. While the resulting analysis does not give the correct values of the thresholds, it does reproduce some of the qualitative features of the system. It predicts a first-order detectability transition whenever $q > 2$, while the finite-temperature cavity method shows that this is the case only when $q > 4$. It also has a regime analogous to the "hard but detectable" phase, where the community structure can be partially recovered, but only when the initial messages are sufficiently accurate. Finally, we study a semisupervised setting where we are given the correct labels for a fraction $ρ$ of the nodes. For $q > 2$, we find a regime where the accuracy jumps discontinuously at a critical value of $ρ$.
Submitted 2 December, 2013;
originally announced December 2013.
-
Rank-frequency relation for Chinese characters
Authors:
W. B. Deng,
A. E. Allahverdyan,
B. Li,
Q. A. Wang
Abstract:
We show that Zipf's law for Chinese characters holds perfectly for sufficiently short texts (a few thousand different characters). The scenario of its validity is similar to that of Zipf's law for words in short English texts. For long Chinese texts (or for mixtures of short Chinese texts), rank-frequency relations for Chinese characters display a two-layer, hierarchic structure that combines a Zipfian power-law regime for frequent characters (first layer) with an exponential-like regime for less frequent characters (second layer). For these two layers we provide different (though related) theoretical descriptions that include the range of low-frequency characters (hapax legomena). The comparative analysis of rank-frequency relations for Chinese characters versus English words illustrates the extent to which the characters play for Chinese writers the same role as the words for those writing within alphabetical systems.
Submitted 26 January, 2014; v1 submitted 6 September, 2013;
originally announced September 2013.
-
Explaining Zipf's Law via Mental Lexicon
Authors:
Armen E. Allahverdyan,
Weibing Deng,
Q. A. Wang
Abstract:
Zipf's law is the major regularity of statistical linguistics that served as a prototype for rank-frequency relations and scaling laws in natural sciences. Here we show that Zipf's law -- together with its applicability for a single text and its generalizations to high and low frequencies including hapax legomena -- can be derived from assuming that the words are drawn into the text with random probabilities. Their a priori density relates, via Bayesian statistics, to general features of the mental lexicon of the author who produced the text.
Submitted 18 February, 2013;
originally announced February 2013.
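The rank-frequency relation that the two Zipf's-law abstracts above refer to can be illustrated with a minimal Python sketch. It is not taken from either paper: the function names `rank_frequency` and `zipf_exponent` are hypothetical, and the plain least-squares fit in log-log coordinates is only a rough diagnostic, assumed here for illustration.

```python
from collections import Counter
import math


def rank_frequency(tokens):
    """Return (rank, relative frequency) pairs; rank 1 is the most frequent token."""
    counts = Counter(tokens)
    total = sum(counts.values())
    freqs = sorted((c / total for c in counts.values()), reverse=True)
    return list(enumerate(freqs, start=1))


def zipf_exponent(rank_freq):
    """Least-squares slope of log(frequency) versus log(rank).

    An ideal Zipfian text, with frequency proportional to 1/rank,
    gives a slope of -1. Requires at least two distinct ranks.
    """
    xs = [math.log(r) for r, _ in rank_freq]
    ys = [math.log(f) for _, f in rank_freq]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

For a real text, `tokens` would be the list of words (or characters, for Chinese); a fitted slope close to -1 over the frequent ranks is the signature usually associated with Zipf's law.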
-
Replicator Dynamics of Co-Evolving Networks
Authors:
Aram Galstyan,
Ardeshir Kianercy,
Armen Allahverdyan
Abstract:
We propose a simple model of network co-evolution in a game-dynamical system of interacting agents that play repeated games with their neighbors, and adapt their behaviors and network links based on the outcome of those games. The adaptation is achieved through a simple reinforcement learning scheme. We show that the collective evolution of such a system can be described by appropriately defined replicator dynamics equations. In particular, we suggest an appropriate factorization of the agents' strategies that results in a coupled system of equations characterizing the evolution of both strategies and network structure, and illustrate the framework on two simple examples.
Submitted 26 July, 2011;
originally announced July 2011.
-
Statistical Mechanics of Semi-Supervised Clustering in Sparse Graphs
Authors:
Greg Ver Steeg,
Aram Galstyan,
Armen E. Allahverdyan
Abstract:
We theoretically study semi-supervised clustering in sparse graphs in the presence of pairwise constraints on the cluster assignments of nodes. We focus on bi-cluster graphs, and study the impact of semi-supervision for varying constraint density and overlap between the clusters. Recent results for unsupervised clustering in sparse graphs indicate that there is a critical ratio of within-cluster and between-cluster connectivities below which clusters cannot be recovered with better than random accuracy. The goal of this paper is to examine the impact of pairwise constraints on the clustering accuracy. Our results suggest that the addition of constraints does not provide automatic improvement over the unsupervised case. When the density of the constraints is sufficiently small, their only impact is to shift the detection threshold while preserving the criticality. Conversely, if the density of (hard) constraints is above the percolation threshold, the criticality is suppressed and the detection threshold disappears.
Submitted 30 October, 2011; v1 submitted 21 January, 2011;
originally announced January 2011.
-
Community Detection with and without Prior Information
Authors:
Armen E. Allahverdyan,
Greg Ver Steeg,
Aram Galstyan
Abstract:
We study the problem of graph partitioning, or clustering, in sparse networks with prior information about the clusters. Specifically, we assume that for a fraction $ρ$ of the nodes their true cluster assignments are known in advance. This can be understood as a semi-supervised version of clustering, in contrast to unsupervised clustering where the only available information is the graph structure. In the unsupervised case, it is known that there is a threshold of the inter-cluster connectivity beyond which clusters cannot be detected. Here we study the impact of the prior information on the detection threshold, and show that even minute (but generic) values of $ρ>0$ shift the threshold downwards to its lowest possible value. For weighted graphs we show that a small amount of semi-supervision can be used for a non-trivial definition of communities.
Submitted 5 October, 2010; v1 submitted 27 July, 2009;
originally announced July 2009.
-
On Maximum a Posteriori Estimation of Hidden Markov Processes
Authors:
Armen Allahverdyan,
Aram Galstyan
Abstract:
We present a theoretical analysis of Maximum a Posteriori (MAP) sequence estimation for binary symmetric hidden Markov processes. We reduce the MAP estimation to the energy minimization of an appropriately defined Ising spin model, and focus on the performance of MAP as characterized by its accuracy and the number of solutions corresponding to a typical observed sequence. It is shown that for a finite range of sufficiently low noise levels, the solution is uniquely related to the observed sequence, while the accuracy degrades linearly with increasing noise strength. For intermediate noise values, the accuracy is nearly noise-independent, but now there are exponentially many solutions to the estimation problem, which is reflected in non-zero ground-state entropy for the Ising model. Finally, for even larger noise intensities, the number of solutions reduces again, but the accuracy is poor. It is shown that these regimes are different thermodynamic phases of the Ising model that are related to each other via first-order phase transitions.
Submitted 10 June, 2009;
originally announced June 2009.
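As a rough illustration of the MAP sequence estimation discussed in the abstract above, the following Python sketch runs the standard Viterbi recursion on a binary symmetric hidden Markov process. It is not the paper's Ising-model analysis. The parametrization is an assumption made for this example: `q` is the probability that the hidden state flips between consecutive steps, `eps` is the probability that an observation differs from the hidden state, both lie strictly between 0 and 1, the initial state is uniform, and `obs` is non-empty. The function name `map_sequence` is hypothetical.

```python
import math


def map_sequence(obs, q, eps):
    """Viterbi MAP estimate of the hidden 0/1 sequence behind `obs`.

    Assumes 0 < q < 1 and 0 < eps < 1; ties are broken in favour of state 0,
    so only one of possibly many MAP solutions is returned.
    """
    log_stay, log_flip = math.log(1 - q), math.log(q)
    log_ok, log_err = math.log(1 - eps), math.log(eps)

    def emit(x, y):
        # Log-probability of observing y when the hidden state is x.
        return log_ok if x == y else log_err

    # score[x]: best log-probability of any hidden path ending in state x.
    score = [math.log(0.5) + emit(x, obs[0]) for x in (0, 1)]
    back = []
    for y in obs[1:]:
        new_score, ptr = [], []
        for x in (0, 1):
            cands = [score[p] + (log_stay if p == x else log_flip) for p in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            ptr.append(best)
            new_score.append(cands[best] + emit(x, y))
        score = new_score
        back.append(ptr)

    # Backtrack from the best final state.
    x = 0 if score[0] >= score[1] else 1
    path = [x]
    for ptr in reversed(back):
        x = ptr[x]
        path.append(x)
    path.reverse()
    return path
```

With `obs = [0, 0, 0, 1, 0, 0, 0]`, calling `map_sequence(obs, 0.3, 0.01)` reproduces the observed sequence, whereas `map_sequence(obs, 0.1, 0.2)` smooths the isolated `1` away. Because ties are broken deterministically, the sketch does not count the degenerate solutions that the abstract describes for intermediate noise.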
-
Entropy of Hidden Markov Processes via Cycle Expansion
Authors:
Armen E. Allahverdyan
Abstract:
Hidden Markov Processes (HMP) are one of the basic tools of modern probabilistic modeling. The characterization of their entropy, however, remains an open problem. Here the entropy of HMP is calculated via the cycle expansion of the zeta-function, a method adopted from the theory of dynamical systems. For a class of HMP this method produces exact results both for the entropy and the moment-generating function. The latter allows one to estimate, via the Chernoff bound, the probabilities of large deviations for the HMP. More generally, the method offers a representation of the moment-generating function and of the entropy via convergent series.
Submitted 23 October, 2008;
originally announced October 2008.