Showing 1–26 of 26 results for author: Altmann, E G

  1. arXiv:2412.03757  [pdf, other]

    cs.SI physics.soc-ph

    Synthetic graphs for link prediction benchmarking

    Authors: Alexey Vlaskin, Eduardo G. Altmann

    Abstract: Predicting missing links in complex networks requires algorithms that are able to explore statistical regularities in the existing data. Here we investigate the interplay between algorithm efficiency and network structures through the introduction of suitably-designed synthetic graphs. We propose a family of random graphs that incorporates both micro-scale motifs and meso-scale communities, two ub…

    Submitted 4 December, 2024; originally announced December 2024.

    Comments: 15 pages, 8 figures; code available at: https://github.com/avlaskin/synthetic-graphs-for-lp

    Journal ref: J. Phys. Complex. 6 015004 (2025)

  2. arXiv:2405.14168  [pdf, other]

    cs.SI physics.soc-ph

    A generative model for community types in directed networks

    Authors: Cathy Xuanchi Liu, Tristram J. Alexander, Eduardo G. Altmann

    Abstract: Large complex networks are often organized into groups or communities. In this paper, we introduce and investigate a generative model of network evolution that reproduces all four pairwise community types that exist in directed networks: assortative, core-periphery, disassortative, and the newly introduced source-basin type. We fix the number of nodes and the community membership of each node, all…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 13 pages, 6 figures

    Journal ref: Journal of Complex Networks 13(1), cnae048 (2025)

  3. arXiv:2310.07372  [pdf, other]

    math.CO cond-mat.stat-mech cs.CG math.GT physics.comp-ph

    Sampling triangulations of manifolds using Monte Carlo methods

    Authors: Eduardo G. Altmann, Jonathan Spreer

    Abstract: We propose a Monte Carlo method to efficiently find, count, and sample abstract triangulations of a given manifold M. The method is based on a biased random walk through all possible triangulations of M (in the Pachner graph), constructed by combining (bi-stellar) moves with suitable chosen accept/reject probabilities (Metropolis-Hastings). Asymptotically, the method guarantees that samples of tri…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: 29 pages, 6 figures

    MSC Class: 57Q15; 60J10; 57-08

  4. arXiv:2305.02457  [pdf, other]

    cs.CL cond-mat.stat-mech physics.soc-ph

    Quantifying the Dissimilarity of Texts

    Authors: Benjamin Shade, Eduardo G. Altmann

    Abstract: Quantifying the dissimilarity of two texts is an important aspect of a number of natural language processing tasks, including semantic information retrieval, topic classification, and document clustering. In this paper, we compared the properties and performance of different dissimilarity measures $D$ using three different representations of texts -- vocabularies, word frequency distributions, and…

    Submitted 3 May, 2023; originally announced May 2023.

    Comments: 16 pages, 4 figures, part of the Special Issue Novel Methods and Applications in Natural Language Processing

    Journal ref: Information 2023, 14, 271

  5. arXiv:2106.15821  [pdf, other]

    cs.SI physics.soc-ph stat.ML

    Multilayer Networks for Text Analysis with Multiple Data Types

    Authors: Charles C. Hyland, Yuanming Tao, Lamiae Azizi, Martin Gerlach, Tiago P. Peixoto, Eduardo G. Altmann

    Abstract: We are interested in the widespread problem of clustering documents and finding topics in large collections of written documents in the presence of metadata and hyperlinks. To tackle the challenge of accounting for these different types of datasets, we propose a novel framework based on Multilayer Networks and Stochastic Block Models. The main innovation of our approach over other techniques is th…

    Submitted 30 June, 2021; originally announced June 2021.

    Comments: 17 pages, 6 figures

    Journal ref: EPJ Data Science volume 10, Article number: 33 (2021)

  6. arXiv:2004.12707  [pdf, other]

    physics.soc-ph cs.SI

    Scaling laws and dynamics of hashtags on Twitter

    Authors: Hongjia H. Chen, Tristram J. Alexander, Diego F. M. Oliveira, Eduardo G. Altmann

    Abstract: In this paper we quantify the statistical properties and dynamics of the frequency of hashtag use on Twitter. Hashtags are special words used in social media to attract attention and to organize content. Looking at the collection of all hashtags used in a period of time, we identify the scaling laws underpinning the hashtag frequency distribution (Zipf's law), the number of unique hashtags as a fu…

    Submitted 27 April, 2020; originally announced April 2020.

    Comments: 8 pages and 4 figures. Submitted to the journal "Chaos", special edition on "Dynamics of Social Systems". Data available at https://zenodo.org/record/3673744#.Xqa5t_GhSv4

    Journal ref: Chaos 30, 063112 (2020); Codes available at: https://github.com/edugalt/TwitterHashtags

  7. arXiv:1907.06361  [pdf, other]

    physics.soc-ph cs.SI

    Micro, Meso, Macro: the effect of triangles on communities in networks

    Authors: Sophie Wharrie, Lamiae Azizi, Eduardo G. Altmann

    Abstract: Meso-scale structures (communities) are used to understand the macro-scale properties of complex networks, such as their functionality and formation mechanisms. Micro-scale structures are known to exist in most complex networks (e.g., large number of triangles or motifs), but they are absent in the simple random-graph models considered (e.g., as null models) in community-detection algorithms. In t…

    Submitted 15 July, 2019; originally announced July 2019.

    Comments: 9 pages, 4 figures. A repository with our codes is available at https://github.com/sophiewharrie/micro-meso-macro-code

    Journal ref: Phys. Rev. E 100, 022315 (2019)

  8. arXiv:1903.06588  [pdf, other]

    physics.soc-ph cond-mat.dis-nn cs.SI

    Unraveling the Origin of Social Bursts in Collective Attention

    Authors: Manlio De Domenico, Eduardo G. Altmann

    Abstract: In the era of social media, every day billions of individuals produce content in socio-technical systems resulting in a deluge of information. However, human attention is a limited resource and it is increasingly challenging to consume the most suitable content for one's interests. In fact, the complex interplay between individual and social activities in social systems overwhelmed by information…

    Submitted 15 March, 2019; originally announced March 2019.

    Comments: 14 pages, 10 figures

    Journal ref: Sci Rep 10, 4629 (2020)

  9. arXiv:1708.01677  [pdf, other]

    stat.ML cs.CL physics.data-an physics.soc-ph

    A network approach to topic models

    Authors: Martin Gerlach, Tiago P. Peixoto, Eduardo G. Altmann

    Abstract: One of the main computational and scientific challenges in the modern age is to extract useful information from unstructured texts. Topic models are one popular machine-learning approach which infers the latent topical structure of a collection of documents. Despite their success --- in particular of its most widely used variant called Latent Dirichlet Allocation (LDA) --- and numerous application…

    Submitted 19 July, 2018; v1 submitted 4 August, 2017; originally announced August 2017.

    Comments: 22 pages, 10 figures, code available at https://topsbm.github.io/

    Journal ref: Science Advances 4, eaaq1360 (2018)

  10. arXiv:1706.08671  [pdf, other]

    cs.DL physics.soc-ph

    Using text analysis to quantify the similarity and evolution of scientific disciplines

    Authors: Laercio Dias, Martin Gerlach, Joachim Scharloth, Eduardo G. Altmann

    Abstract: We use an information-theoretic measure of linguistic similarity to investigate the organization and evolution of scientific fields. An analysis of almost 20M papers from the past three decades reveals that the linguistic similarity is related but different from experts and citation-based classifications, leading to an improved view on the organization of science. A temporal analysis of the simila…

    Submitted 27 June, 2017; originally announced June 2017.

    Comments: 9 pages, 4 figures

    Journal ref: R. Soc. open sci. 5: 171545 (2018)

  11. arXiv:1611.03596  [pdf, other]

    physics.soc-ph cs.CL

    Generalized Entropies and the Similarity of Texts

    Authors: Eduardo G. Altmann, Laercio Dias, Martin Gerlach

    Abstract: We show how generalized Gibbs-Shannon entropies can provide new insights on the statistical properties of texts. The universal distribution of word frequencies (Zipf's law) implies that the generalized entropies, computed at the word level, are dominated by words in a specific range of frequencies. Here we show that this is the case not only for the generalized entropies but also for the generaliz…

    Submitted 11 November, 2016; originally announced November 2016.

    Comments: 13 pages, 6 figures; Results presented at the StatPhys-2016 meeting in Lyon

    Journal ref: J. Stat. Mech. 014002 (2017)

  12. arXiv:1605.07465  [pdf, ps, other]

    physics.soc-ph cs.DL

    Impact of lexical and sentiment factors on the popularity of scientific papers

    Authors: Julian Sienkiewicz, Eduardo G. Altmann

    Abstract: We investigate how textual properties of scientific papers relate to the number of citations they receive. Our main finding is that correlations are non-linear and affect differently most-cited and typical papers. For instance, we find that in most journals short titles correlate positively with citations only for the most cited papers, for typical papers the correlation is in most cases negative.…

    Submitted 24 May, 2016; originally announced May 2016.

    Comments: 9 pages, 3 figures, 3 tables

    Journal ref: R. Soc. open sci. 3: 160140 (2016)

  13. arXiv:1510.00277  [pdf, other]

    physics.soc-ph cs.CL physics.data-an

    Similarity of symbol frequency distributions with heavy tails

    Authors: Martin Gerlach, Francesc Font-Clos, Eduardo G. Altmann

    Abstract: Quantifying the similarity between symbolic sequences is a traditional problem in Information Theory which requires comparing the frequencies of symbols in different sequences. In numerous modern applications, ranging from DNA over music to texts, the distribution of symbol frequencies is characterized by heavy-tailed distributions (e.g., Zipf's law). The large number of low-frequency symbols in t…

    Submitted 15 April, 2016; v1 submitted 1 October, 2015; originally announced October 2015.

    Comments: 13 pages, 7 figures

    Journal ref: Phys. Rev. X 6, 021009 (2016)

  14. arXiv:1507.08696  [pdf, other]

    physics.soc-ph cond-mat.stat-mech cs.SI

    Sampling motif-constrained ensembles of networks

    Authors: Rico Fischer, Jorge C. Leitao, Tiago P. Peixoto, Eduardo G. Altmann

    Abstract: The statistical significance of network properties is conditioned on null models which satisfy specified properties but that are otherwise random. Exponential random graph models are a principled theoretical framework to generate such constrained ensembles, but which often fail in practice, either due to model inconsistency, or due to the impossibility to sample networks from them. These problem…

    Submitted 17 November, 2015; v1 submitted 30 July, 2015; originally announced July 2015.

    Comments: Updated version, as published in the journal. 7 pages, 5 figures, one Supplemental Material

    Journal ref: Phys. Rev. Lett. 115, 188701 (2015)

  15. arXiv:1507.01716  [pdf, other]

    physics.soc-ph cs.SI physics.data-an

    Temporal-varying failures of nodes in networks

    Authors: Georgie Knight, Giampaolo Cristadoro, Eduardo G. Altmann

    Abstract: We consider networks in which random walkers are removed because of the failure of specific nodes. We interpret the rate of loss as a measure of the importance of nodes, a notion we denote as failure-centrality. We show that the degree of the node is not sufficient to determine this measure and that, in a first approximation, the shortest loops through the node have to be taken into account. We pr…

    Submitted 7 July, 2015; originally announced July 2015.

    Comments: 7 pages, 3 figures

    Journal ref: Phys. Rev. E 92, 022810 (2015)

  16. arXiv:1502.03296  [pdf, other]

    physics.soc-ph cs.LG physics.data-an

    Statistical laws in linguistics

    Authors: Eduardo G. Altmann, Martin Gerlach

    Abstract: Zipf's law is just one out of many universal laws proposed to describe statistical regularities in language. Here we review and critically discuss how these laws can be statistically interpreted, fitted, and tested (falsified). The modern availability of large databases of written text allows for tests with an unprecedented statistical accuracy and also a characterization of the fluctuations around…

    Submitted 11 February, 2015; originally announced February 2015.

    Comments: Proceedings of the Flow Machines Workshop: Creativity and Universality in Language, Paris, June 18 to 20, 2014

  17. arXiv:1406.4498  [pdf, other]

    physics.soc-ph cs.CL physics.data-an

    Extracting information from S-curves of language change

    Authors: Fakhteh Ghanbarnejad, Martin Gerlach, Jose M. Miotto, Eduardo G. Altmann

    Abstract: It is well accepted that the adoption of innovations is described by S-curves (slow start, accelerating period, and slow end). In this paper, we analyze how much information on the dynamics of innovation spreading can be obtained from a quantitative description of S-curves. We focus on the adoption of linguistic innovations for which detailed databases of written texts from the last 200 years allow f…

    Submitted 30 October, 2014; v1 submitted 17 June, 2014; originally announced June 2014.

    Comments: 9 pages, 5 figures, Supplementary Material is available at http://dx.doi.org/10.6084/m9.figshare.1221782

    Journal ref: J. R. Soc. Interface 6 December 2014 vol. 11 no. 101 20141044

  18. arXiv:1406.4441  [pdf, other]

    physics.soc-ph cs.CL physics.data-an

    Scaling laws and fluctuations in the statistics of word frequencies

    Authors: Martin Gerlach, Eduardo G. Altmann

    Abstract: In this paper we combine statistical analysis of large text databases and simple stochastic models to explain the appearance of scaling laws in the statistics of word frequencies. Besides the sublinear scaling of the vocabulary size with database size (Heaps' law), here we report a new scaling of the fluctuations around this average (fluctuation scaling analysis). We explain both scaling laws by m…

    Submitted 4 November, 2014; v1 submitted 17 June, 2014; originally announced June 2014.

    Comments: 19 pages, 4 figures

    Journal ref: New Journal of Physics 16 (2014), 113010

  19. arXiv:1403.3616  [pdf, other]

    physics.soc-ph cs.SI physics.data-an

    Predictability of extreme events in social media

    Authors: José M. Miotto, Eduardo G. Altmann

    Abstract: It is part of our daily social-media experience that seemingly ordinary items (videos, news, publications, etc.) unexpectedly gain an enormous amount of attention. Here we investigate how unexpected these events are. We propose a method that, given some information on the items, quantifies the predictability of events, i.e., the potential of identifying in advance the most successful items defined…

    Submitted 8 December, 2014; v1 submitted 14 March, 2014; originally announced March 2014.

    Comments: 13 pages, 3 figures

    Journal ref: Miotto JM, Altmann EG (2014) Predictability of Extreme Events in Social Media. PLoS ONE 9(11): e111506

  20. arXiv:1303.0347  [pdf, other]

    physics.soc-ph cs.CL physics.data-an

    Probing the statistical properties of unknown texts: application to the Voynich Manuscript

    Authors: Diego R. Amancio, Eduardo G. Altmann, Diego Rybski, Osvaldo N. Oliveira Jr., Luciano da F. Costa

    Abstract: While the use of statistical physics methods to analyze large corpora has been useful to unveil many patterns in texts, no comprehensive investigation has been performed investigating the properties of statistical measurements across different languages and texts. In this study we propose a framework that aims at determining if a text is compatible with a natural language and which languages are c…

    Submitted 1 March, 2013; originally announced March 2013.

    Journal ref: PLoS ONE 8(7): e67310 (2013)

  21. arXiv:1302.3892  [pdf, ps, other]

    physics.soc-ph cond-mat.dis-nn cs.CL q-bio.PE

    Identifying trends in word frequency dynamics

    Authors: Eduardo G. Altmann, Zakary L. Whichard, Adilson E. Motter

    Abstract: The word-stock of a language is a complex dynamical system in which words can be created, evolve, and become extinct. Even more dynamic are the short-term fluctuations in word usage by individuals in a population. Building on the recent demonstration that word niche is a strong determinant of future rise or fall in word frequency, here we introduce a model that allows us to distinguish persistent…

    Submitted 15 February, 2013; originally announced February 2013.

    Journal ref: J. Stat. Phys. 151, p. 277 (2013)

  22. arXiv:1212.1362  [pdf, other]

    physics.soc-ph cs.CL physics.data-an

    Stochastic model for the vocabulary growth in natural languages

    Authors: Martin Gerlach, Eduardo G. Altmann

    Abstract: We propose a stochastic model for the number of different words in a given database which incorporates the dependence on the database size and historical changes. The main feature of our model is the existence of two different classes of words: (i) a finite number of core-words which have higher frequency and do not affect the probability of a new word to be used; and (ii) the remaining virtually…

    Submitted 4 April, 2013; v1 submitted 6 December, 2012; originally announced December 2012.

    Comments: corrected typos and errors in reference list; 10 pages text, 15 pages supplemental material; to appear in Physical Review X

    Journal ref: Phys. Rev. X 3, 021006 (2013)

  23. arXiv:1207.0658  [pdf, other]

    physics.data-an cs.CL physics.soc-ph

    On the origin of long-range correlations in texts

    Authors: Eduardo G. Altmann, Giampaolo Cristadoro, Mirko Degli Esposti

    Abstract: The complexity of human interactions with social and natural phenomena is mirrored in the way we describe our experiences through natural language. In order to retain and convey such high-dimensional information, the statistical properties of our linguistic output have to be highly correlated in time. An example are the robust observations, still largely not understood, of correlations on arbitra…

    Submitted 3 July, 2012; originally announced July 2012.

    Comments: Full paper (8 pages) and Supporting Information (19 pages)

    Journal ref: Proc. Natl. Acad. Sci. USA 109, 11582 (2012)

  24. arXiv:1112.6045  [pdf, other]

    physics.soc-ph cs.CL cs.SI physics.data-an

    Comparing intermittency and network measurements of words and their dependency on authorship

    Authors: Diego R. Amancio, Eduardo G. Altmann, Osvaldo N. Oliveira Jr., Luciano da F. Costa

    Abstract: Many features from texts and languages can now be inferred from statistical analyses using concepts from complex networks and dynamical systems. In this paper we quantify how topological properties of word co-occurrence networks and intermittency (or burstiness) in word distribution depend on the style of authors. Our database contains 40 books from 8 authors who lived in the 19th and 20th centuri…

    Submitted 27 December, 2011; originally announced December 2011.

    Journal ref: New J. Phys. (2011) 13 123024

  25. arXiv:1009.3321  [pdf, other]

    cs.CL cond-mat.dis-nn nlin.AO physics.soc-ph q-bio.PE

    Niche as a determinant of word fate in online groups

    Authors: Eduardo G. Altmann, Janet B. Pierrehumbert, Adilson E. Motter

    Abstract: Patterns of word use both reflect and influence a myriad of human activities and interactions. Like other entities that are reproduced and evolve, words rise or decline depending upon a complex interplay between their intrinsic properties and the environments in which they function. Using Internet discussion communities as model systems, we define the concept of a word niche as the relationship…

    Submitted 2 June, 2011; v1 submitted 16 September, 2010; originally announced September 2010.

    Comments: Supporting Information is available here: http://www.plosone.org/article/fetchSingleRepresentation.action?uri=info:doi/10.1371/journal.pone.0019009.s001

    Journal ref: PLoS ONE 6(5), e19009 (2011)

  26. arXiv:0901.2349  [pdf, other]

    cs.CL cond-mat.dis-nn physics.data-an physics.soc-ph

    Beyond word frequency: Bursts, lulls, and scaling in the temporal distributions of words

    Authors: Eduardo G. Altmann, Janet B. Pierrehumbert, Adilson E. Motter

    Abstract: Background: Zipf's discovery that word frequency distributions obey a power law established parallels between biological and physical processes, and language, laying the groundwork for a complex systems perspective on human communication. More recent research has also identified scaling regularities in the dynamics underlying the successive occurrences of events, suggesting the possibility of si…

    Submitted 11 November, 2009; v1 submitted 15 January, 2009; originally announced January 2009.

    Journal ref: PLoS ONE 4 (11): e7678 (2009)