Structural time series grammar over variable blocks
Authors:
David Rushing Dewhurst
Abstract:
A structural time series model additively decomposes into generative, semantically meaningful components, each of which depends on a vector of parameters. We demonstrate that considering each generative component together with its vector of parameters as a single latent structural time series node can simplify reasoning about collections of structural time series components. We then introduce a formal grammar over structural time series nodes and parameter vectors. Valid sentences in the grammar can be interpreted as generative structural time series models. An extension of the grammar can also express structural time series models that include changepoints, though these models are necessarily not generative. We demonstrate a preliminary implementation of the language generated by this grammar. We close with a discussion of possible future work.
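To make the idea of a "node" concrete: the abstract bundles each generative component with its own parameter vector and composes the resulting nodes additively. The Python sketch that follows illustrates that idea under our own assumptions. It is not the paper's grammar or implementation; the classes `STSNode` and `Sum`, the component functions `linear_trend`, `seasonal`, and `random_walk`, and the dict-based parameter vectors are all hypothetical.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical signature for a generative component:
# (number of timesteps, parameter vector, RNG) -> one sampled path.
Component = Callable[[int, Dict[str, float], random.Random], List[float]]


class Expr:
    """Base class for expressions in a toy additive STS 'grammar'."""

    def __add__(self, other: "Expr") -> "Sum":
        # Flatten nested sums so that (a + b) + c holds three nodes.
        return Sum(_nodes(self) + _nodes(other))


@dataclass
class STSNode(Expr):
    """A latent STS node: a component bundled with its own parameters."""
    name: str
    component: Component
    params: Dict[str, float]

    def sample(self, t: int, rng: random.Random) -> List[float]:
        return self.component(t, self.params, rng)


@dataclass
class Sum(Expr):
    """Additive composition: a sample is the pointwise sum of node samples."""
    nodes: List[STSNode]

    def sample(self, t: int, rng: random.Random) -> List[float]:
        paths = [node.sample(t, rng) for node in self.nodes]
        return [sum(values) for values in zip(*paths)]


def _nodes(expr: Expr) -> List[STSNode]:
    return list(expr.nodes) if isinstance(expr, Sum) else [expr]


def linear_trend(t, p, rng):
    # Deterministic trend: intercept + slope * time.
    return [p["intercept"] + p["slope"] * i for i in range(t)]


def seasonal(t, p, rng):
    # Deterministic sinusoidal seasonality with a given period.
    return [p["amplitude"] * math.sin(2 * math.pi * i / p["period"]) for i in range(t)]


def random_walk(t, p, rng):
    # Stochastic local level: cumulative sum of Gaussian increments.
    level, path = 0.0, []
    for _ in range(t):
        level += rng.gauss(0.0, p["scale"])
        path.append(level)
    return path


if __name__ == "__main__":
    model = (
        STSNode("trend", linear_trend, {"intercept": 1.0, "slope": 0.5})
        + STSNode("season", seasonal, {"amplitude": 2.0, "period": 12.0})
        + STSNode("level", random_walk, {"scale": 0.1})
    )
    print([node.name for node in model.nodes])
    print(model.sample(24, random.Random(0))[:3])
```

Because each node carries its own parameters, composing models in this sketch reduces to concatenating node lists. Changepoints, which the abstract says require a non-generative extension of the grammar, are deliberately not modeled here.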
Submitted 15 September, 2020;
originally announced September 2020.
The sociospatial factors of death: Analyzing effects of geospatially-distributed variables in a Bayesian mortality model for Hong Kong
Authors:
Thayer Alshaabi,
David Rushing Dewhurst,
James P. Bagrow,
Peter Sheridan Dodds,
Christopher M. Danforth
Abstract:
Human mortality is in part a function of multiple socioeconomic factors that differ both spatially and temporally. Adjusting for other covariates, the human lifespan is positively associated with household wealth. However, the extent to which mortality in a geographical region is a function of socioeconomic factors in both that region and its neighbors is unclear. There is also little information on the temporal components of this relationship. Using the districts of Hong Kong over multiple census years as a case study, we demonstrate that there are differences in how wealth indicator variables are associated with longevity in (a) areas that are affluent but neighbored by socially deprived districts versus (b) wealthy areas surrounded by similarly wealthy districts. We also show that the inclusion of spatially distributed variables reduces uncertainty in mortality rate predictions in each census year when compared with a baseline model. Our results suggest that geographic mortality models should incorporate nonlocal information (e.g., spatial neighbors) to lower the variance of their mortality estimates, and motivate a more in-depth analysis of sociospatial spillover effects on mortality rates.
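One common way to give a regional model "nonlocal information" is a spatial-lag covariate: the average of a variable over each region's neighbors. The abstract does not specify the paper's covariates, so the Python sketch that follows is a generic illustration under that assumption, not the authors' model. The helpers `neighbor_mean` and `covariate_rows`, the region labels, the adjacency lists, and the wealth values are all hypothetical toy placeholders, not Hong Kong data.

```python
import math
from typing import Dict, List, Tuple


def neighbor_mean(values: Dict[str, float],
                  adjacency: Dict[str, List[str]]) -> Dict[str, float]:
    """Spatial lag: mean of a covariate over each region's neighbors.

    Regions with no neighbors get NaN, since the lag is undefined for them.
    """
    lagged = {}
    for region, neighbors in adjacency.items():
        if neighbors:
            lagged[region] = sum(values[n] for n in neighbors) / len(neighbors)
        else:
            lagged[region] = math.nan
    return lagged


def covariate_rows(values: Dict[str, float],
                   adjacency: Dict[str, List[str]]) -> Dict[str, Tuple[float, float, float]]:
    """Per region: (local value, neighbor mean, local minus neighbor mean).

    A large positive difference flags an affluent region neighbored by
    poorer ones; a difference near zero flags a region similar to its
    neighbors.
    """
    lagged = neighbor_mean(values, adjacency)
    return {r: (values[r], lagged[r], values[r] - lagged[r]) for r in adjacency}


if __name__ == "__main__":
    # Toy wealth index and adjacency, for illustration only (not real data).
    wealth = {"A": 0.9, "B": 0.2, "C": 0.3, "D": 0.8}
    adjacency = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
    for region, row in covariate_rows(wealth, adjacency).items():
        print(region, row)
```

In this toy example, region A (affluent, neighbored by poorer B and C) gets a large positive local-minus-neighbor difference, which is the kind of contrast between cases (a) and (b) that the abstract describes.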
Submitted 25 January, 2021; v1 submitted 15 June, 2020;
originally announced June 2020.
The growing amplification of social media: Measuring temporal and social contagion dynamics for over 150 languages on Twitter for 2009-2020
Authors:
Thayer Alshaabi,
David R. Dewhurst,
Joshua R. Minot,
Michael V. Arnold,
Jane L. Adams,
Christopher M. Danforth,
Peter Sheridan Dodds
Abstract:
Working from a dataset of 118 billion messages running from the start of 2009 to the end of 2019, we identify and explore the relative daily use of over 150 languages on Twitter. We find that eight languages comprise 80% of all tweets, with English, Japanese, Spanish, and Portuguese being the most dominant. To quantify social spreading in each language over time, we compute the 'contagion ratio': the ratio of retweets to organic messages. We find that for the most common languages on Twitter there is a growing, though not universal, tendency to retweet rather than share new content. By the end of 2019, the contagion ratios for half of the top 30 languages, including English and Spanish, had risen above 1, the naive contagion threshold. In 2019, the top 5 languages with the highest average daily ratios were, in order, Thai (7.3), Hindi, Tamil, Urdu, and Catalan, while the bottom 5 were Russian, Swedish, Esperanto, Cebuano, and Finnish (0.26). Further, we show that over time, the contagion ratios for most common languages are growing more strongly than those of rare languages.
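The contagion ratio compares retweets with organic messages, and a value above 1 (the naive contagion threshold) means retweets outnumber new content. A minimal Python sketch follows, assuming daily retweet and organic-message counts are already aggregated per language. Averaging the per-day ratios, rather than pooling counts across days, is our reading of the abstract's "average daily ratios." The function names and the language codes `xx` and `yy` with their counts are hypothetical toy values, not results from the paper.

```python
from statistics import mean
from typing import Dict, List, Tuple

NAIVE_THRESHOLD = 1.0  # ratio above 1: more retweets than organic messages


def contagion_ratio(retweets: int, organic: int) -> float:
    """Ratio of retweets to organic (non-retweet) messages for one day."""
    if organic == 0:
        raise ValueError("contagion ratio is undefined with zero organic messages")
    return retweets / organic


def average_daily_ratio(daily_counts: List[Tuple[int, int]]) -> float:
    """Mean of per-day ratios, given (retweets, organic) counts per day."""
    return mean(contagion_ratio(rt, org) for rt, org in daily_counts)


def rank_languages(counts_by_language: Dict[str, List[Tuple[int, int]]]) -> List[Tuple[str, float]]:
    """Languages sorted by average daily contagion ratio, highest first."""
    averages = {lang: average_daily_ratio(days) for lang, days in counts_by_language.items()}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)


def above_threshold(ratio: float, threshold: float = NAIVE_THRESHOLD) -> bool:
    return ratio > threshold


if __name__ == "__main__":
    # Toy daily counts (retweets, organic) for two made-up language codes.
    toy = {"xx": [(300, 100), (500, 100)], "yy": [(50, 100), (30, 100)]}
    for lang, ratio in rank_languages(toy):
        print(lang, round(ratio, 2), above_threshold(ratio))
```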
Submitted 8 March, 2021; v1 submitted 7 March, 2020;
originally announced March 2020.