How much is said in a microblog? A multilingual inquiry based on Weibo and Twitter

Liao, Han-Teng; Fu, King-wa; Hale, Scott A.

doi:10.1145/2786451.2786486


arXiv:1506.00572 (cs)

[Submitted on 1 Jun 2015 (v1), last revised 13 Jun 2015 (this version, v2)]


Authors: Han-Teng Liao, King-wa Fu, Scott A. Hale


Abstract: This paper presents a multilingual study of three questions about a single microblog post: (a) how much can be said, (b) how much is written, in terms of characters and bytes, and (c) how much is said, in terms of information content, in posts by different organizations in different languages. Focusing on three languages (English, Chinese, and Japanese), this research analyses the Weibo and Twitter accounts of major embassies and news agencies. We first establish our criterion for quantifying "how much can be said" in a digital text, based on the openly available Universal Declaration of Human Rights and the translated subtitles of TED talks. These parallel corpora allow us to determine the number of characters and bits needed to represent the same content in different languages and character encodings. We then derive the amount of information actually contained in microblog posts authored by selected accounts on Weibo and Twitter. Our results confirm that languages with larger character sets, such as Chinese and Japanese, contain more information per character than English. However, the actual information content of a microblog post varies with both the type of organization and the language of the post. We conclude with a discussion of the design implications of microblog text limits for different languages.
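The parallel-corpus criterion described above can be illustrated with a small sketch. It measures how many characters and encoded bytes the same content needs in two languages. It then converts a per-language character limit into "reference-language characters' worth" of content. This is not the authors' code. The helper names (`measure`, `char_ratio`, `reference_chars_per_limit`), the choice of UTF-8 and UTF-16 as the encodings to compare, and the use of Python's code-point length as the character count are all assumptions. The sample strings are the opening clause of UDHR Article 1 in English and Simplified Chinese, included only as illustrative input.

```python
"""Sketch: characters and bytes needed for the same content in two languages.

Assumptions (not from the paper): helper names, UTF-8/UTF-16 as the encodings
compared, and Python code-point length as the character count.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Size:
    chars: int        # number of Unicode code points
    utf8_bytes: int   # encoded length in UTF-8
    utf16_bytes: int  # encoded length in UTF-16 little-endian, no BOM

    @property
    def utf8_bits(self) -> int:
        # Bits needed to store the text as UTF-8.
        return 8 * self.utf8_bytes


def measure(text: str) -> Size:
    """Count code points and encoded sizes of one text."""
    return Size(
        chars=len(text),
        utf8_bytes=len(text.encode("utf-8")),
        utf16_bytes=len(text.encode("utf-16-le")),
    )


def char_ratio(reference: str, translation: str) -> float:
    """Characters the translation needs per character of the reference text."""
    return len(translation) / len(reference)


def reference_chars_per_limit(limit: int, ratio: float) -> float:
    """Hypothetical helper: how many reference-language characters' worth of
    content fit into `limit` characters of the translated language."""
    return limit / ratio


# Opening clause of UDHR Article 1, used only as illustrative sample input.
EN = "All human beings are born free and equal in dignity and rights."
ZH = "人人生而自由，在尊严和权利上一律平等。"

if __name__ == "__main__":
    for lang, text in (("en", EN), ("zh", ZH)):
        s = measure(text)
        print(lang, s.chars, s.utf8_bytes, s.utf16_bytes, s.utf8_bits)
    r = char_ratio(EN, ZH)
    print(f"zh chars per en char: {r:.2f}")
    print(f"140 zh chars ~ {reference_chars_per_limit(140, r):.0f} en chars")
```

The Chinese sample needs fewer characters than the English one but more bytes per character in UTF-8. This is the kind of trade-off that separates "how much is written" from "how much can be said." Real platforms may count characters differently from Python's code-point length, so any limit comparison should use the platform's own counting rules.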

Comments:	9 pages, 4 figures, WebSci 2015
Subjects:	Social and Information Networks (cs.SI); Computation and Language (cs.CL); Computers and Society (cs.CY)
ACM classes:	H.5.3, H.5.4
Cite as:	arXiv:1506.00572 [cs.SI]
	(or arXiv:1506.00572v2 [cs.SI] for this version)
	https://doi.org/10.48550/arXiv.1506.00572
Related DOI:	https://doi.org/10.1145/2786451.2786486

Submission history

From: Scott A. Hale
[v1] Mon, 1 Jun 2015 17:06:00 UTC (157 KB)
[v2] Sat, 13 Jun 2015 14:37:25 UTC (158 KB)