-
Building Dynamic Ontological Models for Place using Social Media Data from Twitter and Sina Weibo
Authors:
Ming-Hsiang Tsou,
Qingyun Zhang,
Jian Xu,
Atsushi Nara,
Mark Gawron
Abstract:
Place holds human thoughts and experiences. Space is defined with geometric measurement and coordinate systems. Social media serves as the connection between place and space. In this study, we use social media data (Twitter, Weibo) to build a dynamic ontological model in two separate areas: Beijing, China and San Diego, U.S.A. Three spatial analytics methods are used to generate the place name ontology: 1) Kernel Density Estimation (KDE); 2) Dynamic Method Density-based spatial clustering of applications with noise (DBSCAN); 3) hierarchical clustering. We identified feature types of place name ontologies from geotagged social media data and classified them by comparing the default KDE search radii of their geotagged points. By tracing the seasonal changes of highly dynamic non-administrative places, we observed seasonal variation patterns, which illustrate the dynamic changes in place ontology caused by changes in human activities and conversation over time and space. We also investigate the semantic meaning of each place name by examining Pointwise Mutual Information (PMI) scores and word clouds. The major contribution of this research is to link and analyze the associations between place, space, and their attributes in the field of geography. Researchers can use crowd-sourced data to study the ontology of places rather than relying on traditional gazetteers. The dynamic ontology in this research can provide useful insight for urban planning, re-zoning, and other related industries.
Submitted 1 March, 2023;
originally announced March 2023.
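The PMI scores mentioned in the abstract above can be illustrated with a minimal sketch. It assumes document-level co-occurrence (one tokenized post per document) and the standard definition PMI(w, p) = log2(P(w, p) / (P(w) P(p))). The function name `pmi_scores` and the toy posts are hypothetical and are not taken from the paper, whose exact tokenization and counting scheme the abstract does not describe.

```python
import math
from collections import Counter


def pmi_scores(documents, place_name):
    """Return PMI between place_name and every word that co-occurs with it.

    documents: list of token lists, one per post (assumed pre-tokenized).
    Probabilities are estimated as document frequencies.
    """
    n_docs = len(documents)
    word_docs = Counter()   # number of posts containing each word
    co_docs = Counter()     # number of posts containing word AND place_name
    place_docs = 0          # number of posts containing place_name

    for tokens in documents:
        unique = set(tokens)
        word_docs.update(unique)
        if place_name in unique:
            place_docs += 1
            co_docs.update(unique - {place_name})

    if place_docs == 0:
        return {}

    p_place = place_docs / n_docs
    scores = {}
    for word, joint in co_docs.items():
        p_joint = joint / n_docs
        p_word = word_docs[word] / n_docs
        scores[word] = math.log2(p_joint / (p_word * p_place))
    return scores


# Toy posts, invented for illustration only.
posts = [
    ["balboa", "park", "zoo"],
    ["balboa", "zoo"],
    ["beach", "surf"],
    ["beach", "zoo"],
]
scores = pmi_scores(posts, "balboa")
```

In this toy corpus "park" appears only alongside "balboa", so it scores higher than "zoo", which also appears in a post without the place name. Words that never co-occur with the place name are simply absent from the result.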
-
Estimating hourly population distribution change at high spatiotemporal resolution in urban areas using geo-tagged tweets, land use data, and dasymetric maps
Authors:
Ming-Hsiang Tsou,
Hao Zhang,
Atsushi Nara,
Su Yeon Han
Abstract:
This paper introduces a spatiotemporal analysis framework for estimating hourly changing population distribution in urban areas using geo-tagged tweets (the messages containing users' physical locations), land use data, and dasymetric maps. We collected geo-tagged social media (tweets) within the County of San Diego during one year (2015) by using Twitter's Streaming Application Programming Interfaces (APIs). A semi-manual Twitter content verification procedure for data cleaning was applied first to separate tweets created by humans from those created by non-human users (bots). The next step is to calculate the number of unique Twitter users every hour for two different geographical units: (1) census blocks, and (2) 1km by 1km resolution grids of LandScan. The final step is to estimate the actual dynamic population by transforming the numbers of unique Twitter users in each census block or grid into estimated population densities with spatial and temporal variation factors. The temporal factor was based on hourly frequency changes of unique Twitter users within San Diego County, CA. The spatial factor was estimated by using the dasymetric method with land use maps and 2010 census data. Several comparison maps were created to visualize the spatiotemporal pattern changes of dynamic population distribution.
Submitted 15 October, 2018;
originally announced October 2018.
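The unique-user counting step described in the abstract above might look like the following sketch. It assumes each tweet has already been assigned to a spatial unit (a census block or grid cell ID) and that timestamps are ISO 8601 strings. The function name `hourly_unique_users`, the tuple layout, and the sample records are hypothetical. Counts are keyed by hour of day with all dates pooled, which is one possible reading of "every hour"; the spatial and temporal factors are not specified in the abstract and are not reproduced here.

```python
from collections import defaultdict
from datetime import datetime


def hourly_unique_users(tweets):
    """Count distinct users per (spatial unit, hour of day).

    tweets: iterable of (user_id, unit_id, iso_timestamp) tuples.
    A user who tweets several times in the same unit and hour counts once.
    """
    users = defaultdict(set)
    for user_id, unit_id, timestamp in tweets:
        hour = datetime.fromisoformat(timestamp).hour
        users[(unit_id, hour)].add(user_id)
    return {key: len(ids) for key, ids in users.items()}


# Invented records for illustration only.
sample = [
    ("u1", "block_A", "2015-03-01T08:15:00"),
    ("u1", "block_A", "2015-03-01T08:45:00"),  # same user, same hour
    ("u2", "block_A", "2015-03-01T08:50:00"),
    ("u1", "block_B", "2015-03-01T09:05:00"),
]
counts = hourly_unique_users(sample)
```

Using a set per key keeps the deduplication explicit, so repeated tweets from one account do not inflate the count for that unit and hour.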
-
Mapping Web Pages by Internet Protocol (IP) addresses: Analyzing Spatial and Temporal Characteristics of Web Search Engine Results
Authors:
Ming-Hsiang Tsou,
Daniel Lusher
Abstract:
Internet Protocol (IP) addresses are frequently used as a method of locating web users by researchers in several different fields. However, there are competing reports concerning the accuracy of those locations, and little research has been done in manually comparing the IP geolocation databases and web page geographic information. This paper categorized web pages from the Yahoo search engine into twelve categories, ranging from 'Blog' and 'News' to 'Education' and 'Governmental'. Then we manually compared the mailing or street address of the web page's content creator with the geolocation results for the given IP address. We introduced a cartographic design method by creating kernel density maps for visualizing the information landscape of web pages associated with specific keywords.
Submitted 15 October, 2018;
originally announced October 2018.
-
Identifying Data Noises, User Biases, and System Errors in Geo-tagged Twitter Messages (Tweets)
Authors:
Ming-Hsiang Tsou,
Hao Zhang,
Chin-Te Jung
Abstract:
Many social media researchers and data scientists collect geo-tagged tweets to conduct spatial analysis or identify spatiotemporal patterns of filtered messages for specific topics or events. This paper provides a systematic view to illustrate the characteristics (data noises, user biases, and system errors) of geo-tagged tweets from the Twitter Streaming API. First, we found that a small percentage (1%) of active Twitter users can create a large portion (16%) of geo-tagged tweets. Second, a significant share (57.3%) of geo-tagged tweets is located outside the Twitter Streaming API's bounding box in San Diego. Third, we can detect spam, bot, and cyborg tweets (data noises) by examining the "source" metadata field. The portion of data noises in geo-tagged tweets is significant (29.42% in San Diego, CA and 53.47% in Columbus, OH) in our case study. Finally, the majority of geo-tagged tweets are not created by the generic Twitter apps on Android or iPhone devices, but by other platforms, such as Instagram and Foursquare. We recommend a multi-step procedure to remove these noises for future research projects utilizing geo-tagged tweets.
Submitted 6 December, 2017;
originally announced December 2017.
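Two of the checks named in the abstract above, the bounding-box test and the "source" field inspection, can be sketched as a simple filter. This is an illustrative sketch only and not the paper's multi-step procedure. It assumes tweets are dictionaries in the classic Twitter API JSON layout, where `source` is an HTML anchor string and `coordinates` is a GeoJSON point in `[longitude, latitude]` order. The allowlist `ALLOWED_SOURCES`, the helper names, and the bounding-box values are hypothetical.

```python
import re

# Hypothetical allowlist of client names treated as human-operated.
ALLOWED_SOURCES = {
    "Twitter for iPhone",
    "Twitter for Android",
    "Instagram",
    "Foursquare",
}

_TAG = re.compile(r"<[^>]+>")


def source_name(tweet):
    """Strip the HTML anchor from the 'source' field, leaving the client name."""
    return _TAG.sub("", tweet.get("source") or "").strip()


def in_bounding_box(lon, lat, box):
    """box is (west, south, east, north) in decimal degrees."""
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


def keep_tweet(tweet, box):
    """Keep a tweet only if it has a point inside box and an allowed source."""
    coords = tweet.get("coordinates")
    if not coords:
        return False
    lon, lat = coords["coordinates"]
    return in_bounding_box(lon, lat, box) and source_name(tweet) in ALLOWED_SOURCES


# Illustrative box roughly around San Diego; not the one used in the paper.
box = (-117.6, 32.5, -116.0, 33.5)
human = {
    "source": '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
    "coordinates": {"type": "Point", "coordinates": [-117.16, 32.72]},
}
bot = dict(human, source='<a href="http://example.com">Job Bot</a>')
outside = dict(human, coordinates={"type": "Point", "coordinates": [-82.99, 39.96]})
kept = [t for t in (human, bot, outside) if keep_tweet(t, box)]
```

An allowlist is the stricter design choice: unknown clients are dropped by default. A blocklist of known bot sources would retain more data at the cost of letting new noise sources through.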