Multi-class Twitter Data Categorization and Geocoding with a Novel Computing Framework

Khan, Sakib Mahmud; Chowdhury, Mashrur; Ngo, Linh B.; Apon, Amy

arXiv:1905.02916 (cs)

[Submitted on 8 May 2019 (v1), last revised 28 Aug 2019 (this version, v3)]

Abstract: This study presents a novel computing framework for transportation data analysis, in keeping with the continuous evolution of computing technology. The framework combines a Labelled Latent Dirichlet Allocation (L-LDA)-incorporated Support Vector Machine (SVM) classifier with a supporting computing strategy, and applies it to publicly available Twitter data to detect transportation-related events and provide reliable information to travelers. The analytical approach classifies tweets using text classification and geocodes locations based on string similarity. A case study of New York City and its surrounding areas demonstrates the feasibility of the approach. Approximately 700,010 tweets, covering one week, are analyzed to extract relevant transportation-related information. The SVM classifier achieves more than 85% accuracy in identifying transportation-related tweets from structured data. Three supervised classifiers (L-LDA, SVM, and L-LDA-incorporated SVM) are then used to further categorize the transportation-related tweets into sub-classes: incident, congestion, construction, special events, and other events. The findings show that the framework using the L-LDA-incorporated SVM can classify roadway transportation-related Twitter data with over 98.3% accuracy, which is significantly higher than the accuracies achieved by standalone L-LDA and SVM.
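
The abstract mentions geocoding locations based on string similarity but does not say which similarity measure or gazetteer the authors use. As a rough illustration only, the sketch below matches a place name mentioned in a tweet against a small lookup table using Python's standard `difflib.SequenceMatcher`. The gazetteer entries, the approximate coordinates, the `geocode` function name, and the 0.8 threshold are all assumptions made for this example, not details taken from the paper.

```python
from difflib import SequenceMatcher

# Hypothetical gazetteer: place name -> approximate (latitude, longitude).
# These entries are illustrative and are not taken from the paper.
GAZETTEER = {
    "Brooklyn Bridge": (40.7061, -73.9969),
    "Lincoln Tunnel": (40.7627, -74.0094),
    "George Washington Bridge": (40.8517, -73.9527),
}


def similarity(a, b):
    """Return a case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def geocode(mention, threshold=0.8):
    """Return (name, (lat, lon)) for the best gazetteer match, or None.

    The threshold is an assumed cutoff; a real system would tune it.
    """
    best_name, best_score = None, 0.0
    for name in GAZETTEER:
        score = similarity(mention, name)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, GAZETTEER[best_name]
    return None


if __name__ == "__main__":
    # A misspelled mention still resolves to the closest gazetteer entry.
    print(geocode("brooklyn brige"))
    # A mention with no close match is left ungeocoded.
    print(geocode("Times Square"))
```

Returning `None` below the threshold is a deliberate choice in this sketch: leaving a tweet ungeocoded is usually safer than attaching it to the wrong location.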

Subjects:	Social and Information Networks (cs.SI)
Cite as:	arXiv:1905.02916 [cs.SI]
	(or arXiv:1905.02916v3 [cs.SI] for this version)
	https://doi.org/10.48550/arXiv.1905.02916
Journal reference:	Cities 96 (2020)

Submission history

From: Sakib Khan
[v1] Wed, 8 May 2019 05:08:59 UTC (1,397 KB)
[v2] Thu, 18 Jul 2019 14:10:58 UTC (1,273 KB)
[v3] Wed, 28 Aug 2019 19:04:13 UTC (1,339 KB)
