-
Computationally Intensive Research: Advancing a Role for Secondary Analysis of Qualitative Data
Authors:
Kaveh Mohajeri,
Amir Karami
Abstract:
This paper draws attention to the potential of computational methods in reworking data generated in past qualitative studies. While qualitative inquiries often produce rich data through rigorous and resource-intensive processes, much of this data usually remains unused. In this paper, we first make a general case for secondary analysis of qualitative data by discussing its benefits, distinctions, and epistemological aspects. We then argue for opportunities with computationally intensive secondary analysis, highlighting the possibility of drawing on data assemblages spanning multiple contexts and timeframes to address cross-contextual and longitudinal research phenomena and questions. We propose a scheme to perform computationally intensive secondary analysis and advance ideas on how this approach can help facilitate the development of innovative research designs. Finally, we enumerate some key challenges and ongoing concerns associated with qualitative data sharing and reuse.
Submitted 15 January, 2025;
originally announced June 2025.
-
Graph-Jigsaw Conditioned Diffusion Model for Skeleton-based Video Anomaly Detection
Authors:
Ali Karami,
Thi Kieu Khanh Ho,
Narges Armanfard
Abstract:
Skeleton-based video anomaly detection (SVAD) is a crucial task in computer vision. Accurately identifying abnormal patterns or events enables operators to promptly detect suspicious activities, thereby enhancing safety. Achieving this demands a comprehensive understanding of human motions, both at body and region levels, while also accounting for the wide variations of performing a single action. However, existing studies fail to simultaneously address these crucial properties. This paper introduces a novel, practical and lightweight framework, namely Graph-Jigsaw Conditioned Diffusion Model for Skeleton-based Video Anomaly Detection (GiCiSAD) to overcome the challenges associated with SVAD. GiCiSAD consists of three novel modules: the Graph Attention-based Forecasting module to capture the spatio-temporal dependencies inherent in the data, the Graph-level Jigsaw Puzzle Maker module to distinguish subtle region-level discrepancies between normal and abnormal motions, and the Graph-based Conditional Diffusion model to generate a wide spectrum of human motions. Extensive experiments on four widely used skeleton-based video datasets show that GiCiSAD outperforms existing methods with significantly fewer training parameters, establishing it as the new state-of-the-art.
Submitted 30 August, 2024; v1 submitted 18 March, 2024;
originally announced March 2024.
-
NERFBK: A High-Quality Benchmark for NERF-Based 3D Reconstruction
Authors:
Ali Karami,
Simone Rigon,
Gabriele Mazzacca,
Ziyang Yan,
Fabio Remondino
Abstract:
This paper introduces a new real and synthetic dataset called NeRFBK specifically designed for testing and comparing NeRF-based 3D reconstruction algorithms. High-quality 3D reconstruction has significant potential in various fields, and advancements in image-based algorithms make it essential to evaluate new advanced techniques. However, gathering diverse data with precise ground truth is challenging and may not encompass all relevant applications. The NeRFBK dataset addresses this issue by providing multi-scale, indoor and outdoor datasets with high-resolution images and videos and camera parameters for testing and comparing NeRF-based algorithms. This paper presents the design and creation of the NeRFBK benchmark, various examples and application scenarios, and highlights its potential for advancing the field of 3D reconstruction.
Submitted 15 June, 2023; v1 submitted 9 June, 2023;
originally announced June 2023.
-
Graph Anomaly Detection in Time Series: A Survey
Authors:
Thi Kieu Khanh Ho,
Ali Karami,
Narges Armanfard
Abstract:
With the recent advances in technology, a wide range of systems continue to collect a large amount of data over time and thus generate time series. Time-Series Anomaly Detection (TSAD) is an important task in various time-series applications such as e-commerce, cybersecurity, vehicle maintenance, and healthcare monitoring. However, this task is very challenging as it requires considering both the intra-variable dependency (relationships within a variable over time) and the inter-variable dependency (relationships between multiple variables) existing in time-series data. Recent graph-based approaches have made impressive progress in tackling the challenges of this field. In this survey, we conduct a comprehensive and up-to-date review of TSAD using graphs, referred to as G-TSAD. First, we explore the significant potential of graph representation for time-series data and its contributions to facilitating anomaly detection. Then, we review state-of-the-art graph anomaly detection techniques, mostly leveraging deep learning architectures, in the context of time series. For each method, we discuss its strengths, limitations, and the specific applications where it excels. Finally, we address both the technical and application challenges currently facing the field, and suggest potential future directions for advancing research and improving practical outcomes.
Submitted 29 April, 2025; v1 submitted 31 January, 2023;
originally announced February 2023.
-
2020 U.S. presidential election in swing states: Gender differences in Twitter conversations
Authors:
Amir Karami,
Spring B. Clark,
Anderson Mackenzie,
Dorathea Lee,
Michael Zhu,
Hannah R. Boyajieff,
Bailey Goldschmidt
Abstract:
Social media is commonly used by the public during election campaigns to express their opinions regarding different issues. Among various social media channels, Twitter provides an efficient platform for researchers and politicians to explore public opinion regarding a wide range of topics such as the economy and foreign policy. Current literature mainly focuses on analyzing the content of tweets without considering the gender of users. This research collects and analyzes a large number of tweets and uses computational, human coding, and statistical analyses to identify topics in more than 300,000 tweets posted during the 2020 U.S. presidential election and to compare female and male users regarding the average weight of the discussed topics. Our findings are based upon a wide range of topics, such as tax, climate change, and the COVID-19 pandemic. Out of the topics, there exists a significant difference between female and male users for more than 70% of topics.
Submitted 13 July, 2022; v1 submitted 20 August, 2021;
originally announced August 2021.
-
COVID-19 Vaccine and Social Media: Exploring Emotions and Discussions on Twitter
Authors:
Amir Karami,
Michael Zhu,
Bailey Goldschmidt,
Hannah R. Boyajieff,
Mahdi M. Najafabadi
Abstract:
The understanding of the public response to COVID-19 vaccines is the key success factor to control the COVID-19 pandemic. To understand the public response, there is a need to explore public opinion. Traditional surveys are expensive and time-consuming, address limited health topics, and obtain small-scale data. Twitter can provide a great opportunity to understand public opinion regarding COVID-19 vaccines. The current study proposes an approach using computational and human coding methods to collect and analyze a large number of tweets to provide a wider perspective on the COVID-19 vaccine. This study identifies the sentiment of tweets using a machine learning rule-based approach, discovers major topics, explores temporal trend and compares topics of negative and non-negative tweets using statistical tests, and discloses top topics of tweets having negative and non-negative sentiment. Our findings show that the negative sentiment regarding the COVID-19 vaccine had a decreasing trend between November 2020 and February 2021. We found Twitter users have discussed a wide range of topics from vaccination sites to the 2020 U.S. election between November 2020 and February 2021. The findings show that there was a significant difference between tweets having negative and non-negative sentiment regarding the weight of most topics. Our results also indicate that the negative and non-negative tweets had different topic priorities and focuses. This research illustrates that Twitter data can be used to explore public opinion regarding the COVID-19 vaccine.
Submitted 29 September, 2021; v1 submitted 29 July, 2021;
originally announced August 2021.
-
Social Media and COVID-19: Can Social Distancing be Quantified without Measuring Human Movements?
Authors:
Mackenzie Anderson,
Amir Karami,
Parisa Bozorgi
Abstract:
The COVID-19 outbreak has posed significant threats to international health and the economy. In the absence of treatment for this virus, public health officials asked the public to practice social distancing to reduce the number of physical contacts. However, quantifying social distancing is a challenging task and current methods are based on human movements. We propose a time and cost-effective approach to measure how people practice social distancing. This study proposes a new method based on utilizing the frequency of hashtags supporting and encouraging social distancing for measuring social distancing. We have identified 18 related hashtags and tracked their trends between Jan and May 2020. Our evaluation results show that there is a strong correlation (P<0.05) between our findings and the Google social distancing report.
Submitted 6 June, 2020;
originally announced June 2020.
-
Unwanted Advances in Higher Education: Uncovering Sexual Harassment Experiences in Academia with Text Mining
Authors:
Amir Karami,
Cynthia Nicole White,
Kayla Ford,
Suzanne Swan,
Melek Yildiz Spinel
Abstract:
Sexual harassment in academia is often a hidden problem because victims are usually reluctant to report their experiences. Recently, a web survey was developed to provide an opportunity to share thousands of sexual harassment experiences in academia. Using an efficient approach, this study collected and investigated more than 2,000 sexual harassment experiences to better understand these unwanted advances in higher education. This paper utilized text mining to disclose hidden topics and explore their weight across three variables: harasser gender, institution type, and victim's field of study. We mapped the topics on five themes drawn from the sexual harassment literature and found that more than 50% of the topics were assigned to the unwanted sexual attention theme. Fourteen percent of the topics were in the gender harassment theme, in which insulting, sexist, or degrading comments or behavior was directed towards women. Five percent of the topics involved sexual coercion (a benefit is offered in exchange for sexual favors), 5% involved sex discrimination, and 7% of the topics discussed retaliation against the victim for reporting the harassment, or for simply not complying with the harasser. Findings highlight the power differential between faculty and students, and the toll on students when professors abuse their power. While some topics did differ based on type of institution, there were no differences between the topics based on gender of harasser or field of study. This research can be beneficial to researchers in further investigation of this paper's dataset, and to policymakers in improving existing policies to create a safe and supportive environment in academia.
Submitted 11 December, 2019;
originally announced January 2020.
-
FLATM: A Fuzzy Logic Approach Topic Model for Medical Documents
Authors:
Amir Karami,
Aryya Gangopadhyay,
Bin Zhou,
Hadi Kharrazi
Abstract:
One of the challenges for text analysis in medical domains is analyzing large-scale medical documents. As a consequence, finding relevant documents has become more difficult. One of the popular methods to retrieve information based on discovering the themes in the documents is topic modeling. The themes in the documents help to retrieve documents on the same topic with and without a query. In this paper, we present a novel approach to topic modeling using fuzzy clustering. To evaluate our model, we experiment with two text datasets of medical documents. The evaluation metrics carried out through document classification and document modeling show that our model produces better performance than LDA, indicating that fuzzy set theory can improve the performance of topic models in medical domains.
Submitted 25 November, 2019;
originally announced November 2019.
-
Application of Fuzzy Clustering for Text Data Dimensionality Reduction
Authors:
Amir Karami
Abstract:
Large textual corpora are often represented by the document-term frequency matrix whose elements are the frequency of terms; however, this matrix has two problems: sparsity and high dimensionality. Four dimension reduction strategies are used to address these problems. Of the four strategies, unsupervised feature transformation (UFT) is a popular and efficient strategy to map the terms to a new basis in the document-term frequency matrix. Although several UFT-based methods have been developed, fuzzy clustering has not been considered for dimensionality reduction. This research explores fuzzy clustering as a new UFT-based approach to create a lower-dimensional representation of documents. Performance of fuzzy clustering with and without using global term weighting methods is shown to exceed principal component analysis and singular value decomposition. This study also explores the effect of applying different fuzzifier values on fuzzy clustering for dimensionality reduction purpose.
Submitted 20 September, 2019;
originally announced September 2019.
-
Hidden in Plain Sight For Too Long: Using Text Mining Techniques to Shine a Light on Workplace Sexism and Sexual Harassment
Authors:
Amir Karami,
Suzanne C. Swan,
Cynthia Nicole White,
Kayla Ford
Abstract:
Objective: The goal of this study is to understand how people experience sexism and sexual harassment in the workplace by discovering themes in 2,362 experiences posted on the Everyday Sexism Project's website everydaysexism.com. Method: This study used both quantitative and qualitative methods. The quantitative method was a computational framework to collect and analyze a large number of workplace sexual harassment experiences. The qualitative method was the analysis of the topics generated by a text mining method. Results: Twenty-three topics were coded and then grouped into three overarching themes from the sex discrimination and sexual harassment literature. The Sex Discrimination theme included experiences in which women were treated unfavorably due to their sex, such as being passed over for promotion, denied opportunities, paid less than men, and ignored or talked over in meetings. The Sex Discrimination and Gender harassment theme included stories about sex discrimination and gender harassment, such as sexist hostility behaviors ranging from insults and jokes invoking misogynistic stereotypes to bullying behavior. The last theme, Unwanted Sexual Attention, contained stories describing sexual comments and behaviors used to degrade women. Unwanted touching was the highest weighted topic, indicating how common it was for website users to endure being touched, hugged or kissed, groped, and grabbed. Conclusions: This study illustrates how researchers can use automatic processes to go beyond the limits of traditional research methods and investigate naturally occurring large scale datasets on the internet to achieve a better understanding of everyday workplace sexism experiences.
Submitted 30 June, 2019;
originally announced July 2019.
-
Exploring Diseases and Syndromes in Neurology Case Reports from 1955 to 2017 with Text Mining
Authors:
Amir Karami,
Mehdi Ghasemi,
Souvik Sen,
Marcos Moraes,
Vishal Shah
Abstract:
Background: A large number of neurology case reports have been published, but it is a challenging task for human medical experts to explore all of these publications. Text mining offers a computational approach to investigate neurology literature and capture meaningful patterns. The overarching goal of this study is to provide a new perspective on case reports of neurological disease and syndrome analysis over the last six decades using text mining.
Methods: We extracted diseases and syndromes (DsSs) from more than 65,000 neurology case reports from 66 journals in PubMed over the last six decades from 1955 to 2017. Text mining was applied to reports on the detected DsSs to investigate high-frequency DsSs, categorize them, and explore the linear trends over the 63-year time frame.
Results: The text mining methods explored high-frequency neurologic DsSs and their trends and the relationships between them from 1955 to 2017. We detected more than 18,000 unique DsSs and found 10 categories of neurologic DsSs. While the trend analysis showed the increasing trends in the case reports for top-10 high-frequency DsSs, the categories had mixed trends.
Conclusion: Our study provided new insights into the application of text mining methods to investigate DsSs in a large number of medical case reports that occur over several decades. The proposed approach can be used to provide a macro level analysis of medical literature by discovering interesting patterns and tracking them over several years to help physicians explore these case reports more efficiently.
Submitted 23 May, 2019;
originally announced June 2019.
-
Twitter Speaks: A Case of National Disaster Situational Awareness
Authors:
Amir Karami,
Vishal Shah,
Reza Vaezi,
Amit Bansal
Abstract:
In recent years, we have been faced with a series of natural disasters causing a tremendous amount of financial, environmental, and human losses. The unpredictable nature of natural disasters' behavior makes it hard to have a comprehensive situational awareness (SA) to support disaster management. Using opinion surveys is a traditional approach to analyze public concerns during natural disasters; however, this approach is limited, expensive, and time-consuming. Luckily, the advent of social media has provided scholars with an alternative means of analyzing public concerns. Social media enable users (people) to freely communicate their opinions and disperse information regarding current events including natural disasters. This research emphasizes the value of social media analysis and proposes an analytical framework: Twitter Situational Awareness (TwiSA). This framework uses text mining methods including sentiment analysis and topic modeling to create a better SA for disaster preparedness, response, and recovery. TwiSA has also been effectively deployed on a large number of tweets and tracks the negative concerns of people during the 2015 South Carolina flood.
Submitted 6 March, 2019;
originally announced March 2019.
-
An Exploratory Study of (#)Exercise in the Twittersphere
Authors:
George Shaw,
Amir Karami
Abstract:
Social media analytics allows us to extract, analyze, and establish semantics from user-generated content on social media platforms. This study utilized a mixed method including a three-step process of data collection, topic modeling, and data annotation for recognizing exercise-related patterns. Based on the findings, 86% of the detected topics were identified as meaningful topics after conducting the data annotation process. The most discussed exercise-related topics were physical activity (18.7%), lifestyle behaviors (6.6%), and dieting (4%). The results from our experiment indicate that exploratory data analysis is a practical approach to summarizing the various characteristics of text data for different health and medical applications.
Submitted 7 December, 2018;
originally announced December 2018.
-
Political Popularity Analysis in Social Media
Authors:
Amir Karami,
Aida Elkouri
Abstract:
Popularity is a critical success factor for a politician and her/his party to win in elections and implement their plans. Finding the reasons behind the popularity can provide a stable political movement. This research attempts to measure popularity in Twitter using a mixed method. In recent years, Twitter data has provided an excellent opportunity for exploring public opinions by analyzing a large number of tweets. This study has collected and examined 4.5 million tweets related to a US politician, Senator Bernie Sanders. This study investigated eight economic reasons behind the senator's popularity in Twitter. This research has benefits for politicians, informatics experts, and policymakers to explore public opinion. The collected data will also be available for further investigation.
Submitted 7 December, 2018;
originally announced December 2018.
-
"Life never matters in the DEMOCRATS MIND": Examining Strategies of Retweeted Social Bots During a Mass Shooting Event
Authors:
Vanessa L. Kitzie,
Ehsan Mohammadi,
Amir Karami
Abstract:
This exploratory study examines the strategies of social bots on Twitter that were retweeted following a mass shooting event. Using a case study method to frame our work, we collected over seven million tweets during a one-month period following a mass shooting in Parkland, Florida. From this dataset, we selected retweets of content generated by over 400 social bot accounts to determine what strategies these bots were using and the effectiveness of these strategies as indicated by the number of retweets. We employed qualitative and quantitative methods to capture both macro- and micro-level perspectives. Our findings suggest that bots engage in more diverse strategies than solely waging disinformation campaigns, including baiting and sharing information. Further, we found that while bots amplify conversation about mass shootings, humans were primarily responsible for disseminating bot-generated content. These findings add depth to the current understanding of bot strategies and their effectiveness. Understanding these strategies can inform efforts to combat dubious information as well as more insidious disinformation campaigns.
Submitted 28 August, 2018;
originally announced August 2018.
-
Characterizing Transgender Health Issues in Twitter
Authors:
Amir Karami,
Frank Webb,
Vanessa L. Kitzie
Abstract:
Although there are millions of transgender people in the world, a lack of information exists about their health issues. This issue has consequences for the medical field, which only has a nascent understanding of how to identify and meet this population's health-related needs. Social media sites like Twitter provide new opportunities for transgender people to overcome these barriers by sharing their personal health experiences. Our research employs a computational framework to collect tweets from self-identified transgender users, detect those that are health-related, and identify their information needs. This framework is significant because it provides a macro-scale perspective on an issue that lacks investigation at national or demographic levels. Our findings identified 54 distinct health-related topics that we grouped into 7 broader categories. Further, we found both linguistic and topical differences in the health-related information shared by transgender men (TM) as compared to transgender women (TW). These findings can help inform medical and policy-based strategies for health interventions within transgender communities. Also, our proposed approach can inform the development of computational strategies to identify the health-related information needs of other marginalized populations.
Submitted 28 September, 2018; v1 submitted 17 August, 2018;
originally announced August 2018.
-
What do the US West Coast Public Libraries Post on Twitter?
Authors:
Amir Karami,
Matthew Collins
Abstract:
Twitter has provided a great opportunity for public libraries to disseminate information for a variety of purposes. Twitter data have been applied in different domains such as health, politics, and history. There are thousands of public libraries in the US, but no study has yet investigated the content of their social media posts, such as tweets, to find their interests. Moreover, traditional content analysis is not an efficient way to explore thousands of tweets. Therefore, there is a need for automatic methods to overcome the limitations of manual methods. This paper proposes a computational approach to collecting and analyzing tweets using the Twitter Application Programming Interface (API) and investigates more than 138,000 tweets from 48 US west coast libraries using topic modeling. We found 20 topics and assigned them to five categories including public relations, book, event, training, and social good. Our results show that the US west coast libraries are more interested in using Twitter for public relations and book-related events. This research has both practical and theoretical applications for libraries as well as other organizations to explore the social media activities of their customers and themselves.
Submitted 28 September, 2018; v1 submitted 17 August, 2018;
originally announced August 2018.
-
Computational Analysis of Insurance Complaints: GEICO Case Study
Authors:
Amir Karami,
Noelle M. Pendergraft
Abstract:
The online environment has provided a great opportunity for insurance policyholders to share their complaints with respect to different services. These complaints can reveal valuable information for insurance companies that seek to improve their services; however, analyzing a huge number of online complaints is a complicated task for humans and must involve computational methods to create an efficient process. This research proposes a computational approach to characterize the major topics of a large number of online complaints. Our approach uses topic modeling to disclose the latent semantics of complaints. The proposed approach was deployed on thousands of GEICO negative reviews. Analyzing 1,371 GEICO complaints indicates that there are 30 major complaints in four categories: (1) customer service, (2) insurance coverage, paperwork, policy, and reports, (3) legal issues, and (4) costs, estimates, and payments. This research approach can be used in other applications to explore a large number of reviews.
Submitted 25 June, 2018;
originally announced June 2018.
-
Characterizing Diseases and disorders in Gay Users' tweets
Authors:
Frank Webb,
Amir Karami,
Vanessa Kitzie
Abstract:
A lack of information exists about the health issues of lesbian, gay, bisexual, transgender, and queer (LGBTQ) people who are often excluded from national demographic assessments, health studies, and clinical trials. As a result, medical experts and researchers lack a holistic understanding of the health disparities facing these populations. Fortunately, publicly available social media data such as Twitter data can be utilized to support the decisions of public health policy makers and managers with respect to LGBTQ people. This research employs a computational approach to collect tweets from gay users on health-related topics and model these topics. To determine the nature of health-related information shared by men who have sex with men on Twitter, we collected thousands of tweets from 177 active users. We sampled these tweets using a framework that can be applied to other LGBTQ sub-populations in future research. We found 11 diseases in 7 categories based on ICD 10 that are in line with the published studies and official reports.
Submitted 24 March, 2018;
originally announced March 2018.
-
Social Media Analysis For Organizations: Us Northeastern Public And State Libraries Case Study
Authors:
Matthew Collins,
Amir Karami
Abstract:
Social networking sites such as Twitter have provided a great opportunity for organizations such as public libraries to disseminate information for public relations purposes. However, there is a need to analyze vast amounts of social media data. This study presents a computational approach to explore the content of tweets posted by nine public libraries in the northeastern United States of America. In December 2017, this study extracted more than 19,000 tweets from the Twitter accounts of seven state libraries and two urban public libraries. Computational methods were applied to collect the tweets and discover meaningful themes. This paper shows how the libraries have used Twitter to represent their services and provides a starting point for different organizations to evaluate the themes of their public tweets.
Submitted 24 March, 2018;
originally announced March 2018.
-
All nearest neighbor calculation based on Delaunay graphs
Authors:
Nasrin Mazaheri Soudani,
Ali Karami
Abstract:
When we have two data sets and want to find the nearest neighbour of each point in the first data set among the points in the second one, we need the all nearest neighbour (ANN) operator. This is an operator in spatial databases that has many applications in different fields such as GIS and VLSI circuit design. Existing algorithms for calculating this operator assume that there is no precomputation on these data sets. These algorithms have O(n*m*d) time complexity, where n and m are the numbers of points in the two data sets and d is the dimension of the data points. Assuming some precomputation on the data sets, algorithms with lower time complexity can be obtained. One of the most common precomputations on spatial data is the Delaunay graph. In the Delaunay graph of a data set, each point is linked to its nearest neighbours. In this paper, we introduce an algorithm for computing the all nearest neighbour operator on spatial data sets based on their Delaunay graphs. The performance of this algorithm is compared with one of the best existing algorithms for computing the ANN operator in terms of CPU time and the number of I/Os. The experimental results show that this algorithm has better performance than the other.
Submitted 26 February, 2018;
originally announced February 2018.
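As a rough illustration of the baseline that the abstract above contrasts against, the following is a minimal sketch of a brute-force all nearest neighbour computation, which compares every point in the first set with every point in the second and so costs on the order of n*m*d operations. It is not the Delaunay-based algorithm from the paper; the function name `all_nearest_neighbors` and the sample points are hypothetical and chosen only for this example.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def all_nearest_neighbors(first, second):
    """Brute-force baseline (hypothetical helper, not the paper's method).

    For each point in `first`, return the index of its nearest point
    in `second`. Every pair is compared, so the cost grows with
    len(first) * len(second) * dimension.
    """
    result = []
    for p in first:
        # Scan all candidates in the second set and keep the closest one.
        best_index = min(range(len(second)), key=lambda j: dist(p, second[j]))
        result.append(best_index)
    return result


# Illustrative data only.
queries = [(0.0, 0.0), (5.0, 5.0)]
targets = [(1.0, 0.0), (4.0, 5.0), (10.0, 10.0)]
print(all_nearest_neighbors(queries, targets))  # [0, 1]
```

A precomputed structure such as the Delaunay graph mentioned in the abstract is one way to avoid scanning every candidate for every query point.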
-
Mining Public Opinion about Economic Issues: Twitter and the U.S. Presidential Election
Authors:
Amir Karami,
London S. Bennett,
Xiaoyun He
Abstract:
Opinion polls have been the bridge between public opinion and politicians in elections. However, developing surveys to disclose people's feedback with respect to economic issues is limited, expensive, and time-consuming. In recent years, social media such as Twitter has enabled people to share their opinions regarding elections. Social media has provided a platform for collecting a large amount of social media data. This paper proposes a computational public opinion mining approach to explore the discussion of economic issues in social media during an election. Current related studies use text mining methods independently for election analysis and election prediction; this research combines two text mining methods: sentiment analysis and topic modeling. The proposed approach has effectively been deployed on millions of tweets to analyze economic concerns of people during the 2012 US presidential election.
Submitted 5 February, 2018;
originally announced February 2018.
-
Taming Wild High Dimensional Text Data with a Fuzzy Lash
Authors:
Amir Karami
Abstract:
The bag of words (BOW) represents a corpus in a matrix whose elements are the frequencies of words. However, each row in the matrix is a very high-dimensional sparse vector. Dimension reduction (DR) is a popular method to address sparsity and high-dimensionality issues. Among different strategies to develop DR methods, Unsupervised Feature Transformation (UFT) is a popular strategy to map all words on a new basis to represent BOW. The recent increase of text data and its challenges imply that the DR area still needs new perspectives. Although a wide range of methods based on the UFT strategy has been developed, the fuzzy approach has not been considered for DR based on this strategy. This research investigates the application of fuzzy clustering as a DR method based on the UFT strategy to collapse the BOW matrix to provide a lower-dimensional representation of documents instead of the words in a corpus. The quantitative evaluation shows that fuzzy clustering produces superior performance and features to Principal Components Analysis (PCA) and Singular Value Decomposition (SVD), two popular DR methods based on the UFT strategy.
Submitted 16 December, 2017;
originally announced December 2017.
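To make the bag-of-words representation described in the abstract above concrete, here is a minimal sketch that builds a document-by-word frequency matrix with the Python standard library. It only illustrates the BOW input, not the fuzzy clustering step; the function name `bag_of_words`, the whitespace tokenization, and the two sample documents are assumptions made for this example.

```python
from collections import Counter


def bag_of_words(documents):
    """Build a BOW matrix (hypothetical helper for illustration).

    Rows are documents, columns are vocabulary words in sorted order,
    and each cell holds how often that word occurs in that document.
    Tokenization is a plain lowercase whitespace split.
    """
    vocabulary = sorted({word for doc in documents for word in doc.lower().split()})
    matrix = []
    for doc in documents:
        counts = Counter(doc.lower().split())
        # Counter returns 0 for words that do not occur in this document.
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


# Illustrative corpus only.
docs = ["fuzzy clustering of text", "text data text mining"]
vocab, bow = bag_of_words(docs)
print(vocab)  # ['clustering', 'data', 'fuzzy', 'mining', 'of', 'text']
print(bow)    # [[1, 0, 1, 0, 1, 1], [0, 1, 0, 1, 0, 2]]
```

Even in this tiny corpus, each row contains several zeros; with a realistic vocabulary the rows become the very high-dimensional sparse vectors that dimension reduction is meant to address.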
-
Temporal 3D ConvNets: New Architecture and Transfer Learning for Video Classification
Authors:
Ali Diba,
Mohsen Fayyaz,
Vivek Sharma,
Amir Hossein Karami,
Mohammad Mahdi Arzani,
Rahman Yousefzadeh,
Luc Van Gool
Abstract:
The work in this paper is driven by the question of how to exploit the temporal cues available in videos for their accurate classification, and for human action recognition in particular. Thus far, the vision community has focused on spatio-temporal approaches with fixed temporal convolution kernel depths. We introduce a new temporal layer that models variable temporal convolution kernel depths. We embed this new temporal layer in our proposed 3D CNN. We extend the DenseNet architecture - which normally is 2D - with 3D filters and pooling kernels. We name our proposed video convolutional network 'Temporal 3D ConvNet' (T3D) and its new temporal layer 'Temporal Transition Layer' (TTL). Our experiments show that T3D outperforms the current state-of-the-art methods on the HMDB51, UCF101 and Kinetics datasets.
The other issue in training 3D ConvNets is that they are trained from scratch with a huge labeled dataset to reach reasonable performance, so the knowledge learned in 2D ConvNets is completely ignored. Another contribution in this work is a simple and effective technique to transfer knowledge from a pre-trained 2D CNN to a randomly initialized 3D CNN for a stable weight initialization. This allows us to significantly reduce the number of training samples for 3D CNNs. Thus, by finetuning this network, we beat the performance of generic and recent methods in 3D CNNs, which were trained on large video datasets, e.g. Sports-1M, and finetuned on the target datasets, e.g. HMDB51/UCF101. The T3D codes will be released.
Submitted 22 November, 2017;
originally announced November 2017.
-
Characterizing Diabetes, Diet, Exercise, and Obesity Comments on Twitter
Authors:
Amir Karami,
Alicia A. Dahl,
Gabrielle Turner-McGrievy,
Hadi Kharrazi,
George Shaw Jr.
Abstract:
Social media provide a platform for users to express their opinions and share information. Understanding public health opinions on social media, such as Twitter, offers a unique approach to characterizing common health issues such as diabetes, diet, exercise, and obesity (DDEO); however, collecting and analyzing a large-scale conversational public health data set is a challenging research task. The goal of this research is to analyze the characteristics of the general public's opinions in regard to diabetes, diet, exercise and obesity (DDEO) as expressed on Twitter. A multi-component semantic and linguistic framework was developed to collect Twitter data, discover topics of interest about DDEO, and analyze the topics. From the extracted 4.5 million tweets, 8% of tweets discussed diabetes, 23.7% diet, 16.6% exercise, and 51.7% obesity. The strongest correlation among the topics was determined between exercise and obesity. Other notable correlations were: diabetes and obesity, and diet and obesity. DDEO terms were also identified as subtopics of each of the DDEO topics. The frequent subtopics discussed along with Diabetes, excluding the DDEO terms themselves, were blood pressure, heart attack, yoga, and Alzheimer. The non-DDEO subtopics for Diet included vegetarian, pregnancy, celebrities, weight loss, religious, and mental health, while subtopics for Exercise included computer games, brain, fitness, and daily plan. Non-DDEO subtopics for Obesity included Alzheimer, cancer, and children. With 2.67 billion social media users in 2016, publicly available data such as Twitter posts can be utilized to support clinical providers, public health experts, and social scientists in better understanding common public opinions in regard to diabetes, diet, exercise, and obesity.
Submitted 22 September, 2017;
originally announced September 2017.
-
Computational Content Analysis of Negative Tweets for Obesity, Diet, Diabetes, and Exercise
Authors:
George Shaw Jr.,
Amir Karami
Abstract:
Social media based digital epidemiology has the potential to support faster response and deeper understanding of public health related threats. This study proposes a new framework to analyze unstructured health related textual data via Twitter users' posts (tweets) to characterize the negative health sentiments and non-health related concerns in relation to the corpus of negative sentiments regarding Diet, Diabetes, Exercise, and Obesity (DDEO). Through the collection of 6 million tweets over one month, this study identified the prominent topics of users as they relate to the negative sentiments. Our proposed framework uses two text mining methods, sentiment analysis and topic modeling, to discover negative topics. The negative sentiments of Twitter users support the literature narratives and the many morbidity issues that are associated with DDEO and the linkage between obesity and diabetes. The framework offers a potential method to understand the public's opinions and sentiments regarding DDEO. More importantly, this research provides new opportunities for computational social scientists, medical experts, and public health professionals to collectively address DDEO-related issues.
Submitted 22 September, 2017;
originally announced September 2017.
-
Fuzzy Approach Topic Discovery in Health and Medical Corpora
Authors:
Amir Karami,
Aryya Gangopadhyay,
Bin Zhou,
Hadi Kharrazi
Abstract:
The majority of medical documents and electronic health records (EHRs) are in text format, which poses a challenge for data processing and finding relevant documents. Looking for ways to automatically retrieve the enormous amount of health and medical knowledge has always been an intriguing topic. Powerful methods have been developed in recent years to make text processing automatic. One of the popular approaches to retrieving information based on discovering the themes in health & medical corpora is topic modeling; however, this approach still needs new perspectives. In this research we describe fuzzy latent semantic analysis (FLSA), a novel approach in topic modeling using a fuzzy perspective. FLSA can handle the redundancy issue in health & medical corpora and provides a new method to estimate the number of topics. The quantitative evaluations show that FLSA produces superior performance and features to latent Dirichlet allocation (LDA), the most popular topic model.
Submitted 25 May, 2017; v1 submitted 2 May, 2017;
originally announced May 2017.
-
Novel LDPC Decoder via MLP Neural Networks
Authors:
Alireza Karami,
Mahmoud Ahmadian Attari
Abstract:
In this paper, a new method for decoding Low Density Parity Check (LDPC) codes, based on Multi-Layer Perceptron (MLP) neural networks, is proposed. Because all procedures in neural networks are processed in parallel, this method can be considered a viable alternative to the Message Passing Algorithm (MPA), which has high computational complexity. Our proposed algorithm runs with a soft criterion and, at the same time, does not use probabilistic quantities to decide what the estimated codeword is. Although the neural decoder performance is close to the error performance of the Sum Product Algorithm (SPA), it is comparatively less complex. Therefore, the proposed decoder emerges as a new infrastructure for decoding LDPC codes.
Submitted 12 November, 2014;
originally announced November 2014.
-
A Concurrency Control Method Based on Commitment Ordering in Mobile Databases
Authors:
Ali Karami,
Ahmad Baraani-Dastjerdi
Abstract:
Disconnection of mobile clients from the server, at an unpredictable time and for an unknown duration, due to client mobility, is the most important challenge for concurrency control in mobile databases with a client-server model. Applying classic pessimistic concurrency control methods (like 2PL) in mobile databases leads to long blocking and increased waiting times for transactions. Because of the high abort rate of transactions, optimistic methods aren't appropriate in mobile databases. In this article, the OPCOT concurrency control algorithm is introduced, based on the optimistic concurrency control method. Reducing communication between mobile clients and the server, decreasing the blocking rate and deadlock of transactions, and increasing the degree of concurrency are the most important motivations for using the optimistic method as the basis of the OPCOT algorithm. To reduce the abort rate of transactions, a timestamp is assigned to transactions' operators at execution time. In order to check the commitment ordering property of the scheduler, the assigned timestamp is used on the server at commit time. In this article, the serializability of the OPCOT scheduler is proved using a serializability graph. Simulation results show that the OPCOT algorithm decreases the abort rate and waiting time of transactions compared to 2PL and optimistic algorithms.
Submitted 9 December, 2011;
originally announced December 2011.