Showing 1–20 of 20 results for author: Shahi, G K

  1. arXiv:2507.01984

    cs.LG cs.CL cs.SI

    Multimodal Misinformation Detection Using Early Fusion of Linguistic, Visual, and Social Features

    Authors: Gautam Kishore Shahi

    Abstract: Amid a tidal wave of misinformation flooding social media during elections and crises, extensive research has been conducted on misinformation detection, primarily focusing on text-based or image-based approaches. However, only a few studies have explored multimodal feature combinations, such as integrating text and images for building a classification model to detect misinformation. This study in…

    Submitted 26 June, 2025; originally announced July 2025.

  2. arXiv:2504.08776

    cs.CL cs.CY cs.LG

    SemCAFE: When Named Entities make the Difference Assessing Web Source Reliability through Entity-level Analytics

    Authors: Gautam Kishore Shahi, Oshani Seneviratne, Marc Spaniol

    Abstract: With the shift from traditional to digital media, the online landscape now hosts not only reliable news articles but also a significant amount of unreliable content. Digital media has faster reachability by significantly influencing public opinion and advancing political agendas. While newspaper readers may be familiar with their preferred outlets' political leanings or credibility, determining unr…

    Submitted 3 April, 2025; originally announced April 2025.

  3. arXiv:2504.06976

    cs.CY cs.ET cs.HC

    A Year of the DSA Transparency Database: What it (Does Not) Reveal About Platform Moderation During the 2024 European Parliament Election

    Authors: Gautam Kishore Shahi, Benedetta Tessa, Amaury Trujillo, Stefano Cresci

    Abstract: Social media platforms face heightened risks during major political events; yet, how platforms adapt their moderation practices in response remains unclear. The Digital Services Act Transparency Database offers an unprecedented opportunity to systematically study content moderation at scale, enabling researchers and policymakers to assess platforms' compliance and effectiveness. Herein, we analyze…

    Submitted 9 April, 2025; originally announced April 2025.

  4. arXiv:2502.18500

    cs.SI cs.HC

    Too Little, Too Late: Moderation of Misinformation around the Russo-Ukrainian Conflict

    Authors: Gautam Kishore Shahi, Yelena Mejova

    Abstract: In this study, we examine the role of Twitter as a first line of defense against misinformation by tracking the public engagement with, and the platform's response to, 500 tweets concerning the Russo-Ukrainian conflict which were identified as misinformation. Using a real-time sample of 543,475 of their retweets, we find that users who geolocate themselves in the U.S. both produce and consume the lar…

    Submitted 20 February, 2025; originally announced February 2025.

    Comments: 11 pages

  5. arXiv:2502.15745

    cs.CL cs.DL cs.LG

    On the Effectiveness of Large Language Models in Automating Categorization of Scientific Texts

    Authors: Gautam Kishore Shahi, Oliver Hummel

    Abstract: The rapid advancement of Large Language Models (LLMs) has led to a multitude of application opportunities. One traditional task for Information Retrieval systems is the summarization and classification of texts, both of which are important for supporting humans in navigating large literature bodies as they e.g. exist with scientific publications. Due to this rapidly growing body of scientific know…

    Submitted 8 February, 2025; originally announced February 2025.

  6. arXiv:2410.05287

    cs.CL cs.AI cs.SI

    Hate Speech Detection Using Cross-Platform Social Media Data In English and German Language

    Authors: Gautam Kishore Shahi, Tim A. Majchrzak

    Abstract: Hate speech has grown into a pervasive phenomenon, intensifying during times of crisis, elections, and social unrest. Multiple approaches have been developed to detect hate speech using artificial intelligence, but a generalized model is yet unaccomplished. The challenge for hate speech detection as text classification is the cost of obtaining high-quality training data. This study focuses on dete…

    Submitted 2 October, 2024; originally announced October 2024.

  7. arXiv:2407.12968

    cs.SI

    Multi-Platform Framing Analysis: A Case Study of Kristiansand Quran Burning

    Authors: Anna-Katharina Jung, Gautam Kishore Shahi, Jennifer Fromm, Kari Anne Røysland, Kim Henrik Gronert

    Abstract: The framing of events in various media and discourse spaces is crucial in the era of misinformation and polarization. Many studies, however, are limited to specific media or networks, disregarding the importance of cross-platform diffusion. This study overcomes that limitation by conducting a multi-platform framing analysis on Twitter, YouTube, and traditional media analyzing the 2019 Koran burnin…

    Submitted 17 July, 2024; originally announced July 2024.

  8. arXiv:2404.02921

    cs.DL cs.HC cs.IR

    Enhancing Research Information Systems with Identification of Domain Experts

    Authors: Gautam Kishore Shahi, Oliver Hummel

    Abstract: Research organisations and their research outputs have been growing considerably in the past decades. This large body of knowledge attracts various stakeholders, e.g., for knowledge sharing, technology transfer, or potential collaborations. However, due to the large amount of complex knowledge created, traditional methods of manually curating catalogues are often out of time, imprecise, and cumber…

    Submitted 28 March, 2024; originally announced April 2024.

    Comments: 6 pages, 4 figures; accepted paper at BIR 2024 Workshop

  9. arXiv:2403.01646

    cs.SI cs.HC cs.IR

    TweetInfo: An Interactive System to Mitigate Online Harm

    Authors: Gautam Kishore Shahi

    Abstract: The increase in active users on social networking sites (SNSs) has also observed an increase in harmful content on social media sites. Harmful content is described as an inappropriate activity to harm or deceive an individual or a group of users. Alongside existing methods to detect misinformation and hate speech, users still need to be well-informed about the harmfulness of the content on SNSs. T…

    Submitted 3 March, 2024; originally announced March 2024.

    Comments: 3 pages

  10. arXiv:2401.16625

    cs.IR cs.SI

    FakeClaim: A Multiple Platform-driven Dataset for Identification of Fake News on 2023 Israel-Hamas War

    Authors: Gautam Kishore Shahi, Amit Kumar Jaiswal, Thomas Mandl

    Abstract: We contribute the first publicly available dataset of factual claims from different platforms and fake YouTube videos on the 2023 Israel-Hamas war for automatic fake YouTube video classification. The FakeClaim data is collected from 60 fact-checking organizations in 30 languages and enriched with metadata from the fact-checking organizations curated by trained journalists specialized in fact-check…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: Accepted in the IR4Good Track at the 46th European Conference on Information Retrieval (ECIR) 2024

  11. Regret, Delete, (Do Not) Repeat: An Analysis of Self-Cleaning Practices on Twitter After the Outbreak of the COVID-19 Pandemic

    Authors: Nicolás E. Díaz Ferreyra, Gautam Kishore Shahi, Catherine Tony, Stefan Stieglitz, Riccardo Scandariato

    Abstract: During the outbreak of the COVID-19 pandemic, many people shared their symptoms across Online Social Networks (OSNs) like Twitter, hoping for others' advice or moral support. Prior studies have shown that those who disclose health-related information across OSNs often tend to regret it and delete their publications afterwards. Hence, deleted posts containing sensitive data can be seen as manifesta…

    Submitted 16 March, 2023; originally announced March 2023.

    Comments: Accepted at CHI '23 Late Breaking Work (LBW)

  12. arXiv:2202.07986

    cs.SI

    Towards a Better Understanding of Online Influence: Differences in Twitter Communication Between Companies and Influencers

    Authors: Diana C. Hernandez-Bocanegra, Angela Borchert, Felix Brünker, Gautam Kishore Shahi, Björn Ross

    Abstract: In the last decade, Social Media platforms such as Twitter have gained importance in the various marketing strategies of companies. This work aims to examine the presence of influential content on a textual level, by investigating characteristics of tweets in the context of social impact theory, and its dimension immediacy. To this end, we analysed influential Twitter communication data during Bla…

    Submitted 16 February, 2022; originally announced February 2022.

    Comments: Australasian Conference on Information Systems, 2020, Wellington

    MSC Class: 91D30 ACM Class: H.0

  13. arXiv:2112.09301

    cs.CL cs.AI cs.SI

    Overview of the HASOC Subtrack at FIRE 2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages

    Authors: Thomas Mandl, Sandip Modha, Gautam Kishore Shahi, Hiren Madhu, Shrey Satapara, Prasenjit Majumder, Johannes Schaefer, Tharindu Ranasinghe, Marcos Zampieri, Durgesh Nandini, Amit Kumar Jaiswal

    Abstract: The widespread presence of offensive content online, such as hate speech, poses a growing societal problem. AI tools are necessary for supporting the moderation process at online platforms. For the evaluation of these identification tools, continuous experimentation with data sets in different languages is necessary. The HASOC track (Hate Speech and Offensive Content Identification) is dedicated to develop…

    Submitted 16 December, 2021; originally announced December 2021.

  14. arXiv:2109.12987

    cs.CL cs.IR cs.LG cs.SI

    Overview of the CLEF--2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News

    Authors: Preslav Nakov, Giovanni Da San Martino, Tamer Elsayed, Alberto Barrón-Cedeño, Rubén Míguez, Shaden Shaar, Firoj Alam, Fatima Haouari, Maram Hasanain, Watheq Mansour, Bayan Hamdan, Zien Sheikh Ali, Nikolay Babulkov, Alex Nikolov, Gautam Kishore Shahi, Julia Maria Struß, Thomas Mandl, Mucahid Kutlu, Yavuz Selim Kartal

    Abstract: We describe the fourth edition of the CheckThat! Lab, part of the 2021 Conference and Labs of the Evaluation Forum (CLEF). The lab evaluates technology supporting tasks related to factuality, and covers Arabic, Bulgarian, English, Spanish, and Turkish. Task 1 asks to predict which posts in a Twitter stream are worth fact-checking, focusing on COVID-19 and politics (in all five languages). Task 2 a…

    Submitted 23 September, 2021; originally announced September 2021.

    Comments: Check-Worthiness Estimation, Fact-Checking, Veracity, Evidence-based Verification, Detecting Previously Fact-Checked Claims, Social Media Verification, Computational Journalism, COVID-19

    MSC Class: 68T50 ACM Class: F.2.2; I.2.7

    Journal ref: CLEF-2021

  15. arXiv:2109.05492

    cs.SI cs.CY

    Who shapes crisis communication on Twitter? An analysis of influential German-language accounts during the COVID-19 pandemic

    Authors: Gautam Kishore Shahi, Sünje Clausen, Stefan Stieglitz

    Abstract: Twitter is becoming an increasingly important platform for disseminating information during crisis situations, such as the COVID-19 pandemic. Effective crisis communication on Twitter can shape the public perception of the crisis, influence adherence to preventative measures, and thus affect public health. Influential accounts are particularly important as they reach large audiences quickly. This…

    Submitted 12 September, 2021; originally announced September 2021.

    Comments: 10 pages

  16. arXiv:2108.05927

    cs.CL cs.CY

    Overview of the HASOC track at FIRE 2020: Hate Speech and Offensive Content Identification in Indo-European Languages

    Authors: Thomas Mandl, Sandip Modha, Gautam Kishore Shahi, Amit Kumar Jaiswal, Durgesh Nandini, Daksh Patel, Prasenjit Majumder, Johannes Schäfer

    Abstract: With the growth of social media, the spread of hate speech is also increasing rapidly. Social media are widely used in many countries. Also Hate Speech is spreading in these countries. This brings a need for multilingual Hate Speech detection algorithms. Much research in this area is dedicated to English at the moment. The HASOC track intends to provide a platform to develop and optimize Hate Spee…

    Submitted 12 August, 2021; originally announced August 2021.

    Comments: 25 pages

  17. arXiv:2106.04726

    cs.SI cs.CL cs.CV

    Tiplines to Combat Misinformation on Encrypted Platforms: A Case Study of the 2019 Indian Election on WhatsApp

    Authors: Ashkan Kazemi, Kiran Garimella, Gautam Kishore Shahi, Devin Gaffney, Scott A. Hale

    Abstract: There is currently no easy way to fact-check content on WhatsApp and other end-to-end encrypted platforms at scale. In this paper, we analyze the usefulness of a crowd-sourced "tipline" through which users can submit content ("tips") that they want fact-checked. We compare the tips sent to a WhatsApp tipline run during the 2019 Indian national elections with the messages circulating in large, publ…

    Submitted 23 July, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

  18. arXiv:2010.00502

    cs.SI cs.CL cs.IR

    AMUSED: An Annotation Framework of Multi-modal Social Media Data

    Authors: Gautam Kishore Shahi

    Abstract: In this paper, we present a semi-automated framework called AMUSED for gathering multi-modal annotated data from the multiple social media platforms. The framework is designed to mitigate the issues of collecting and annotating social media data by cohesively combining machine and human in the data collection process. From a given list of the articles from professional news media or blog, AMUSED d…

    Submitted 10 August, 2021; v1 submitted 1 October, 2020; originally announced October 2020.

    Comments: 10 pages, 5 figures, 3 tables

  19. arXiv:2006.11343

    cs.CY cs.SI

    FakeCovid -- A Multilingual Cross-domain Fact Check News Dataset for COVID-19

    Authors: Gautam Kishore Shahi, Durgesh Nandini

    Abstract: In this paper, we present a first multilingual cross-domain dataset of 5182 fact-checked news articles for COVID-19, collected from 04/01/2020 to 15/05/2020. We have collected the fact-checked articles from 92 different fact-checking websites after obtaining references from Poynter and Snopes. We have manually annotated articles into 11 different categories of the fact-checked news according to th…

    Submitted 19 June, 2020; originally announced June 2020.

    Comments: CySoc 2020 International Workshop on Cyber Social Threats, ICWSM 2020

  20. arXiv:2005.05710

    cs.SI cs.CY

    An Exploratory Study of COVID-19 Misinformation on Twitter

    Authors: Gautam Kishore Shahi, Anne Dirkson, Tim A. Majchrzak

    Abstract: During the COVID-19 pandemic, social media has become a home ground for misinformation. To tackle this infodemic, scientific oversight, as well as a better understanding by practitioners in crisis management, is needed. We have conducted an exploratory study into the propagation, authors and content of misinformation on Twitter around the topic of COVID-19 in order to gain early insights. We have…

    Submitted 24 August, 2020; v1 submitted 12 May, 2020; originally announced May 2020.

    Comments: 20 pages, nine figures, four tables. Submitted for peer review, revision 1