Showing 1–4 of 4 results for author: Ozcelik, O

  1. arXiv:2307.13829  [pdf, other]

    cs.CL cs.SI

    ARC-NLP at Multimodal Hate Speech Event Detection 2023: Multimodal Methods Boosted by Ensemble Learning, Syntactical and Entity Features

    Authors: Umitcan Sahin, Izzet Emre Kucukkaya, Oguzhan Ozcelik, Cagri Toraman

    Abstract: Text-embedded images can serve as a means of spreading hate speech, propaganda, and extremist beliefs. Throughout the Russia-Ukraine war, both opposing factions heavily relied on text-embedded images as a vehicle for spreading propaganda and hate speech. Ensuring the effective detection of hate speech and propaganda is of utmost importance to mitigate the negative effect of hate speech disseminati…

    Submitted 25 July, 2023; originally announced July 2023.

    Comments: Submitted to CASE at RANLP 2023

  2. arXiv:2302.13403  [pdf, other]

    cs.SI cs.CL cs.IR

    Tweets Under the Rubble: Detection of Messages Calling for Help in Earthquake Disaster

    Authors: Cagri Toraman, Izzet Emre Kucukkaya, Oguzhan Ozcelik, Umitcan Sahin

    Abstract: The importance of social media is again exposed in the recent tragedy of the 2023 Turkey and Syria earthquake. Many victims who were trapped under the rubble called for help by posting messages in Twitter. We present an interactive tool to provide situational awareness for missing and trapped people, and disaster relief for rescue and donation efforts. The system (i) collects tweets, (ii) classifi…

    Submitted 26 February, 2023; originally announced February 2023.

  3. arXiv:2210.05401  [pdf, other]

    cs.SI cs.CL cs.IR

    MiDe22: An Annotated Multi-Event Tweet Dataset for Misinformation Detection

    Authors: Cagri Toraman, Oguzhan Ozcelik, Furkan Şahinuç, Fazli Can

    Abstract: The rapid dissemination of misinformation through online social networks poses a pressing issue with harmful consequences jeopardizing human health, public safety, democracy, and the economy; therefore, urgent action is required to address this problem. In this study, we construct a new human-annotated dataset, called MiDe22, having 5,284 English and 5,064 Turkish tweets with their misinformation…

    Submitted 11 July, 2024; v1 submitted 11 October, 2022; originally announced October 2022.

    Comments: Published at LREC-COLING 2024

  4. Impact of Tokenization on Language Models: An Analysis for Turkish

    Authors: Cagri Toraman, Eyup Halit Yilmaz, Furkan Şahinuç, Oguzhan Ozcelik

    Abstract: Tokenization is an important text preprocessing step to prepare input tokens for deep language models. WordPiece and BPE are de facto methods employed by important models, such as BERT and GPT. However, the impact of tokenization can be different for morphologically rich languages, such as Turkic languages, where many words can be generated by adding prefixes and suffixes. We compare five tokenize…

    Submitted 19 April, 2022; originally announced April 2022.

    Comments: submitted to ACM TALLIP

    Journal ref: ACM Transactions on Asian and Low-Resource Language Information Processing (2023) Volume 22 Issue 4 pp 1-21