Showing 1–31 of 31 results for author: Koksal, A

Searching in archive cs.
  1. arXiv:2505.21701

    cs.CL

    Do We Know What LLMs Don't Know? A Study of Consistency in Knowledge Probing

    Authors: Raoyuan Zhao, Abdullatif Köksal, Ali Modarressi, Michael A. Hedderich, Hinrich Schütze

    Abstract: The reliability of large language models (LLMs) is greatly compromised by their tendency to hallucinate, underscoring the need for precise identification of knowledge gaps within LLMs. Various methods for probing such gaps exist, ranging from calibration-based to prompting-based methods. To evaluate these probing methods, in this paper, we propose a new process based on using input variations and…

    Submitted 30 May, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

  2. arXiv:2505.12099

    cs.CV

    TinyRS-R1: Compact Multimodal Language Model for Remote Sensing

    Authors: Aybora Koksal, A. Aydin Alatan

    Abstract: Remote-sensing applications often run on edge hardware that cannot host today's 7B-parameter multimodal language models. This paper introduces TinyRS, the first 2B-parameter multimodal small language model (MSLM) optimized for remote sensing tasks, and TinyRS-R1, its reasoning-augmented variant. Built upon Qwen2-VL-2B, TinyRS is trained through a four-stage pipeline: pre-training on million satell…

    Submitted 17 May, 2025; originally announced May 2025.

    Comments: Submitted to BMVC 2025. Code, models, and the captions for datasets will be released

  3. arXiv:2505.07984

    cs.CV

    MilChat: Introducing Chain of Thought Reasoning and GRPO to a Multimodal Small Language Model for Remote Sensing

    Authors: Aybora Koksal, A. Aydin Alatan

    Abstract: Remarkable capabilities in understanding and generating text-image content have been demonstrated by recent advancements in multimodal large language models (MLLMs). However, their effectiveness in specialized domains, particularly those requiring resource-efficient and domain-specific adaptations, has remained limited. In this work, a lightweight multimodal language model termed MilChat is introduc…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: Submitted to JSTARS on April 2, 2025. Code and dataset will be available upon acceptance

  4. arXiv:2502.11020

    cs.CL cs.AI

    TUMLU: A Unified and Native Language Understanding Benchmark for Turkic Languages

    Authors: Jafar Isbarov, Arofat Akhundjanova, Mammad Hajili, Kavsar Huseynova, Dmitry Gaynullin, Anar Rzayev, Osman Tursun, Aizirek Turdubaeva, Ilshat Saetov, Rinat Kharisov, Saule Belginova, Ariana Kenbayeva, Amina Alisheva, Abdullatif Köksal, Samir Rustamov, Duygu Ataman

    Abstract: Being able to thoroughly assess massive multi-task language understanding (MMLU) capabilities is essential for advancing the applicability of multilingual language models. However, preparing such benchmarks in high quality native language is often costly and therefore limits the representativeness of evaluation datasets. While recent efforts focused on building more inclusive MMLU benchmarks, thes…

    Submitted 13 June, 2025; v1 submitted 16 February, 2025; originally announced February 2025.

    Comments: Accepted to ACL 2025, Main Conference

  5. arXiv:2412.11621

    cs.CV cs.MM

    VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting

    Authors: Muhammet Furkan Ilaslan, Ali Koksal, Kevin Qinhong Lin, Burak Satar, Mike Zheng Shou, Qianli Xu

    Abstract: Large Language Model (LLM)-based agents have shown promise in procedural tasks, but the potential of multimodal instructions augmented by texts and videos to assist users remains under-explored. To address this gap, we propose the Visually Grounded Text-Video Prompting (VG-TVP) method, a novel LLM-empowered Multimodal Procedural Planning (MPP) framework. It generates cohesive text and vide…

    Submitted 16 December, 2024; originally announced December 2024.

    Comments: Accepted for The 39th Annual AAAI Conference on Artificial Intelligence 2025 in Main Track, 19 pages, 24 figures

  6. arXiv:2411.19240

    cs.CL

    How far can bias go? -- Tracing bias from pretraining data to alignment

    Authors: Marion Thaler, Abdullatif Köksal, Alina Leidinger, Anna Korhonen, Hinrich Schütze

    Abstract: As LLMs are increasingly integrated into user-facing applications, addressing biases that perpetuate societal inequalities is crucial. While much work has gone into measuring or mitigating biases in these models, fewer studies have investigated their origins. Therefore, this study examines the correlation between gender-occupation bias in pre-training data and its manifestation in LLMs, focusing…

    Submitted 28 November, 2024; originally announced November 2024.

  7. arXiv:2410.12656

    cs.CL cs.AI

    Evaluating Morphological Compositional Generalization in Large Language Models

    Authors: Mete Ismayilzada, Defne Circi, Jonne Sälevä, Hale Sirin, Abdullatif Köksal, Bhuwan Dhingra, Antoine Bosselut, Duygu Ataman, Lonneke van der Plas

    Abstract: Large language models (LLMs) have demonstrated significant progress in various natural language generation and understanding tasks. However, their linguistic generalization capabilities remain questionable, raising doubts about whether these models learn language similarly to humans. While humans exhibit compositional generalization and linguistic creativity in language use, the extent to which LL…

    Submitted 5 June, 2025; v1 submitted 16 October, 2024; originally announced October 2024.

    Comments: Accepted to NAACL 2025

  8. arXiv:2409.12958

    cs.CL cs.AI cs.LG

    MURI: High-Quality Instruction Tuning Datasets for Low-Resource Languages via Reverse Instructions

    Authors: Abdullatif Köksal, Marion Thaler, Ayyoob Imani, Ahmet Üstün, Anna Korhonen, Hinrich Schütze

    Abstract: Instruction tuning enhances large language models (LLMs) by aligning them with human preferences across diverse tasks. Traditional approaches to create instruction tuning datasets face serious challenges for low-resource languages due to their dependence on data annotation. This work introduces a novel method, Multilingual Reverse Instructions (MURI), which generates high-quality instruction tunin…

    Submitted 19 September, 2024; originally announced September 2024.

  9. arXiv:2409.02098

    cs.CL cs.AI cs.LG

    CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation

    Authors: Ingo Ziegler, Abdullatif Köksal, Desmond Elliott, Hinrich Schütze

    Abstract: Building high-quality datasets for specialized tasks is a time-consuming and resource-intensive process that often requires specialized domain knowledge. We propose Corpus Retrieval and Augmentation for Fine-Tuning (CRAFT), a method for generating synthetic datasets, given a small number of user-written few-shots that demonstrate the task to be performed. Given the few-shot examples, we use large-…

    Submitted 3 September, 2024; originally announced September 2024.

  10. arXiv:2408.17437

    cs.CL

    SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists

    Authors: Raoyuan Zhao, Abdullatif Köksal, Yihong Liu, Leonie Weissweiler, Anna Korhonen, Hinrich Schütze

    Abstract: Traditional benchmarking in NLP typically involves using static held-out test sets. However, this approach often results in an overestimation of performance and lacks the ability to offer comprehensive, interpretable, and dynamic assessments of NLP models. Recently, works like DynaBench (Kiela et al., 2021) and CheckList (Ribeiro et al., 2020) have addressed these limitations through behavioral te…

    Submitted 7 November, 2024; v1 submitted 30 August, 2024; originally announced August 2024.

    Comments: EMNLP 2024 - Findings

  11. arXiv:2407.12402

    cs.CL

    TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish

    Authors: Arda Yüksel, Abdullatif Köksal, Lütfi Kerem Şenel, Anna Korhonen, Hinrich Schütze

    Abstract: Multiple choice question answering tasks evaluate the reasoning, comprehension, and mathematical abilities of Large Language Models (LLMs). While existing benchmarks employ automatic translation for multilingual evaluation, this approach is error-prone and potentially introduces culturally biased questions, especially in social sciences. We introduce the first multitask, multiple-choice Turkish QA…

    Submitted 3 October, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

    Comments: EMNLP 2024 - Findings

  12. arXiv:2407.06699

    cs.CL

    Consistent Document-Level Relation Extraction via Counterfactuals

    Authors: Ali Modarressi, Abdullatif Köksal, Hinrich Schütze

    Abstract: Many datasets have been developed to train and evaluate document-level relation extraction (RE) models. Most of these are constructed using real-world data. It has been shown that RE models trained on real-world data suffer from factual biases. To evaluate and address this issue, we present CovEReD, a counterfactual data generation approach for document-level relation extraction datasets using ent…

    Submitted 15 October, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

  13. arXiv:2404.11672

    cs.CL

    MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory

    Authors: Ali Modarressi, Abdullatif Köksal, Ayyoob Imani, Mohsen Fayyaz, Hinrich Schütze

    Abstract: While current large language models (LLMs) perform well on many knowledge-related tasks, they are limited by relying on their parameters as an implicit storage mechanism. As a result, they struggle with memorizing rare events and with updating their memory as facts change over time. In addition, the uninterpretable nature of parametric memory makes it challenging to prevent hallucination. Model ed…

    Submitted 17 April, 2025; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: Published in Transactions on Machine Learning Research (TMLR)

  14. arXiv:2404.09692

    cs.CV

    XoFTR: Cross-modal Feature Matching Transformer

    Authors: Önder Tuzcuoğlu, Aybora Köksal, Buğra Sofu, Sinan Kalkan, A. Aydın Alatan

    Abstract: We introduce XoFTR, a cross-modal cross-view method for local feature matching between thermal infrared (TIR) and visible images. Unlike visible images, TIR images are less susceptible to adverse lighting and weather conditions but present difficulties in matching due to significant texture and intensity differences. Current hand-crafted and learning-based methods for visible-TIR matching fall sh…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: CVPR Image Matching Workshop, 2024. 12 pages, 7 figures, 5 tables. Codes and dataset are available at https://github.com/OnderT/XoFTR

  15. arXiv:2403.06965

    cs.CL

    Hybrid Human-LLM Corpus Construction and LLM Evaluation for Rare Linguistic Phenomena

    Authors: Leonie Weissweiler, Abdullatif Köksal, Hinrich Schütze

    Abstract: Argument Structure Constructions (ASCs) are one of the most well-studied construction groups, providing a unique opportunity to demonstrate the usefulness of Construction Grammar (CxG). For example, the caused-motion construction (CMC, "She sneezed the foam off her cappuccino") demonstrates that constructions must carry meaning, otherwise the fact that "sneeze" in this context causes movement…

    Submitted 11 March, 2024; originally announced March 2024.

  16. arXiv:2311.07424

    cs.CL cs.AI

    Hallucination Augmented Recitations for Language Models

    Authors: Abdullatif Köksal, Renat Aksitov, Chung-Ching Chang

    Abstract: Attribution is a key concept in large language models (LLMs) as it enables control over information sources and enhances the factuality of LLMs. While existing approaches utilize open book question answering to improve attribution, factual datasets may reward language models for recalling facts that they already know from their pretraining data, not for attribution. In contrast, counterfactual open book Q…

    Submitted 13 November, 2023; originally announced November 2023.

  17. arXiv:2309.07409

    cs.CV

    Masked Diffusion with Task-awareness for Procedure Planning in Instructional Videos

    Authors: Fen Fang, Yun Liu, Ali Koksal, Qianli Xu, Joo-Hwee Lim

    Abstract: A key challenge with procedure planning in instructional videos lies in how to handle a large decision space consisting of a multitude of action types that belong to various tasks. To understand real-world video content, an AI agent must proficiently discern these action types (e.g., pour milk, pour water, open lid, close lid, etc.) based on brief visual observation. Moreover, it must adeptly capt…

    Submitted 13 September, 2023; originally announced September 2023.

    Comments: 7 pages (main text excluding references), 3 figures, 7 tables

  18. arXiv:2305.13302

    cs.CL

    Language-Agnostic Bias Detection in Language Models with Bias Probing

    Authors: Abdullatif Köksal, Omer Faruk Yalcin, Ahmet Akbiyik, M. Tahir Kilavuz, Anna Korhonen, Hinrich Schütze

    Abstract: Pretrained language models (PLMs) are key components in NLP, but they contain strong social biases. Quantifying these biases is challenging because current methods focusing on fill-the-mask objectives are sensitive to slight changes in input. To address this, we propose a bias probing technique called LABDet, for evaluating social bias in PLMs with a robust and language-agnostic method. For nation…

    Submitted 20 November, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023 Findings

  19. arXiv:2304.08460

    cs.CL cs.AI cs.LG

    LongForm: Effective Instruction Tuning with Reverse Instructions

    Authors: Abdullatif Köksal, Timo Schick, Anna Korhonen, Hinrich Schütze

    Abstract: Instruction tuning enables language models to more effectively generalize and better follow user intent. However, obtaining instruction data is costly and challenging. Prior work employs methods such as expensive human annotation, crowd-sourced datasets with alignment issues, and generating noisy examples via LLMs. We introduce the LongForm-C dataset, which is created by reverse instructions. We g…

    Submitted 3 October, 2024; v1 submitted 17 April, 2023; originally announced April 2023.

    Comments: EMNLP 2024 Findings. This version extends the training with recent LLMs, evaluation with new metrics, and NLU tasks

  20. arXiv:2304.01890

    cs.CL cs.AI cs.LG

    Sociocultural knowledge is needed for selection of shots in hate speech detection tasks

    Authors: Antonis Maronikolakis, Abdullatif Köksal, Hinrich Schütze

    Abstract: We introduce HATELEXICON, a lexicon of slurs and targets of hate speech for the countries of Brazil, Germany, India and Kenya, to aid training and interpretability of models. We demonstrate how our lexicon can be used to interpret model predictions, showing that models developed to classify extreme speech rely heavily on target words when making predictions. Further, we propose a method to aid sho…

    Submitted 17 May, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

  21. arXiv:2211.08358

    cs.CL

    MEAL: Stable and Active Learning for Few-Shot Prompting

    Authors: Abdullatif Köksal, Timo Schick, Hinrich Schütze

    Abstract: Few-shot classification has made great strides due to foundation models that, through priming and prompting, are highly effective few-shot learners. However, this approach has high variance both across different sets of few shots (data selection) and across different finetuning runs (run variability). This is problematic not only because it impedes the fair comparison of different approaches, but…

    Submitted 20 November, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

    Comments: EMNLP 2023 Findings

  22. arXiv:2210.13181

    cs.CL

    The Better Your Syntax, the Better Your Semantics? Probing Pretrained Language Models for the English Comparative Correlative

    Authors: Leonie Weissweiler, Valentin Hofmann, Abdullatif Köksal, Hinrich Schütze

    Abstract: Construction Grammar (CxG) is a paradigm from cognitive linguistics emphasising the connection between syntax and semantics. Rather than rules that operate on lexical items, it posits constructions as the central building blocks of language, i.e., linguistic units of different granularity that combine syntax and semantics. As a first step towards assessing the compatibility of CxG with the syntact…

    Submitted 24 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022

  23. arXiv:2210.06207

    cs.CL

    SilverAlign: MT-Based Silver Data Algorithm For Evaluating Word Alignment

    Authors: Abdullatif Köksal, Silvia Severini, Hinrich Schütze

    Abstract: Word alignments are essential for a variety of NLP tasks. Therefore, choosing the best approaches for their creation is crucial. However, the scarce availability of gold evaluation data makes the choice difficult. We propose SilverAlign, a new method to automatically create silver data for the evaluation of word aligners by exploiting machine translation and minimal pairs. We show that performance…

    Submitted 27 March, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

  24. arXiv:2202.13080

    cs.CV

    Improved Hard Example Mining Approach for Single Shot Object Detectors

    Authors: Aybora Koksal, Onder Tuzcuoglu, Kutalmis Gokalp Ince, Yoldas Ataseven, A. Aydin Alatan

    Abstract: Hard example mining methods generally improve the performance of the object detectors, which suffer from imbalanced training sets. In this work, two existing hard example mining approaches (LRM and focal loss, FL) are adapted and combined in a state-of-the-art real-time object detector, YOLOv5. The effectiveness of the proposed approach for improving the performance on hard examples is extensively…

    Submitted 12 July, 2022; v1 submitted 26 February, 2022; originally announced February 2022.

    Comments: ICIP 2022. 5 pages, 2 figures, 7 tables. The codes are available at https://github.com/aybora/yolov5Loss

  25. arXiv:2109.04712

    cs.CL

    Balancing Methods for Multi-label Text Classification with Long-Tailed Class Distribution

    Authors: Yi Huang, Buse Giledereli, Abdullatif Köksal, Arzucan Özgür, Elif Ozkirimli

    Abstract: Multi-label text classification is a challenging task because it requires capturing label dependencies. It becomes even more challenging when class distribution is long-tailed. Resampling and re-weighting are common approaches used for addressing the class imbalance problem; however, they are not effective when there is label dependency besides class imbalance because they result in oversampling o…

    Submitted 15 October, 2021; v1 submitted 10 September, 2021; originally announced September 2021.

    Comments: EMNLP 2021

  26. Semi-Automatic Annotation For Visual Object Tracking

    Authors: Kutalmis Gokalp Ince, Aybora Koksal, Arda Fazla, A. Aydin Alatan

    Abstract: We propose a semi-automatic bounding box annotation method for visual object tracking by utilizing temporal information with a tracking-by-detection approach. For detection, we use an off-the-shelf object detector which is trained iteratively with the annotations generated by the proposed method, and we perform object detection on each frame independently. We employ Multiple Hypothesis Tracking (M…

    Submitted 19 August, 2021; v1 submitted 18 January, 2021; originally announced January 2021.

    Comments: Accepted to The 2nd Anti-UAV Workshop & Challenge - ICCV Workshops, 2021. Resulting uav_detection_2 annotations and our codes are publicly available at https://github.com/aybora/Semi-Automatic-Video-Annotation-OGAM

  27. arXiv:2010.09381

    cs.CL

    The RELX Dataset and Matching the Multilingual Blanks for Cross-Lingual Relation Classification

    Authors: Abdullatif Köksal, Arzucan Özgür

    Abstract: Relation classification is one of the key topics in information extraction, which can be used to construct knowledge bases or to provide useful information for question answering. Current approaches for relation classification are mainly focused on the English language and require lots of training data with human annotations. Creating and annotating a large amount of training data for low-resource…

    Submitted 19 October, 2020; originally announced October 2020.

    Comments: Findings of EMNLP 2020

  28. arXiv:2009.02526

    cs.IR cs.LG q-bio.MN

    Vapur: A Search Engine to Find Related Protein-Compound Pairs in COVID-19 Literature

    Authors: Abdullatif Köksal, Hilal Dönmez, Rıza Özçelik, Elif Ozkirimli, Arzucan Özgür

    Abstract: Coronavirus Disease of 2019 (COVID-19) created dire consequences globally and triggered an intense scientific effort from different domains. The resulting publications created a huge text collection in which finding the studies related to a biomolecule of interest is challenging for general purpose search engines because the publications are rich in domain specific terminology. Here, we present Va…

    Submitted 13 October, 2020; v1 submitted 5 September, 2020; originally announced September 2020.

    Comments: EMNLP 2020 - COVID-19 Workshop

  29. Effect of Annotation Errors on Drone Detection with YOLOv3

    Authors: Aybora Koksal, Kutalmis Gokalp Ince, A. Aydin Alatan

    Abstract: Following the recent advances in deep networks, object detection and tracking algorithms with deep learning backbones have been improved significantly; however, this rapid development has created the need for large amounts of annotated labels. Even if the details of such semi-automatic annotation processes for most of these datasets are not known precisely, especially for the video annotations…

    Submitted 12 January, 2021; v1 submitted 2 April, 2020; originally announced April 2020.

    Comments: Best Paper Award at The 1st Anti-UAV Workshop & Challenge - CVPR Workshops, 2020

  30. arXiv:2002.10416

    cs.CL

    Resources for Turkish Dependency Parsing: Introducing the BOUN Treebank and the BoAT Annotation Tool

    Authors: Utku Türk, Furkan Atmaca, Şaziye Betül Özateş, Gözde Berk, Seyyit Talha Bedir, Abdullatif Köksal, Balkız Öztürk Başaran, Tunga Güngör, Arzucan Özgür

    Abstract: In this paper, we introduce the resources that we developed for Turkish dependency parsing, which include a novel manually annotated treebank (BOUN Treebank), along with the guidelines we adopted, and a new annotation tool (BoAT). The manual annotation process we employed was shaped and implemented by a team of four linguists and five Natural Language Processing (NLP) specialists. Decisions regard…

    Submitted 16 September, 2021; v1 submitted 24 February, 2020; originally announced February 2020.

    Comments: Language Resources and Evaluation

  31. arXiv:1505.05193

    cs.CE cs.LO q-bio.MN

    Synthesising Executable Gene Regulatory Networks from Single-cell Gene Expression Data

    Authors: Jasmin Fisher, Ali Sinan Köksal, Nir Piterman, Steven Woodhouse

    Abstract: Recent experimental advances in biology allow researchers to obtain gene expression profiles at single-cell resolution over hundreds or even thousands of cells at once. These single-cell measurements provide snapshots of the states of the cells that make up a tissue, instead of the population-level averages provided by conventional high-throughput experiments. This new data therefore provides an…

    Submitted 17 January, 2018; v1 submitted 19 May, 2015; originally announced May 2015.

    Comments: Final published version to appear in Computer Aided Verification (CAV), Springer, July 2015