Automatic Text Summarization (ATS) for Research Documents in Sorani Kurdish

Abdulrahman, Rondik Hadi; Hassani, Hossein

arXiv:2504.14630 (cs)

[Submitted on 20 Apr 2025]

Abstract: Extracting concise information from scientific documents aids learners, researchers, and practitioners. Automatic Text Summarization (ATS), a key Natural Language Processing (NLP) application, automates this process. While ATS methods exist for many languages, ATS for Kurdish remains underdeveloped due to limited resources. This study develops a dataset and language model based on 231 scientific papers in Sorani Kurdish, collected from four academic departments in two universities in the Kurdistan Region of Iraq (KRI), averaging 26 pages per document. Using Sentence Weighting and Term Frequency-Inverse Document Frequency (TF-IDF) algorithms, two experiments were conducted, differing in whether the conclusions were included. The average word count was 5,492.3 in the first experiment and 5,266.96 in the second. Results were evaluated both manually and automatically; the automatic evaluation used ROUGE-1, ROUGE-2, and ROUGE-L metrics, with the best accuracy reaching 19.58%. Six experts conducted the manual evaluations using three criteria, with results varying by document. This research provides valuable resources for Kurdish NLP researchers to advance ATS and related fields.
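
To make the pipeline named in the abstract concrete, the following is a minimal sketch of TF-IDF sentence weighting for extractive summarization together with a ROUGE-N score. It is not the authors' implementation: the abstract does not specify the weighting formula, the tokenizer, or the ROUGE tooling, so everything here is an assumption. In particular, the sentence splitter (which also treats the Arabic-script question mark as a boundary), the whitespace tokenizer, the choice to treat each sentence as a "document" when computing IDF, and the function names `split_sentences`, `tokenize`, `tfidf_sentence_scores`, `summarize`, and `rouge_n` are all hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Assumption: sentences end with '.', '!', '?' or the Arabic-script '؟'.
    parts = re.split(r"(?<=[.!?؟])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    # Assumption: whitespace-delimited tokens, surrounding punctuation stripped.
    words = (w.strip(".,;:!?؟،()\"'").lower() for w in sentence.split())
    return [w for w in words if w]


def tfidf_sentence_scores(sentences):
    # Each sentence is treated as one "document" for the IDF statistics.
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        if not toks:
            scores.append(0.0)
            continue
        tf = Counter(toks)
        # Sentence weight: sum of (relative term frequency * log(n / df)).
        scores.append(sum((c / len(toks)) * math.log(n / df[t])
                          for t, c in tf.items()))
    return scores


def summarize(text, k):
    # Pick the k highest-weighted sentences, then restore document order.
    sentences = split_sentences(text)
    scores = tfidf_sentence_scores(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    return [sentences[i] for i in sorted(ranked[:k])]


def rouge_n(candidate, reference, n=1):
    # Clipped n-gram overlap, reported as (precision, recall, F1).
    def ngrams(text):
        toks = tokenize(text)
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if overlap else 0.0
    return precision, recall, f1
```

Calling `summarize(document_text, k)` returns the `k` selected sentences as a list, and `rouge_n(" ".join(selected), reference_summary, n=2)` would compare them against a human-written reference. A real system for Sorani Kurdish would likely need language-specific normalization and tokenization, which this sketch omits.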

Comments:	18 pages, 11 figures, 8 tables
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2504.14630 [cs.CL]
	(or arXiv:2504.14630v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2504.14630

Submission history

From: Hossein Hassani
[v1] Sun, 20 Apr 2025 14:17:17 UTC (3,675 KB)
