-
FarsEval-PKBETS: A new diverse benchmark for evaluating Persian large language models
Authors:
Mehrnoush Shamsfard,
Zahra Saaberi,
Mostafa Karimi manesh,
Seyed Mohammad Hossein Hashemi,
Zahra Vatankhah,
Motahareh Ramezani,
Niki Pourazin,
Tara Zare,
Maryam Azimi,
Sarina Chitsaz,
Sama Khoraminejad,
Morteza Mahdavi Mortazavi,
Mohammad Mahdi Chizari,
Sahar Maleki,
Seyed Soroush Majd,
Mostafa Masumi,
Sayed Ali Musavi Khoeini,
Amir Mohseni,
Sogol Alipour
Abstract:
Research on evaluating and analyzing large language models (LLMs) has been extensive for resource-rich languages such as English, yet their performance in languages such as Persian has received considerably less attention. This paper introduces the FarsEval-PKBETS benchmark, a subset of the FarsEval project for evaluating large language models in Persian. The benchmark consists of 4000 questions and answers in various formats, including multiple choice, short answer, and descriptive responses. It covers a wide range of domains and tasks, including medicine, law, religion, Persian language, encyclopedic knowledge, human preferences, social knowledge, ethics and bias, text generation, and respecting others' rights. The benchmark incorporates linguistic, cultural, and local considerations relevant to the Persian language and Iran. To ensure the questions are challenging for current LLMs, three models (Llama3-70B, PersianMind, and Dorna) were evaluated on this benchmark. Their average accuracy was below 50%, meaning they provided fully correct answers to fewer than half of the questions. These results indicate that current language models are still far from being able to solve this benchmark.
Submitted 20 April, 2025;
originally announced April 2025.
-
FaBERT: Pre-training BERT on Persian Blogs
Authors:
Mostafa Masumi,
Seyed Soroush Majd,
Mehrnoush Shamsfard,
Hamid Beigy
Abstract:
We introduce FaBERT, a Persian BERT-base model pre-trained on the HmBlogs corpus, encompassing both informal and formal Persian texts. FaBERT is designed to excel in traditional Natural Language Understanding (NLU) tasks, addressing the intricacies of diverse sentence structures and linguistic styles prevalent in the Persian language. In our comprehensive evaluation of FaBERT on 12 datasets in various downstream tasks, encompassing Sentiment Analysis (SA), Named Entity Recognition (NER), Natural Language Inference (NLI), Question Answering (QA), and Question Paraphrasing (QP), it consistently demonstrated improved performance, all achieved within a compact model size. The findings highlight the importance of utilizing diverse and cleaned corpora, such as HmBlogs, to enhance the performance of language models like BERT in Persian Natural Language Processing (NLP) applications. FaBERT is openly accessible at https://huggingface.co/sbunlp/fabert
Submitted 9 February, 2024;
originally announced February 2024.
-
A Deep Convolutional Neural Networks Based Multi-Task Ensemble Model for Aspect and Polarity Classification in Persian Reviews
Authors:
Milad Vazan,
Fatemeh Sadat Masoumi,
Sepideh Saeedi Majd
Abstract:
Aspect-based sentiment analysis is important and widely applicable because it can identify all aspects discussed in a text. However, it is most effective when, in addition to identifying those aspects, it also identifies their polarity. Most previous methods use a pipeline approach: they first identify the aspects and then identify the polarities. Such methods are unsuitable for practical applications since they can lead to model errors. Therefore, in this study, we propose a multi-task learning model based on Convolutional Neural Networks (CNNs) that simultaneously detects aspect categories and their polarities. A single model may not provide the best predictions and can suffer from errors such as bias and high variance. To reduce these errors and improve the efficiency of model predictions, combining several models, an approach known as ensemble learning, may provide better results. The main purpose of this article is therefore to create a model based on an ensemble of multi-task deep convolutional neural networks to enhance sentiment analysis in Persian reviews. We evaluated the proposed method on a Persian-language dataset in the movie domain, using the Jaccard index and Hamming loss to measure the performance of the developed models. The results indicate that this new approach increases the efficiency of the sentiment analysis model in the Persian language.
Submitted 29 August, 2023; v1 submitted 17 January, 2022;
originally announced January 2022.
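For readers unfamiliar with the two metrics named in the abstract above, the following is a minimal sketch of how the Jaccard index and Hamming loss are commonly computed for multi-label predictions. The abstract does not say which averaging the authors used; the sample-averaged Jaccard index here is an assumption, and the function names and the aspect labels in the example are hypothetical.

```python
from typing import List, Set


def jaccard_index(true_sets: List[Set[str]], pred_sets: List[Set[str]]) -> float:
    """Sample-averaged Jaccard index: mean of |T & P| / |T | P| per review.

    Assumption: a review with no true and no predicted labels scores 1.0.
    """
    scores = []
    for true, pred in zip(true_sets, pred_sets):
        union = true | pred
        scores.append(1.0 if not union else len(true & pred) / len(union))
    return sum(scores) / len(scores)


def hamming_loss(true_sets: List[Set[str]], pred_sets: List[Set[str]],
                 n_labels: int) -> float:
    """Fraction of (review, label) pairs whose label assignment is wrong."""
    # The symmetric difference counts labels that are missed or wrongly added.
    wrong = sum(len(true ^ pred) for true, pred in zip(true_sets, pred_sets))
    return wrong / (len(true_sets) * n_labels)


# Hypothetical aspect categories for two movie reviews.
y_true = [{"acting", "plot"}, {"music"}]
y_pred = [{"acting"}, {"music", "plot"}]

jaccard = jaccard_index(y_true, y_pred)          # higher is better
hamming = hamming_loss(y_true, y_pred, n_labels=4)  # lower is better
print(jaccard, hamming)
```

A higher Jaccard index and a lower Hamming loss both indicate predictions closer to the reference labels.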