-
COPER: a Query-adaptable Semantics-based Search Engine for Persian COVID-19 Articles
Authors:
Reza Khanmohammadi,
Mitra Sadat Mirshafiee,
Mehdi Allahyari
Abstract:
With the surge of pretrained language models, a new pathway has been opened to incorporate Persian text contextual information. Meanwhile, as many other countries, including Iran, are fighting against COVID-19, a plethora of COVID-19 related articles has been published in Iranian Healthcare magazines to better inform the public of the situation. However, finding answers in this sheer volume of inf…
▽ More
With the surge of pretrained language models, a new pathway has been opened to incorporate Persian text contextual information. Meanwhile, as many other countries, including Iran, are fighting against COVID-19, a plethora of COVID-19 related articles has been published in Iranian Healthcare magazines to better inform the public of the situation. However, finding answers in this sheer volume of information is an extremely difficult task. In this paper, we collected a large dataset of these articles, leveraged different BERT variations as well as other keyword models such as BM25 and TF-IDF, and created a search engine to sift through these documents and rank them, given a user's query. Our final search engine consists of a ranker and a re-ranker, which adapts itself to the query. We fine-tune our models using Semantic Textual Similarity and evaluate them with standard task metrics. Our final method outperforms the rest by a considerable margin.
△ Less
Submitted 14 July, 2021; v1 submitted 12 July, 2021;
originally announced July 2021.
-
A Broad Evaluation of the Tor English Content Ecosystem
Authors:
Mahdieh Zabihimayvan,
Reza Sadeghi,
Derek Doran,
Mehdi Allahyari
Abstract:
Tor is among most well-known dark net in the world. It has noble uses, including as a platform for free speech and information dissemination under the guise of true anonymity, but may be culturally better known as a conduit for criminal activity and as a platform to market illicit goods and data. Past studies on the content of Tor support this notion, but were carried out by targeting popular doma…
▽ More
Tor is among most well-known dark net in the world. It has noble uses, including as a platform for free speech and information dissemination under the guise of true anonymity, but may be culturally better known as a conduit for criminal activity and as a platform to market illicit goods and data. Past studies on the content of Tor support this notion, but were carried out by targeting popular domains likely to contain illicit content. A survey of past studies may thus not yield a complete evaluation of the content and use of Tor. This work addresses this gap by presenting a broad evaluation of the content of the English Tor ecosystem. We perform a comprehensive crawl of the Tor dark web and, through topic and network analysis, characterize the types of information and services hosted across a broad swath of Tor domains and their hyperlink relational structure. We recover nine domain types defined by the information or service they host and, among other findings, unveil how some types of domains intentionally silo themselves from the rest of Tor. We also present measurements that (regrettably) suggest how marketplaces of illegal drugs and services do emerge as the dominant type of Tor domain. Our study is the product of crawling over 1 million pages from 20,000 Tor seed addresses, yielding a collection of over 150,000 Tor pages. We make a dataset of the intend to make the domain structure publicly available as a dataset at https://github.com/wsu-wacs/TorEnglishContent.
△ Less
Submitted 18 February, 2019;
originally announced February 2019.
-
Graph-based Ontology Summarization: A Survey
Authors:
Seyedamin Pouriyeh,
Mehdi Allahyari,
Qingxia Liu,
Gong Cheng,
Hamid Reza Arabnia,
Yuzhong Qu,
Krys Kochut
Abstract:
Ontologies have been widely used in numerous and varied applications, e.g., to support data modeling, information integration, and knowledge management. With the increasing size of ontologies, ontology understanding, which is playing an important role in different tasks, is becoming more difficult. Consequently, ontology summarization, as a way to distill key information from an ontology and gener…
▽ More
Ontologies have been widely used in numerous and varied applications, e.g., to support data modeling, information integration, and knowledge management. With the increasing size of ontologies, ontology understanding, which is playing an important role in different tasks, is becoming more difficult. Consequently, ontology summarization, as a way to distill key information from an ontology and generate an abridged version to facilitate a better understanding, is getting growing attention. In this survey paper, we review existing ontology summarization techniques and focus mainly on graph-based methods, which represent an ontology as a graph and apply centrality-based and other measures to identify the most important elements of an ontology as its summary. After analyzing their strengths and weaknesses, we highlight a few potential directions for future research.
△ Less
Submitted 15 May, 2018;
originally announced May 2018.
-
A Comprehensive Survey of Ontology Summarization: Measures and Methods
Authors:
Seyedamin Pouriyeh,
Mehdi Allahyari,
Krys Kochut,
Hamid Reza Arabnia
Abstract:
The Semantic Web is becoming a large scale framework that enables data to be published, shared, and reused in the form of ontologies. The ontology which is considered as basic building block of semantic web consists of two layers including data and schema layer. With the current exponential development of ontologies in both data size and complexity of schemas, ontology understanding which is playi…
▽ More
The Semantic Web is becoming a large scale framework that enables data to be published, shared, and reused in the form of ontologies. The ontology which is considered as basic building block of semantic web consists of two layers including data and schema layer. With the current exponential development of ontologies in both data size and complexity of schemas, ontology understanding which is playing an important role in different tasks such as ontology engineering, ontology learning, etc., is becoming more difficult. Ontology summarization as a way to distill knowledge from an ontology and generate an abridge version to facilitate a better understanding is getting more attention recently. There are various approaches available for ontology summarization which are focusing on different measures in order to produce a proper summary for a given ontology. In this paper, we mainly focus on the common metrics which are using for ontology summarization and meet the state-of-the-art in ontology summarization.
△ Less
Submitted 5 January, 2018;
originally announced January 2018.
-
A Brief Survey of Text Mining: Classification, Clustering and Extraction Techniques
Authors:
Mehdi Allahyari,
Seyedamin Pouriyeh,
Mehdi Assefi,
Saied Safaei,
Elizabeth D. Trippe,
Juan B. Gutierrez,
Krys Kochut
Abstract:
The amount of text that is generated every day is increasing dramatically. This tremendous volume of mostly unstructured text cannot be simply processed and perceived by computers. Therefore, efficient and effective techniques and algorithms are required to discover useful patterns. Text mining is the task of extracting meaningful information from text, which has gained significant attentions in r…
▽ More
The amount of text that is generated every day is increasing dramatically. This tremendous volume of mostly unstructured text cannot be simply processed and perceived by computers. Therefore, efficient and effective techniques and algorithms are required to discover useful patterns. Text mining is the task of extracting meaningful information from text, which has gained significant attentions in recent years. In this paper, we describe several of the most fundamental text mining tasks and techniques including text pre-processing, classification and clustering. Additionally, we briefly explain text mining in biomedical and health care domains.
△ Less
Submitted 28 July, 2017; v1 submitted 10 July, 2017;
originally announced July 2017.
-
Text Summarization Techniques: A Brief Survey
Authors:
Mehdi Allahyari,
Seyedamin Pouriyeh,
Mehdi Assefi,
Saeid Safaei,
Elizabeth D. Trippe,
Juan B. Gutierrez,
Krys Kochut
Abstract:
In recent years, there has been a explosion in the amount of text data from a variety of sources. This volume of text is an invaluable source of information and knowledge which needs to be effectively summarized to be useful. In this review, the main approaches to automatic text summarization are described. We review the different processes for summarization and describe the effectiveness and shor…
▽ More
In recent years, there has been a explosion in the amount of text data from a variety of sources. This volume of text is an invaluable source of information and knowledge which needs to be effectively summarized to be useful. In this review, the main approaches to automatic text summarization are described. We review the different processes for summarization and describe the effectiveness and shortcomings of the different methods.
△ Less
Submitted 28 July, 2017; v1 submitted 7 July, 2017;
originally announced July 2017.