-
A Clustering-Based Combinatorial Approach to Unsupervised Matching of Product Titles
Authors:
Leonidas Akritidis,
Athanasios Fevgas,
Panayiotis Bozanis,
Christos Makris
Abstract:
The constant growth of the e-commerce industry has made product retrieval particularly important. As more enterprises move their activities onto the Web, the volume and diversity of product-related information increase rapidly. These factors make it difficult for users to identify and compare the features of the products they want. Recent studies have shown that standard similarity metrics cannot effectively identify identical products, since similar titles often refer to different products and vice versa. Other studies employed external data sources (search engines) to enrich the titles; these solutions are impractical, mainly because fetching external data is slow. In this paper we introduce UPM, an unsupervised algorithm for matching products by their titles. UPM is independent of any external sources, since it analyzes the titles and extracts combinations of words from them. These combinations are evaluated according to several criteria, and the most appropriate one constitutes the cluster into which a product is classified. UPM is also parameter-free, avoids pairwise product comparisons, and includes a post-processing verification stage that corrects erroneous matches. The experimental evaluation of UPM demonstrated its superiority over state-of-the-art approaches in terms of both efficiency and effectiveness.
Submitted 6 March, 2019;
originally announced March 2019.
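The UPM abstract describes assigning each title to a cluster keyed by a word combination, without comparing products pairwise. The toy sketch below illustrates only that general idea, assuming whitespace tokenization, fixed-size combinations (k = 2), and a hypothetical selection rule (pick the combination shared by the most titles). It is not the paper's UPM algorithm: its scoring criteria and verification stage are not reproduced here, and the function names are illustrative.

```python
from collections import Counter
from itertools import combinations


def title_combinations(title, k=2):
    """Return all k-word combinations of the distinct lowercase tokens of a title."""
    tokens = sorted(set(title.lower().split()))
    return [" ".join(c) for c in combinations(tokens, k)]


def cluster_titles(titles, k=2):
    """Toy clustering: each title joins the cluster named by one of its combinations.

    Hypothetical criterion (NOT UPM's): choose the combination that appears in the
    most titles, breaking ties alphabetically so the result is deterministic.
    """
    # Document frequency: in how many titles does each combination occur?
    freq = Counter()
    for t in titles:
        freq.update(set(title_combinations(t, k)))

    clusters = {}
    for t in titles:
        # Titles with fewer than k distinct tokens fall back to their full token set.
        combos = title_combinations(t, k) or [" ".join(sorted(set(t.lower().split())))]
        best = min(combos, key=lambda c: (-freq[c], c))
        clusters.setdefault(best, []).append(t)
    return clusters


if __name__ == "__main__":
    titles = [
        "Apple iPhone 8 64GB",
        "apple iphone 8 64gb silver",
        "Samsung Galaxy S9 64GB",
    ]
    for key, members in cluster_titles(titles).items():
        print(key, "->", members)
```

Each title is scored only against global combination counts, so the work grows with the number of titles and their combinations rather than with the number of title pairs. A frequency-only rule like this one can still merge distinct models that share generic words, which is the kind of error the abstract's multiple criteria and verification stage are meant to address.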
-
Identifying Influential Bloggers: Time Does Matter
Authors:
Leonidas Akritidis,
Dimitrios Katsaros,
Panayiotis Bozanis
Abstract:
Blogs have recently become one of the most popular services on the Web. Many users maintain a blog and write posts to express their opinions, experience and knowledge about a product, an event, or any subject of general or specific interest. Even more users visit blogs to read these posts and comment on them. This "participatory journalism" of blogs has such an impact on the masses that Keller and Berry argued that through blogging "one American in ten tells the other nine how to vote, where to eat and what to buy" \cite{keller1}. Therefore, a significant issue is how to identify such influential bloggers. This problem is new and the relevant literature lacks sophisticated solutions; more importantly, these solutions have not taken temporal aspects into account when identifying influential bloggers, even though time is the most critical aspect of the Blogosphere. This article investigates the identification of influential bloggers by proposing two easily computed blogger ranking methods that incorporate temporal aspects of the blogging activity. Each method is based on a specific metric for scoring a blogger's posts. The first metric, termed MEIBI, takes into consideration the number of the blog post's inlinks and comments, along with the publication date of the post. The second metric, MEIBIX, scores a blog post according to the number and age of the post's inlinks and comments. These methods are evaluated against the state-of-the-art influential blogger identification method, using data collected from a real-world community blog site. The obtained results attest that the new methods are better able to identify significant temporal patterns in blogging behaviour.
Submitted 14 May, 2009;
originally announced May 2009.
-
Spam: It's Not Just for Inboxes and Search Engines! Making Hirsch h-index Robust to Scientospam
Authors:
Dimitrios Katsaros,
Leonidas Akritidis,
Panayiotis Bozanis
Abstract:
What is the 'level of excellence' of a scientist and the real impact of his/her work upon the scientific thinking and practising? How can we design a fair, an unbiased metric -- and most importantly -- a metric robust to manipulation?
Submitted 2 January, 2008;
originally announced January 2008.