-
Hybrid Movie Recommender System based on Resource Allocation
Authors:
Mostafa Khalaji,
Chitra Dadkhah,
Joobin Gharibshah
Abstract:
Recommender systems are essential for personalizing users' experiences on the Internet. They use different approaches to recommend the Top-K items to users according to their preferences. Recommender systems have become one of the most important parts of large-scale data mining techniques. In this paper, we propose a Hybrid Movie Recommender System based on Resource Allocation (HMRS-RA) to improve the accuracy of recommendation and solve the cold start problem for a new movie. HMRS-RA uses a self-organizing map neural network to cluster the users into N clusters. Because users' preferences differ according to their age and gender, HMRS-RA combines a Content-Based method, which solves the cold start problem for a new movie, and a Collaborative Filtering model, alongside the demographic information of users. The experimental results on the MovieLens dataset show that HMRS-RA increases the accuracy of recommendation compared to state-of-the-art and similar works.
Submitted 25 May, 2021;
originally announced May 2021.
-
REST: A Thread Embedding Approach for Identifying and Classifying User-specified Information in Security Forums
Authors:
Joobin Gharibshah,
Evangelos E. Papalexakis,
Michalis Faloutsos
Abstract:
How can we extract useful information from a security forum? We focus on identifying threads of interest to a security professional: (a) alerts of worrisome events, such as attacks, (b) offerings of malicious services and products, (c) hacking information to perform malicious acts, and (d) useful security-related experiences. The analysis of security forums is in its infancy despite several promising recent works. Novel approaches are needed to address the challenges in this domain: (a) the difficulty in specifying the "topics" of interest efficiently, and (b) the unstructured and informal nature of the text. We propose REST, a systematic methodology to: (a) identify threads of interest based on a possibly incomplete bag of words, and (b) classify them into one of the four classes above. The key novelty of the work is a multi-step weighted embedding approach: we project words, threads and classes into appropriate embedding spaces and establish relevance and similarity there. We evaluate our method with real data from three security forums with a total of 164K posts and 21K threads. First, REST is robust to the initial keyword selection: it can extend the user-provided keyword set and thus recover from missing keywords. Second, REST categorizes the threads into the classes of interest with superior accuracy compared to five other methods: REST exhibits an accuracy between 63.3% and 76.9%. We see our approach as a first step toward harnessing the wealth of information in online forums in a user-friendly way, since the user can loosely specify her keywords of interest.
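The keyword-extension idea mentioned in this abstract can be illustrated with a minimal sketch. It is not REST's multi-step weighted embedding; it only shows the general notion of growing a seed keyword set with words that lie close to it in an embedding space. The function names, the similarity threshold, and the toy 3-dimensional vectors are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_keywords(seed, embeddings, threshold=0.9):
    """Add every vocabulary word whose vector is close to some seed keyword.

    `threshold` is a hypothetical cut-off, not a value from the paper.
    """
    expanded = set(seed)
    for word, vec in embeddings.items():
        if word in expanded:
            continue
        if any(
            cosine(vec, embeddings[s]) >= threshold
            for s in seed
            if s in embeddings
        ):
            expanded.add(word)
    return expanded


# Toy vectors invented for illustration only; real embeddings would be
# learned from the forum text.
TOY_EMBEDDINGS = {
    "attack": [0.9, 0.1, 0.0],
    "ddos": [0.8, 0.2, 0.1],
    "tutorial": [0.0, 0.9, 0.3],
    "recipe": [0.1, 0.1, 0.9],
}
```

With these toy vectors, seeding with only "attack" also pulls in "ddos", which is the sense in which a user could recover from a missing keyword.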
Submitted 30 March, 2020; v1 submitted 8 January, 2020;
originally announced January 2020.
-
TrollSpot: Detecting misbehavior in commenting platforms
Authors:
Tai Ching Li,
Joobin Gharibshah,
Evangelos E. Papalexakis,
Michalis Faloutsos
Abstract:
Commenting platforms, such as Disqus, have emerged as a major online communication medium with millions of users and posts. Their popularity has also attracted parasitic and malicious behaviors, such as trolling and spamming. There has been relatively little research on modeling and safeguarding these platforms. As our key contribution, we develop a systematic approach to detect malicious users on commenting platforms, focusing on providing (a) interpretable and (b) fine-grained classification of malice. Our work has two key novelties: (a) we propose two classification methods, one of which follows a two-stage approach that first maps observable features to behaviors and then maps these behaviors to user roles, and (b) we use a comprehensive set of 73 features that span four dimensions of information. We use 7 million comments from a 9-month period, and we show that our classification methods can distinguish between benign and malicious roles (spammers, trollers, and fanatics) with a 0.904 AUC. Our work is a solid step towards ensuring that commenting platforms are a safe and pleasant medium for the exchange of ideas.
Submitted 5 June, 2018;
originally announced June 2018.
-
Mining actionable information from security forums: the case of malicious IP addresses
Authors:
Joobin Gharibshah,
Tai Ching Li,
Andre Castro,
Konstantinos Pelechrinis,
Evangelos E. Papalexakis,
Michalis Faloutsos
Abstract:
The goal of this work is to systematically extract information from hacker forums, whose content is generally unstructured: the text of a post does not necessarily follow any writing rules. By contrast, many security initiatives and commercial entities harness readily available public information, but they seem to focus on structured sources. Here, we focus on the problem of identifying malicious IP addresses among the IP addresses reported in the forums. We develop a method to automate the identification of malicious IP addresses with the design goal of being independent of external sources. A key novelty is that we use a matrix decomposition method to extract latent features of the users' behavioral information, which we combine with textual information from the related posts. A key design feature of our technique is that it can be readily applied to forums in different languages, since it does not require a sophisticated Natural Language Processing approach. In particular, our solution only needs a small number of keywords in the new language plus the users' behavior captured by specific features. We also develop a tool to automate the data collection from security forums. Using our tool, we collect approximately 600K posts from 3 different forums. Our method exhibits high classification accuracy, and the precision of identifying malicious IP addresses in posts is greater than 88% in all three forums. We argue that our method can provide significantly more information: we find up to 3 times more potentially malicious IP addresses compared to the reference blacklist VirusTotal. As cyber-wars become more intense, early access to useful information becomes more imperative for removing the hackers' first-move advantage, and our work is a solid step in this direction.
Submitted 13 April, 2018;
originally announced April 2018.
-
RIPEx: Extracting malicious IP addresses from security forums using cross-forum learning
Authors:
Joobin Gharibshah,
Evangelos E. Papalexakis,
Michalis Faloutsos
Abstract:
Is it possible to automatically extract malicious IP addresses reported in security forums? This is the question at the heart of our work. We focus on security forums, where security professionals and hackers share knowledge and information, and often report misbehaving IP addresses. So far, there have only been a few efforts to extract information from such security forums. We propose RIPEx, a systematic approach to identify and label IP addresses in security forums by utilizing a cross-forum learning method. In more detail, the challenge is twofold: (a) distinguishing IP addresses from other numerical entities, such as software version numbers, and (b) classifying each IP address as benign or malicious. We propose an integrated solution that tackles both problems. A novelty of our approach is that it does not require training data for each new forum. Our approach transfers knowledge across forums: we use a classifier from our source forums to identify seed information for training a classifier on the target forum. We evaluate our method using data collected from five security forums with a total of 31K users and 542K posts. First, RIPEx can distinguish IP addresses from other numeric expressions with 95% precision and above 93% recall on average. Second, RIPEx identifies malicious IP addresses with an average precision of 88% and over 78% recall, using our cross-forum learning. Our work is a first step towards harnessing the wealth of useful information that can be found in security forums.
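To make challenge (a) concrete, here is a minimal sketch of a purely syntactic pre-filter that pulls IPv4-shaped tokens out of free text. It is not RIPEx, which the abstract describes as a learned, cross-forum classifier; the pattern and the function name `candidate_ips` are assumptions made for illustration.

```python
import ipaddress
import re

# Hypothetical pattern: exactly four dot-separated groups of 1-3 digits,
# not embedded in a longer dotted number such as "1.2.3.4.5".
CANDIDATE = re.compile(r"(?<![\d.])(\d{1,3}(?:\.\d{1,3}){3})(?!\.?\d)")


def candidate_ips(text):
    """Return dotted-quad tokens in `text` that parse as IPv4 addresses."""
    found = []
    for match in CANDIDATE.finditer(text):
        token = match.group(1)
        try:
            # Rejects out-of-range octets such as 999.1.1.1.
            ipaddress.IPv4Address(token)
        except ValueError:
            continue
        found.append(token)
    return found
```

A filter like this still accepts four-part version strings such as "5.0.0.1", which are syntactically valid addresses. That ambiguity is why syntax alone cannot settle the question and the surrounding context of the post matters.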
Submitted 12 April, 2018;
originally announced April 2018.