Showing 1–2 of 2 results for author: Almeida, R B
-
Improving Spam Detection Based on Structural Similarity
Authors:
Luiz H. Gomes,
Fernando D. O. Castro,
Rodrigo B. Almeida,
Luis M. A. Bettencourt,
Virgilio A. F. Almeida,
Jussara M. Almeida
Abstract:
We propose a new detection algorithm that uses structural relationships between senders and recipients of email as the basis for the identification of spam messages. Users and receivers are represented as vectors in their reciprocal spaces. A measure of similarity between vectors is constructed and used to group users into clusters. Knowledge of their classification as past senders/receivers of spam or legitimate mail, coming from an auxiliary detection algorithm, is then used to label these clusters probabilistically. The similarity between the sender and receiver sets of a new message and the center vectors of the clusters is then used to assess the possibility of that message being legitimate or spam. We show that the proposed algorithm is able to correct part of the false positives (legitimate messages classified as spam) using a testbed of a one-week SMTP log.
Submitted 5 April, 2005;
originally announced April 2005.
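The pipeline outlined in this abstract (vectors, similarity, clusters, probabilistic labels, scoring of a new message) can be illustrated with a minimal sketch. The abstract does not say which similarity measure or clustering method the authors use, so cosine similarity, the greedy threshold clustering, the 0.5 threshold, every function name, and the toy log below are assumptions made for illustration only. The sketch also simplifies by describing a new message through its recipient set alone.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def sender_vectors(log):
    """Represent each sender as a vector of counts over the recipients it wrote to."""
    vectors = {}
    for sender, recipient, _is_spam in log:
        vectors.setdefault(sender, Counter())[recipient] += 1
    return vectors

def cluster(vectors, threshold=0.5):
    """Greedy clustering (assumed): join the first cluster whose center is similar enough."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [sender, ...]}
    for sender, vec in sorted(vectors.items()):
        for c in clusters:
            if cosine(vec, c["centroid"]) >= threshold:
                c["members"].append(sender)
                c["centroid"].update(vec)  # summed vector; cosine ignores its scale
                break
        else:
            clusters.append({"centroid": Counter(vec), "members": [sender]})
    return clusters

def label(clusters, log):
    """Attach to each cluster the fraction of its members' past mail flagged as spam."""
    for c in clusters:
        members = set(c["members"])
        flags = [is_spam for sender, _r, is_spam in log if sender in members]
        c["p_spam"] = sum(flags) / len(flags)
    return clusters

def spam_probability(recipients, clusters):
    """Score a new message by the spam rate of the cluster closest to its recipient set."""
    vec = Counter(recipients)
    best = max(clusters, key=lambda c: cosine(vec, c["centroid"]))
    return best["p_spam"]

# Made-up log: (sender, recipient, flagged_as_spam_by_auxiliary_filter)
LOG = [
    ("alice", "bob", False), ("alice", "carol", False),
    ("dave", "bob", False), ("dave", "carol", False),
    ("spam1", "x1", True), ("spam1", "x2", True),
    ("spam2", "x1", True), ("spam2", "x2", True),
]
CLUSTERS = label(cluster(sender_vectors(LOG)), LOG)
```

In this toy setting, a message that the auxiliary filter flags as spam but that falls into a cluster with a low `p_spam` would be a candidate false positive, which is the kind of correction the abstract reports.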
-
Local Community Identification through User Access Patterns
Authors:
Rodrigo B. Almeida,
Virgilio A. F. Almeida
Abstract:
Community identification algorithms have been used to enhance the quality of services as perceived by their users. Although community identification algorithms are widely used on the Web, their application to portals or specific subsets of the Web has not been much studied. In this paper, we propose a technique for local community identification that takes into account user access behavior derived from access logs of servers on the Web. The technique departs from existing community algorithms in that it shifts the focus of interest from authors to users. Our approach does not use relations imposed by authors (e.g. hyperlinks in the case of Web pages). It uses information derived from user accesses to a service in order to infer relationships. The communities identified are of great interest to content providers since they can be used to improve the quality of their services. We also propose an evaluation methodology for analyzing the results obtained by the algorithm. We present two case studies based on actual data from two services: an online bookstore and an online radio. The case of the online radio is particularly relevant because it highlights the ability of the proposed algorithm to find communities in an environment (i.e., a streaming media service) without links, that is, without the relations imposed by authors (e.g. hyperlinks in the case of Web pages).
Submitted 16 December, 2002;
originally announced December 2002.
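The idea of inferring relationships from user accesses rather than from author-imposed links can be illustrated with a minimal sketch. The abstract does not describe the actual algorithm, so everything below is an assumption for illustration: items are linked when enough users accessed both, communities are taken to be connected components of that graph, and the function names, the `min_shared_users` threshold, and the toy access log are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def co_access_graph(accesses, min_shared_users=2):
    """Link two items when at least `min_shared_users` users accessed both."""
    items_by_user = defaultdict(set)
    for user, item in accesses:
        items_by_user[user].add(item)
    shared = defaultdict(int)  # (item_a, item_b) -> number of users who accessed both
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            shared[(a, b)] += 1
    graph = defaultdict(set)
    for (a, b), count in shared.items():
        if count >= min_shared_users:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def communities(graph):
    """Return the connected components of the co-access graph as sorted lists."""
    seen, result = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        result.append(sorted(component))
    return result

# Made-up access log: (user, item)
ACCESSES = [
    ("u1", "jazz_a"), ("u1", "jazz_b"),
    ("u2", "jazz_a"), ("u2", "jazz_b"),
    ("u3", "rock_a"), ("u3", "rock_b"),
    ("u4", "rock_a"), ("u4", "rock_b"), ("u4", "jazz_a"),
]
COMMUNITIES = communities(co_access_graph(ACCESSES))
```

No hyperlinks appear anywhere in the input: the grouping is driven entirely by which users accessed which items, which is why such an approach can apply to a streaming service.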