-
Anytime Discovery of a Diverse Set of Patterns with Monte Carlo Tree Search
Authors:
Guillaume Bosc,
Jean-François Boulicaut,
Chedy Raïssi,
Mehdi Kaytoue
Abstract:
The discovery of patterns that accurately discriminate one class label from another remains a challenging data mining task. Subgroup discovery (SD) is one of the frameworks that make it possible to elicit such interesting hypotheses from labeled data. A question remains fairly open: how should one select an accurate heuristic search technique when exhaustive enumeration of the pattern space is infeasible? Existing approaches make use of beam search, sampling, and genetic algorithms for discovering a pattern set that is non-redundant and of high quality w.r.t. a pattern quality measure. We argue that such approaches produce pattern sets that lack diversity: only a few patterns of high quality, and different enough, are discovered. Our main contribution is then to formally define pattern mining as a game and to solve it with Monte Carlo tree search (MCTS). It can be seen as an exhaustive search guided by random simulations which can be stopped early (limited budget) by virtue of its best-first search property. We show through a comprehensive set of experiments how MCTS enables the anytime discovery of a diverse pattern set of high quality. It outperforms other approaches when dealing with a large pattern search space and for different quality measures. Thanks to its genericity, our MCTS settings can be used for SD but also for many other pattern mining tasks.
Submitted 6 December, 2017; v1 submitted 28 September, 2016;
originally announced September 2016.
-
Identifying Avatar Aliases in Starcraft 2
Authors:
Olivier Cavadenti,
Victor Codocedo,
Jean-François Boulicaut,
Mehdi Kaytoue
Abstract:
In electronic sports, cyberathletes conceal their online training by using different avatars (virtual identities), so that they are not recognized by the opponents they may face in future competitions. In this article, we propose a method to tackle this avatar alias identification problem. Our method trains a classifier on behavioural data and processes the confusion matrix to output label pairs which concentrate confusion. We experimented with Starcraft 2 and report our first results.
Submitted 4 August, 2015;
originally announced August 2015.
-
Interpreting communities based on the evolution of a dynamic attributed network
Authors:
Günce Orman,
Vincent Labatut,
Marc Plantevit,
Jean-François Boulicaut
Abstract:
Many methods have been proposed to detect communities, not only in plain, but also in attributed, directed or even dynamic complex networks. From the modeling point of view, to be of some utility, the community structure must be characterized relative to the properties of the studied system. However, most of the existing works focus on the detection of communities, and only very few try to tackle this interpretation problem. Moreover, the existing approaches are limited either by the type of data they handle, or by the nature of the results they output. In this work, we see the interpretation of communities as a problem independent from the detection process, consisting of identifying the most characteristic features of communities. We give a formal definition of this problem and propose a method to solve it. To this end, we first define a sequence-based representation of networks, combining temporal information, community structure, topological measures, and nodal attributes. We then describe how to identify the most emerging sequential patterns of this dataset, and use them to characterize the communities. We study the performance of our method on artificially generated dynamic attributed networks. We also empirically validate our framework on real-world systems: a DBLP network of scientific collaborations, and a LastFM network of social and musical interactions.
Submitted 15 June, 2015;
originally announced June 2015.
-
A Method for Characterizing Communities in Dynamic Attributed Complex Networks
Authors:
Günce Keziban Orman,
Vincent Labatut,
Marc Plantevit,
Jean-François Boulicaut
Abstract:
Many methods have been proposed to detect communities, not only in plain, but also in attributed, directed or even dynamic complex networks. In its simplest form, a community structure takes the form of a partition of the node set. From the modeling point of view, to be of some utility, this partition must then be characterized relative to the properties of the studied system. However, while most of the existing works focus on defining methods for the detection of communities, only very few try to tackle this interpretation problem. Moreover, the existing approaches are limited either by the type of data they handle, or by the nature of the results they output. In this work, we propose a method to efficiently support such a characterization task. We first define a sequence-based representation of networks, combining temporal information, topological measures, and nodal attributes. We then describe how to identify the most emerging sequential patterns of this dataset, and use them to characterize the communities. We also show how to detect unusual behavior in a community, and highlight outliers. Finally, as an illustration, we apply our method to a network of scientific collaborations.
Submitted 25 June, 2014;
originally announced June 2014.
-
Une méthode pour caractériser les communautés des réseaux dynamiques à attributs
Authors:
Günce Keziban Orman,
Vincent Labatut,
Marc Plantevit,
Jean-François Boulicaut
Abstract:
Many complex systems are modeled through complex networks whose analysis reveals typical topological properties. Among those, the community structure is one of the most studied. Many methods have been proposed to detect communities, not only in plain, but also in attributed, directed or even dynamic networks. A community structure takes the form of a partition of the node set, which must then be characterized relative to the properties of the studied system. We propose a method to support such a characterization task. We define a sequence-based representation of networks, combining temporal information, topological measures, and nodal attributes. We then characterize communities using the most representative emerging sequential patterns of their nodes. This also allows detecting unusual behavior in a community. We describe an empirical study of a network of scientific collaborations.
Submitted 17 December, 2013;
originally announced December 2013.