Search | arXiv e-print repository

Anytime Discovery of a Diverse Set of Patterns with Monte Carlo Tree Search

Authors: Guillaume Bosc, Jean-François Boulicaut, Chedy Raïssi, Mehdi Kaytoue

Abstract: The discovery of patterns that accurately discriminate one class label from another remains a challenging data mining task. Subgroup discovery (SD) is one of the frameworks that make it possible to elicit such interesting hypotheses from labeled data. A question remains fairly open: how to select an accurate heuristic search technique when exhaustive enumeration of the pattern space is infeasible? Existing approaches use beam search, sampling, and genetic algorithms to discover a pattern set that is non-redundant and of high quality w.r.t. a pattern quality measure. We argue that such approaches produce pattern sets that lack diversity: only a few patterns that are both of high quality and different enough are discovered. Our main contribution is then to formally define pattern mining as a game and to solve it with Monte Carlo tree search (MCTS). It can be seen as an exhaustive search guided by random simulations, which can be stopped early (limited budget) by virtue of its best-first search property. We show through a comprehensive set of experiments how MCTS enables the anytime discovery of a diverse pattern set of high quality. It outperforms other approaches when dealing with a large pattern search space and for different quality measures. Thanks to its genericity, our MCTS settings can be used for SD but also for many other pattern mining tasks.

Submitted 6 December, 2017; v1 submitted 28 September, 2016; originally announced September 2016.

Comments: This article has been accepted for publication in the journal Data Mining and Knowledge Discovery (December 5th, 2017)
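To make the idea of treating pattern mining as a game solved by MCTS more concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. A "move" adds one attribute=value item to the current pattern, items are added in a fixed (canonical) order so that each pattern is reached by exactly one path, children are selected with the UCB1 rule, roll-outs add random items, and the reward is the best weighted relative accuracy (WRAcc) met along the roll-out. The names `mcts_subgroups` and `wracc`, the exploration constant `c=0.5`, and the 0.5 roll-out continuation probability are hypothetical choices, and the sketch does not filter redundant patterns from its result.

```python
import math
import random


def wracc(pattern, records, labels):
    """Weighted relative accuracy (WRAcc) of a pattern, i.e. a frozenset of items."""
    n = len(records)
    covered = [y for r, y in zip(records, labels) if pattern <= r]
    if not covered:
        return 0.0
    p_all = sum(labels) / n              # share of positives in the whole data
    p_cov = sum(covered) / len(covered)  # share of positives in the subgroup
    return (len(covered) / n) * (p_cov - p_all)


class Node:
    def __init__(self, pattern, last, n_items, parent=None):
        self.pattern = pattern  # frozenset of items
        self.last = last        # index of the last item added (canonical order)
        self.parent = parent
        self.children = []
        # Only items after `last` may be added, so every pattern has exactly
        # one path from the root and is never generated twice.
        self.untried = list(range(last + 1, n_items))
        self.visits = 0
        self.total = 0.0


def ucb1(child, parent_visits, c):
    exploit = child.total / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def mcts_subgroups(records, labels, budget=500, k=3, c=0.5, seed=0):
    rng = random.Random(seed)
    items = sorted(set().union(*records))
    seen = {}  # pattern -> quality, for every pattern evaluated so far

    def evaluate(pattern):
        if pattern not in seen:
            seen[pattern] = wracc(pattern, records, labels)
        return seen[pattern]

    root = Node(frozenset(), -1, len(items))
    for _ in range(budget):  # stopping after `budget` iterations makes it "anytime"
        node = root
        # 1. Selection: descend through fully expanded nodes using UCB1.
        while not node.untried and node.children:
            parent = node
            node = max(parent.children, key=lambda ch: ucb1(ch, parent.visits, c))
        # 2. Expansion: refine the pattern with one untried item.
        if node.untried:
            i = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.pattern | {items[i]}, i, len(items), parent=node)
            node.children.append(child)
            node = child
        # 3. Simulation: random roll-out of further refinements;
        #    the reward is the best quality met along the way.
        reward = evaluate(node.pattern)
        pattern, last = node.pattern, node.last
        while last + 1 < len(items) and rng.random() < 0.5:
            last = rng.randrange(last + 1, len(items))
            pattern = pattern | {items[last]}
            reward = max(reward, evaluate(pattern))
        # 4. Back-propagation of the reward up to the root.
        while node is not None:
            node.visits += 1
            node.total += reward
            node = node.parent

    best = sorted((p for p in seen if p), key=seen.get, reverse=True)[:k]
    return [(tuple(sorted(p)), seen[p]) for p in best]
```

For example, on four records where `a=1` holds exactly for the two positive examples, `mcts_subgroups(records, labels, budget=100)` ranks `("a=1",)` first with a WRAcc of 0.25. Because every iteration evaluates new patterns and the tree eventually covers the whole (canonically ordered) lattice, a larger budget moves the sketch toward exhaustive search.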

arXiv:1508.00801

Identifying Avatar Aliases in Starcraft 2

Authors: Olivier Cavadenti, Victor Codocedo, Jean-François Boulicaut, Mehdi Kaytoue

Abstract: In electronic sports, cyberathletes conceal their online training by using different avatars (virtual identities), so that they are not recognized by the opponents they may face in future competitions. In this article, we propose a method to tackle this avatar alias identification problem. Our method trains a classifier on behavioural data and processes the confusion matrix to output the label pairs on which the confusion concentrates. We experimented with Starcraft 2 and report our first results.

Submitted 4 August, 2015; originally announced August 2015.

Comments: Machine Learning and Data Mining for Sports Analytics ECML/PKDD 2015 workshop, 11 September 2015, Porto, Portugal
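The abstract does not say exactly how the confusion matrix is processed. One plausible reading, sketched below in Python purely as an assumption, treats each avatar as a class, scores each pair of avatars by their average mutual confusion rate (the share of one avatar's games predicted as the other, averaged over both directions), and reports the pairs whose score reaches a hypothetical `threshold`. `alias_pairs` is an illustrative name, not the authors' code.

```python
def alias_pairs(confusion, labels, threshold=0.2):
    """Return (label_a, label_b, score) for pairs whose mutual confusion >= threshold.

    confusion[i][j] is the number of samples of true label i predicted as label j.
    """
    n = len(labels)
    row_totals = [sum(row) for row in confusion]
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            # Share of each avatar's samples attributed to the other avatar.
            rate_ij = confusion[i][j] / row_totals[i] if row_totals[i] else 0.0
            rate_ji = confusion[j][i] / row_totals[j] if row_totals[j] else 0.0
            score = (rate_ij + rate_ji) / 2  # symmetric: aliasing is mutual
            if score >= threshold:
                pairs.append((labels[i], labels[j], score))
    # Most confused pairs first: the strongest alias candidates.
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

Averaging both directions reflects the intuition that two avatars of the same player should be mistaken for each other, not just one for the other; a real system would also need a principled way to pick the threshold.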

arXiv:1506.04693

doi 10.1007/s13278-015-0262-4

Interpreting communities based on the evolution of a dynamic attributed network

Authors: Günce Orman, Vincent Labatut, Marc Plantevit, Jean-François Boulicaut

Abstract: Many methods have been proposed to detect communities, not only in plain, but also in attributed, directed or even dynamic complex networks. From the modeling point of view, to be of some utility, the community structure must be characterized relative to the properties of the studied system. However, most existing works focus on the detection of communities, and very few try to tackle this interpretation problem. Moreover, existing approaches are limited either by the type of data they handle or by the nature of the results they output. In this work, we treat the interpretation of communities as a problem independent of the detection process, which consists of identifying the most characteristic features of communities. We give a formal definition of this problem and propose a method to solve it. To this end, we first define a sequence-based representation of networks, combining temporal information, community structure, topological measures, and nodal attributes. We then describe how to identify the most emerging sequential patterns of this dataset and use them to characterize the communities. We study the performance of our method on artificially generated dynamic attributed networks. We also empirically validate our framework on real-world systems: a DBLP network of scientific collaborations, and a LastFM network of social and musical interactions.

Submitted 15 June, 2015; originally announced June 2015.

Journal ref: Social Network Analysis and Mining Journal (SNAM), 2015, 5, pp.20. http://link.springer.com/article/10.1007%2Fs13278-015-0262-4. 10.1007/s13278-015-0262-4
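The sequence-based representation described in the abstract above can be sketched in Python as follows: one sequence per node, with one itemset per time step that combines community membership, a discretized topological measure, and nodal attributes, so that temporal information is carried by the order of the itemsets. Using the degree as the only topological measure, the `low`/`mid`/`high` cut points (`low_max`, `mid_max`), and the `attribute=value` item encoding are assumptions for illustration, not the paper's exact encoding; `level` and `node_sequences` are hypothetical names.

```python
def level(value, low_max, mid_max):
    """Discretize a numeric measure into 'low' / 'mid' / 'high'."""
    if value <= low_max:
        return "low"
    if value <= mid_max:
        return "mid"
    return "high"


def node_sequences(snapshots, communities, attributes, low_max=1, mid_max=3):
    """Turn a dynamic attributed network into one sequence of itemsets per node.

    snapshots:   list of adjacency dicts {node: set_of_neighbours}, one per time step
    communities: list of dicts {node: community_id}, aligned with snapshots
    attributes:  list of dicts {node: {attribute: value}}, aligned with snapshots
    """
    sequences = {}
    for adj, comm, attrs in zip(snapshots, communities, attributes):
        for node, neighbours in adj.items():
            # Topological measure (here only the degree) and community membership.
            items = {f"degree={level(len(neighbours), low_max, mid_max)}",
                     f"community={comm[node]}"}
            # Nodal attributes observed at this time step, if any.
            items |= {f"{a}={v}" for a, v in attrs.get(node, {}).items()}
            # Appending in time order keeps the temporal information.
            sequences.setdefault(node, []).append(frozenset(items))
    return sequences
```

Nodes that appear only in later snapshots simply get shorter sequences; these per-node sequences are the input on which sequential patterns can then be mined.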

arXiv:1406.6597

doi 10.1109/ASONAM.2014.6921629

A Method for Characterizing Communities in Dynamic Attributed Complex Networks

Authors: Günce Keziban Orman, Vincent Labatut, Marc Plantevit, Jean-François Boulicaut

Abstract: Many methods have been proposed to detect communities, not only in plain, but also in attributed, directed or even dynamic complex networks. In its simplest form, a community structure takes the form of a partition of the node set. From the modeling point of view, to be of some utility, this partition must then be characterized relative to the properties of the studied system. However, while most existing works focus on defining methods for the detection of communities, very few try to tackle this interpretation problem. Moreover, existing approaches are limited either by the type of data they handle or by the nature of the results they output. In this work, we propose a method to efficiently support such a characterization task. We first define a sequence-based representation of networks, combining temporal information, topological measures, and nodal attributes. We then describe how to identify the most emerging sequential patterns of this dataset and use them to characterize the communities. We also show how to detect unusual behavior in a community and highlight outliers. Finally, as an illustration, we apply our method to a network of scientific collaborations.

Submitted 25 June, 2014; originally announced June 2014.

Comments: IEEE/ACM International Conference on Advances in Social Network Analysis and Mining (ASONAM), Beijing, China (2014)
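The notion of an emerging sequential pattern used in the abstract above can be sketched with the usual growth-rate definition for emerging patterns: the support of a pattern among the sequences of a community's nodes divided by its support among the sequences of all other nodes. In the Python sketch below, a pattern is a tuple of itemsets and a sequence a list of itemsets; the candidate patterns are passed in rather than mined (a sequential pattern miner would normally enumerate them), `min_growth=2.0` is a hypothetical threshold, and the paper's exact measure may differ.

```python
import math


def is_subsequence(pattern, sequence):
    """True if the itemsets of `pattern` are included, in order, in those of `sequence`."""
    pos = 0
    for itemset in sequence:
        # Greedy earliest matching is sufficient for subsequence inclusion.
        if pos < len(pattern) and pattern[pos] <= itemset:
            pos += 1
    return pos == len(pattern)


def support(pattern, sequences):
    """Fraction of sequences that contain the pattern."""
    if not sequences:
        return 0.0
    return sum(is_subsequence(pattern, s) for s in sequences) / len(sequences)


def growth_rate(pattern, community_seqs, other_seqs):
    s_in = support(pattern, community_seqs)
    s_out = support(pattern, other_seqs)
    if s_out == 0:
        # Pattern absent outside the community: infinitely emerging if present inside.
        return math.inf if s_in > 0 else 0.0
    return s_in / s_out


def emerging_patterns(candidates, community_seqs, other_seqs, min_growth=2.0):
    """Keep candidates whose growth rate reaches min_growth, strongest first."""
    scored = [(p, growth_rate(p, community_seqs, other_seqs)) for p in candidates]
    kept = [(p, g) for p, g in scored if g >= min_growth]
    return sorted(kept, key=lambda pg: pg[1], reverse=True)
```

For instance, a pattern "low degree, later high degree" supported by every member of a community but by only a quarter of the other nodes has a growth rate of 4 and would rank as strongly characteristic of that community.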

arXiv:1312.4676

A Method for Characterizing Communities in Dynamic Attributed Networks (Une méthode pour caractériser les communautés des réseaux dynamiques à attributs)

Authors: Günce Keziban Orman, Vincent Labatut, Marc Plantevit, Jean-François Boulicaut

Abstract: Many complex systems are modeled through complex networks whose analysis reveals typical topological properties. Among those, the community structure is one of the most studied. Many methods have been proposed to detect communities, not only in plain, but also in attributed, directed or even dynamic networks. A community structure takes the form of a partition of the node set, which must then be characterized relative to the properties of the studied system. We propose a method to support such a characterization task. We define a sequence-based representation of networks, combining temporal information, topological measures, and nodal attributes. We then characterize communities using the most representative emerging sequential patterns of their nodes. This also allows detecting unusual behavior in a community. We describe an empirical study of a network of scientific collaborations.

Submitted 17 December, 2013; originally announced December 2013.

Comments: in French

Journal ref: Une méthode pour caractériser les communautés des réseaux dynamiques à attributs, Rennes, France (2014)
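The abstract above states that the representation also allows detecting unusual behavior in a community, without saying how. One simple heuristic, offered here only as an assumption, flags the community members whose own sequence supports few of the community's characteristic sequential patterns. In this Python sketch, `unusual_members` and its `max_coverage` parameter are hypothetical, a pattern is a tuple of itemsets, and a node's sequence is a list of itemsets ordered by time.

```python
def is_subsequence(pattern, sequence):
    """True if the itemsets of `pattern` are included, in order, in those of `sequence`."""
    pos = 0
    for itemset in sequence:
        if pos < len(pattern) and pattern[pos] <= itemset:
            pos += 1
    return pos == len(pattern)


def unusual_members(members, sequences, patterns, max_coverage=0.0):
    """Return (node, coverage) for members supporting at most max_coverage of the patterns.

    coverage = fraction of the community's characteristic patterns found in the
    node's own sequence; a low value means the node behaves unlike its community.
    """
    if not patterns:
        return []
    flagged = []
    for node in members:
        seq = sequences[node]
        coverage = sum(is_subsequence(p, seq) for p in patterns) / len(patterns)
        if coverage <= max_coverage:
            flagged.append((node, coverage))
    return flagged
```

Raising `max_coverage` trades precision for recall: with 0.0 only members matching none of the characteristic patterns are reported, while 0.5 also reports members matching at most half of them.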

Showing 1–5 of 5 results for author: Boulicaut, J