Search | arXiv e-print repository

arXiv:2104.06893 [pdf, other]

I Wish I Would Have Loved This One, But I Didn't -- A Multilingual Dataset for Counterfactual Detection in Product Reviews

Authors: James O'Neill, Polina Rozenshtein, Ryuichi Kiryo, Motoko Kubota, Danushka Bollegala

Abstract: Counterfactual statements describe events that did not or cannot take place. We consider the problem of counterfactual detection (CFD) in product reviews. For this purpose, we annotate a multilingual CFD dataset from Amazon product reviews covering counterfactual statements written in English, German, and Japanese languages. The dataset is unique as it contains counterfactuals in multiple languages, covers a new application area of e-commerce reviews, and provides high quality professional annotations. We train CFD models using different text representation methods and classifiers. We find that these models are robust against the selectional biases introduced due to cue phrase-based sentence selection. Moreover, our CFD dataset is compatible with prior datasets and can be merged to learn accurate CFD models. Applying machine translation on English counterfactual examples to create multilingual data performs poorly, demonstrating the language-specificity of this problem, which has been ignored so far.

Submitted 15 September, 2021; v1 submitted 14 April, 2021; originally announced April 2021.

Comments: Accepted to EMNLP 2021

arXiv:2103.00451 [pdf, other]

Discovering Dense Correlated Subgraphs in Dynamic Networks

Authors: Giulia Preti, Polina Rozenshtein, Aristides Gionis, Yannis Velegrakis

Abstract: Given a dynamic network, where edges appear and disappear over time, we are interested in finding sets of edges that have similar temporal behavior and form a dense subgraph. Formally, we define the problem as the enumeration of the maximal subgraphs that satisfy specific density and similarity thresholds. To measure the similarity of the temporal behavior, we use the correlation between the binary time series that represent the activity of the edges. For the density, we study two variants based on the average degree. For these problem variants we enumerate the maximal subgraphs and compute a compact subset of subgraphs that have limited overlap. We propose an approximate algorithm that scales well with the size of the network, while achieving a high accuracy. We evaluate our framework on both real and synthetic datasets. The results of the synthetic data demonstrate the high accuracy of the approximation and show the scalability of the framework.

Submitted 28 February, 2021; originally announced March 2021.

Comments: Full version of the paper included in the proceedings of the PAKDD 2021 conference

Journal ref: PAKDD 2021

arXiv:2007.03950 [pdf, other]

Mining Dense Subgraphs with Similar Edges

Authors: Polina Rozenshtein, Giulia Preti, Aristides Gionis, Yannis Velegrakis

Abstract: When searching for interesting structures in graphs, it is often important to take into account not only the graph connectivity, but also the metadata available, such as node and edge labels, or temporal information. In this paper we are interested in settings where such metadata is used to define a similarity between edges. We consider the problem of finding subgraphs that are dense and whose edges are similar to each other with respect to a given similarity function. Depending on the application, this function can be, for example, the Jaccard similarity between the edge label sets, or the temporal correlation of the edge occurrences in a temporal graph. We formulate a Lagrangian relaxation-based optimization problem to search for dense subgraphs with high pairwise edge similarity. We design a novel algorithm to solve the problem through parametric MinCut, and provide an efficient search scheme to iterate through the values of the Lagrangian multipliers. Our study is complemented by an evaluation on real-world datasets, which demonstrates the usefulness and efficiency of the proposed approach.

Submitted 8 July, 2020; originally announced July 2020.

arXiv:1902.01832 [pdf, other]

doi 10.1145/3097983.3098199

Inferring the strength of social ties: a community-driven approach

Authors: Polina Rozenshtein, Nikolaj Tatti, Aristides Gionis

Abstract: Online social networks are growing and becoming denser. The social connections of a given person may have very high variability: from close friends and relatives to acquaintances to people they hardly know. Inferring the strength of social ties is an important ingredient for modeling the interaction of users in a network and understanding their behavior. Furthermore, the problem has applications in computational social science, viral marketing, and people recommendation. In this paper we study the problem of inferring the strength of social ties in a given network. Our work is motivated by a recent approach [27], which leverages the strong triadic closure (STC) principle, a hypothesis rooted in social psychology [13]. To guide our inference process, in addition to the network structure, we also consider as input a collection of tight communities. Those are sets of vertices that we expect to be connected via strong ties. Such communities appear in different situations, e.g., when being part of a community implies a strong connection to one of the existing members. We consider two related problem formalizations that reflect the assumptions of our setting: small number of STC violations and strong-tie connectivity in the input communities. We show that both problem formulations are NP-hard. We also show that one problem formulation is hard to approximate, while for the second we develop an algorithm with approximation guarantee. We validate the proposed method on real-world datasets by comparing with baselines that optimize STC violations and community connectivity separately.

Submitted 5 February, 2019; originally announced February 2019.

arXiv:1808.09317 [pdf, other]

Finding events in temporal networks: Segmentation meets densest-subgraph discovery

Authors: Polina Rozenshtein, Francesco Bonchi, Aristides Gionis, Mauro Sozio, Nikolaj Tatti

Abstract: In this paper we study the problem of discovering a timeline of events in a temporal network. We model events as dense subgraphs that occur within intervals of network activity. We formulate the event-discovery task as an optimization problem, where we search for a partition of the network timeline into k non-overlapping intervals, such that the intervals span subgraphs with maximum total density. The output is a sequence of dense subgraphs along with corresponding time intervals, capturing the most interesting events during the network lifetime. A naive solution to our optimization problem has polynomial but prohibitively high running time complexity. We adapt existing recent work on dynamic densest-subgraph discovery and approximate dynamic programming to design a fast approximation algorithm. Next, to ensure richer structure, we adjust the problem formulation to encourage coverage of a larger set of nodes. This problem is NP-hard even for static graphs. However, on static graphs a simple greedy algorithm leads to approximate solution due to submodularity. We extended this greedy approach for the case of temporal networks. However, the approximation guarantee does not hold. Nevertheless, according to the experiments, the algorithm finds good quality solutions.

Submitted 13 September, 2018; v1 submitted 28 August, 2018; originally announced August 2018.

arXiv:1802.03549 [pdf, other]

doi 10.1007/s10618-020-00673-0

From acquaintance to best friend forever: robust and fine-grained inference of social tie strengths

Authors: Florian Adriaens, Tijl De Bie, Aristides Gionis, Jefrey Lijffijt, Polina Rozenshtein

Abstract: Social networks often provide only a binary perspective on social ties: two individuals are either connected or not. While sometimes external information can be used to infer the strength of social ties, access to such information may be restricted or impractical. Sintos and Tsaparas (KDD 2014) first suggested to infer the strength of social ties from the topology of the network alone, by leveraging the Strong Triadic Closure (STC) property. The STC property states that if person A has strong social ties with persons B and C, B and C must be connected to each other as well (whether with a weak or strong tie). Sintos and Tsaparas exploited this to formulate the inference of the strength of social ties as an NP-hard optimization problem, and proposed two approximation algorithms. We refine and improve upon this landmark paper, by developing a sequence of linear relaxations of this problem that can be solved exactly in polynomial time. Usefully, these relaxations infer more fine-grained levels of tie strength (beyond strong and weak), which also allows to avoid making arbitrary strong/weak strength assignments when the network topology provides inconclusive evidence. One of the relaxations simultaneously infers the presence of a limited number of STC violations. An extensive theoretical analysis leads to two efficient algorithmic approaches. Finally, our experimental results elucidate the strengths of the proposed approach, and shed new light on the validity of the STC property in practice.

Submitted 18 September, 2018; v1 submitted 10 February, 2018; originally announced February 2018.

Journal ref: Data Min. Knowl. Discov. 34(3): 611-651 (2020)

arXiv:1801.08586 [pdf, other]

doi 10.1137/1.9781611975321.75

Reconstructing a cascade from temporal observations

Authors: Han Xiao, Polina Rozenshtein, Nikolaj Tatti, Aristides Gionis

Abstract: Given a subset of active nodes in a network can we reconstruct the cascade that has generated these observations? This is a problem that has been studied in the literature, but here we focus in the case that temporal information is available about the active nodes. In particular, we assume that in addition to the subset of active nodes we also know their activation time. We formulate this cascade-reconstruction problem as a variant of a Steiner-tree problem: we ask to find a tree that spans all reported active nodes while satisfying temporal-consistency constraints. We present three approximation algorithms. The best algorithm in terms of quality achieves a O(\sqrt{k})-approximation guarantee, where k is the number of active nodes, while the most efficient algorithm has linearithmic running time, making it scalable to very large graphs. We evaluate our algorithms on real-world networks with both simulated and real cascades. Our results indicate that utilizing the available temporal information allows for more accurate cascade reconstruction. Furthermore, our objective leads to finding the "backbone" of the cascade and it gives solutions of high precision.

Submitted 25 January, 2018; originally announced January 2018.

Comments: 12 pages, 7 figures, to appear in SDM 2018

arXiv:1701.07221 [pdf, other]

Community-aware network sparsification

Authors: Aristides Gionis, Polina Rozenshtein, Nikolaj Tatti, Evimaria Terzi

Abstract: Network sparsification aims to reduce the number of edges of a network while maintaining its structural properties; such properties include shortest paths, cuts, spectral measures, or network modularity. Sparsification has multiple applications, such as, speeding up graph-mining algorithms, graph visualization, as well as identifying the important network edges. In this paper we consider a novel formulation of the network-sparsification problem. In addition to the network, we also consider as input a set of communities. The goal is to sparsify the network so as to preserve the network structure with respect to the given communities. We introduce two variants of the community-aware sparsification problem, leading to sparsifiers that satisfy different connectedness community properties. From the technical point of view, we prove hardness results and devise effective approximation algorithms. Our experimental results on a large collection of datasets demonstrate the effectiveness of our algorithms.

Submitted 25 January, 2017; originally announced January 2017.

arXiv:1606.09446 [pdf, other]

Discovering topically- and temporally-coherent events in interaction networks

Authors: Han Xiao, Polina Rozenshtein, Aristides Gionis

Abstract: With the increasing use of online communication platforms, such as email, twitter, and messaging applications, we are faced with a growing amount of data that combine content (what is said), time (when), and user (by whom) information. An important computational challenge is to analyze these data, discover meaningful patterns, and understand what is happening. We consider the problem of mining online communication data and finding top-k temporal events. We define a temporal event to be a coherent topic that is discussed frequently, in a relatively short time span, while the information flow of the event respects the underlying network structure. We construct our model for detecting temporal events in two steps. We first introduce the notion of interaction meta-graph, which connects associated interactions. Using this notion, we define a temporal event to be a subset of interactions that (i) are topically and temporally close and (ii) correspond to a tree that captures the information flow. Finding the best temporal event leads to budget version of the prize-collecting Steiner-tree (PCST) problem, which we solve using three different methods: a greedy approach, a dynamic-programming algorithm, and an adaptation to an existing approximation algorithm. The problem of finding the top-k events among a set of candidate events maps to maximum set-cover problem, and thus, solved by greedy. We compare and analyze our algorithms in both synthetic and real datasets, such as twitter and email communication. The results show that our methods are able to detect meaningful temporal events.

Submitted 3 July, 2016; v1 submitted 30 June, 2016; originally announced June 2016.

Showing 1–9 of 9 results for author: Rozenshtein, P