-
GTEA: Inductive Representation Learning on Temporal Interaction Graphs via Temporal Edge Aggregation
Authors:
Siyue Xie,
Yiming Li,
Da Sun Handason Tam,
Xiaxin Liu,
Qiu Fang Ying,
Wing Cheong Lau,
Dah Ming Chiu,
Shou Zhi Chen
Abstract:
In this paper, we propose the Graph Temporal Edge Aggregation (GTEA) framework for inductive learning on Temporal Interaction Graphs (TIGs). Different from previous works, GTEA models the temporal dynamics of interaction sequences in the continuous-time space and simultaneously takes advantage of both rich node and edge/interaction attributes in the graph. Concretely, we integrate a sequence model with a time encoder to learn pairwise interactional dynamics between two adjacent nodes. This helps capture complex temporal interactional patterns of a node pair along the history, which generates edge embeddings that can be fed into a GNN backbone. By aggregating features of neighboring nodes and the corresponding edge embeddings, GTEA jointly learns both topological and temporal dependencies of a TIG. In addition, a sparsity-inducing self-attention scheme is incorporated for neighbor aggregation, which highlights more important neighbors and suppresses trivial noise for GTEA. By jointly optimizing the sequence model and the GNN backbone, GTEA learns more comprehensive node representations capturing both temporal and graph structural characteristics. Extensive experiments on five large-scale real-world datasets demonstrate the superiority of GTEA over other inductive models.
Submitted 3 May, 2023; v1 submitted 11 September, 2020;
originally announced September 2020.
-
Identifying Illicit Accounts in Large Scale E-payment Networks -- A Graph Representation Learning Approach
Authors:
Da Sun Handason Tam,
Wing Cheong Lau,
Bin Hu,
Qiu Fang Ying,
Dah Ming Chiu,
Hong Liu
Abstract:
Rapid and massive adoption of mobile/online payment services has brought new challenges to the service providers as well as regulators in safeguarding the proper use of such services/systems. In this paper, we leverage recent advances in deep-neural-network-based graph representation learning to detect abnormal/suspicious financial transactions in real-world e-payment networks. In particular, we propose an end-to-end Graph Convolution Network (GCN)-based algorithm to learn the embeddings of the nodes and edges of a large-scale time-evolving graph. In the context of e-payment transaction graphs, the resultant node and edge embeddings can effectively characterize the user-background as well as the financial transaction patterns of individual account holders. As such, we can use the graph embedding results to drive downstream graph mining tasks such as node-classification to identify illicit accounts within the payment networks. Our algorithm outperforms state-of-the-art schemes including GraphSAGE, Gradient Boosting Decision Tree and Random Forest to deliver considerably higher accuracy (94.62% and 86.98% respectively) in classifying user accounts within 2 practical e-payment transaction datasets. It also achieves outstanding accuracy (97.43%) for another biomedical entity identification task while using only edge-related information.
Submitted 13 June, 2019;
originally announced June 2019.
-
Modeling and Quantifying the Forces Driving Online Video Popularity Evolution
Authors:
Jiqiang Wu,
Yipeng Zhou,
Dah Ming Chiu
Abstract:
Video popularity is an essential reference for optimizing resource allocation and video recommendation in online video services. However, there is still no convincing model that can accurately depict a video's popularity evolution. In this paper, we propose a dynamic popularity model by modeling the video information diffusion process driven by various forms of recommendation. Through fitting the model with real traces collected from a practical system, we can quantify the strengths of the recommendation forces. Such quantification can lead to characterizing video popularity patterns, user behaviors and recommendation strategies, which is illustrated by a case study of TV episodes.
Submitted 20 September, 2017;
originally announced September 2017.
-
Who are Like-minded: Mining User Interest Similarity in Online Social Networks
Authors:
Chunfeng Yang,
Yipeng Zhou,
Dah Ming Chiu
Abstract:
In this paper, we mine and learn to predict how similar a pair of users' interests towards videos are, based on demographic (age, gender and location) and social (friendship, interaction and group membership) information of these users. We use the video access patterns of active users as ground truth (a form of benchmark). We adopt tag-based user profiling to establish this ground truth, and justify why it is used instead of video-based methods, or many latent topic models such as LDA and Collaborative Filtering approaches. We then show the effectiveness of the different demographic and social features, and their combinations and derivatives, in predicting user interest similarity, based on different machine-learning methods for combining multiple features. We propose a hybrid tree-encoded linear model for combining the features, and show that it outperforms other linear and tree-based models. Our methods can be used to predict user interest similarity when the ground truth is not available, e.g. for new users, or inactive users whose interests may have changed from old access data, and are useful for video recommendation. Our study is based on a rich dataset from Tencent, a popular service provider of social networks, video services, and various other services in China.
Submitted 7 March, 2016;
originally announced March 2016.
-
A Population Model for the Academic Ecosystem
Authors:
Yan Wu,
Srinivasan Venkatramanan,
Dah Ming Chiu
Abstract:
In recent times, the academic ecosystem has seen tremendous growth in the number of authors and publications. While most temporal studies in this area focus on the evolution of co-author and citation network structure, this systemic inflation has received very little attention. In this paper, we address this issue by proposing a population model for academia, derived from publication records in the Computer Science domain. We use a generalized branching process as an overarching framework, which enables us to describe the evolution and composition of the research community in a systematic manner. Further, the observed patterns allow us to shed light on researchers' lifecycle encompassing arrival, academic life expectancy, activity, productivity and offspring distribution in the ecosystem. We believe such a study will help develop better bibliometric indices which account for the inflation, and also provide insights into sustainable and efficient resource management for academia.
Submitted 28 March, 2015;
originally announced March 2015.
-
Modeling and Analysis of Scholar Mobility on Scientific Landscape
Authors:
Qiu Fang Ying,
Srinivasan Venkatramanan,
Dah Ming Chiu
Abstract:
Scientific literature to date can be thought of as a partially revealed landscape, where scholars continue to unveil hidden knowledge by exploring novel research topics. How do scholars explore the scientific landscape, i.e., choose research topics to work on? We propose an agent-based model of topic mobility behavior where scholars migrate across research topics on the space of science following different strategies, seeking different utilities. We use this model to study whether strategies widely used in the current scientific community can provide a balance between individual scientific success and the efficiency and diversity of the whole academic society. Through extensive simulations, we provide insights into the roles of different strategies, such as choosing topics according to research potential or popularity. Our model provides a conceptual framework and a computational approach to analyze scholars' behavior and its impact on scientific production. We also discuss how such an agent-based modeling approach can be integrated with big real-world scholarly data.
Submitted 10 March, 2015; v1 submitted 2 February, 2015;
originally announced February 2015.
-
Modeling Dynamics of Online Video Popularity
Authors:
Jiqiang Wu,
Yipeng Zhou,
Dah Ming Chiu,
Youwei Hua,
Zirong Zhu
Abstract:
Large Internet video delivery systems serve millions of videos to tens of millions of users on a daily basis, via Video-on-Demand and live streaming. Video popularity evolves over time. It represents the workload, as well as business value, of the video to the overall system. The ability to predict video popularity is very helpful for improving service quality and operating efficiency. Previous studies adopted simple models for video popularity, or directly adopted patterns from measurement studies. In this paper, we develop a stochastic fluid model that tries to capture two hidden processes that give rise to different patterns of a given video's popularity evolution: the information spreading process, and the user reaction process. Specifically, these processes model how the video is recommended to the user, the video's inherent attractiveness, and users' reaction rate, and yield specific popularity evolution patterns. We then validate our model by matching the predictions of the model with observed patterns from our collaborator, a large content provider in China. This model thus gives us the insight to explain the common and different video popularity evolution patterns and why they arise.
Submitted 7 December, 2014;
originally announced December 2014.
-
Fake View Analytics in Online Video Services
Authors:
Liang Chen,
Yipeng Zhou,
Dah Ming Chiu
Abstract:
Online video-on-demand (VoD) services invariably maintain a view count for each video they serve, and it has become an important currency for various stakeholders, from viewers, to content owners, advertisers, and the online service providers themselves. There is often significant financial incentive to use a robot (or a botnet) to artificially create fake views. How can we detect the fake views? Can we detect them (and stop them) using online algorithms as they occur? What is the extent of fake views with current VoD service providers? These are the questions we study in the paper. We develop some algorithms and show that they are quite effective for this problem.
Submitted 18 December, 2013;
originally announced December 2013.
-
Smart Streaming for Online Video Services
Authors:
Liang Chen,
Yipeng Zhou,
Dah Ming Chiu
Abstract:
Bandwidth consumption is a significant concern for online video service providers. Practical video streaming systems usually use some form of HTTP streaming (progressive download) to let users download the video at a faster rate than the video bitrate. Since users may quit before viewing the complete video, however, much of the downloaded video will be "wasted". To the extent that users' departure behavior can be predicted, we develop smart streaming that can be used to improve user QoE with limited server bandwidth or save bandwidth cost with unlimited server bandwidth. Through measurement, we extract certain user behavior properties for implementing such smart streaming, and demonstrate its advantage using a prototype implementation as well as simulations.
Submitted 9 April, 2014; v1 submitted 17 July, 2013;
originally announced July 2013.
-
The Academic Social Network
Authors:
Tom Z. J. Fu,
Qianqian Song,
Dah Ming Chiu
Abstract:
Through academic publications, the authors of these publications form a social network. Instead of sharing casual thoughts and photos (as in Facebook), authors pick co-authors and reference papers written by other authors. Thanks to various efforts (such as Microsoft Libra and DBLP), the data necessary for analyzing the academic social network is becoming more available on the Internet. What type of information and queries would be useful for users to find out, beyond the search queries already available from services such as Google Scholar? In this paper, we explore this question by defining a variety of ranking metrics on different entities: authors, publication venues and institutions. We go beyond traditional metrics such as paper counts, citations and h-index. Specifically, we define metrics such as influence, connections and exposure for authors. An author gains influence by receiving more citations, and especially by receiving citations from influential authors. An author increases his/her connections by co-authoring with other authors, especially with authors who themselves have high connections. An author receives exposure by publishing in selective venues where publications received high citations in the past, and the selectivity of these venues also depends on the influence of the authors who publish there. We discuss the computational aspects of these metrics, and the similarity between different metrics. With additional information on author-institution relationships, we are able to study institution rankings based on the corresponding authors' rankings for each type of metric as well as different domains. We are prepared to demonstrate these ideas with a web site (http://pubstat.org) built from millions of publications and authors.
Submitted 17 February, 2014; v1 submitted 19 June, 2013;
originally announced June 2013.
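The abstract above describes influence recursively: citations count for more when they come from influential authors. One common way to compute a score with that property is a PageRank-style power iteration. The sketch below is only an illustration of that general idea, not the paper's actual metric; the function name `influence_scores`, the damping factor and the toy citation data are all assumptions.

```python
def influence_scores(citations, damping=0.85, iterations=50):
    """PageRank-style influence (illustrative assumption, not the paper's metric).

    citations maps each author to the list of authors they cite.
    """
    authors = set(citations) | {b for cited in citations.values() for b in cited}
    n = len(authors)
    score = {a: 1.0 / n for a in authors}
    for _ in range(iterations):
        # Every author keeps a small baseline share of the total score.
        new = {a: (1.0 - damping) / n for a in authors}
        for a in authors:
            cited = citations.get(a, [])
            if cited:
                # An author passes influence equally to everyone they cite.
                share = damping * score[a] / len(cited)
                for b in cited:
                    new[b] += share
            else:
                # Authors who cite no one spread their weight evenly.
                for b in authors:
                    new[b] += damping * score[a] / n
        score = new
    return score


# Hypothetical toy data: a and b cite c, and c cites d.
scores = influence_scores({"a": ["c"], "b": ["c"], "c": ["d"], "d": []})
```

In this toy graph, d is cited only once but by the well-cited author c, so d ends up with a higher score than a or b, which is the recursive effect the abstract describes.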
-
MYE: Missing Year Estimation in Academic Social Networks
Authors:
Tom Z. J. Fu,
Qiufang Ying,
Dah Ming Chiu
Abstract:
In bibliometrics studies, a common challenge is how to deal with incorrect or incomplete data. However, given a large volume of data, there often exist certain relationships between the data items that can allow us to recover missing data items and correct erroneous data. In this paper, we study a particular problem of this sort - estimating the missing year information associated with publications (and hence authors' years of active publication). We first propose a simple algorithm that only makes use of the "direct" information, such as paper citation/reference relationships or paper-author relationships. The result of this simple algorithm is used as a benchmark for comparison. Our goal is to develop algorithms that increase both the coverage (the percentage of missing year papers recovered) and accuracy (mean absolute error of the estimated year to the real year). We propose some advanced algorithms that extend inference by information propagation. For each algorithm, we propose three versions according to the given academic social network type: a) Homogeneous (only contains paper citation links), b) Bipartite (only contains paper-author relations), and c) Heterogeneous (both paper citation and paper-author relations). We carry out experiments on three public data sets (MSR Libra, DBLP and APS), and evaluate the algorithms by applying the K-fold cross validation method. We show that the advanced algorithms can improve both coverage and accuracy.
Submitted 22 July, 2014; v1 submitted 18 June, 2013;
originally announced June 2013.
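The abstract above mentions a simple baseline that uses only direct citation information, evaluated by coverage and mean absolute error. As a rough sketch of what such a baseline could look like for the homogeneous (citation-only) case, the code below assumes that a paper is no older than the newest paper it cites and no newer than the oldest paper citing it. This bounding rule, the midpoint estimate and all names (`estimate_missing_years`, `coverage_and_mae`, the toy data) are assumptions made for illustration, not the paper's algorithm.

```python
def estimate_missing_years(years, cites):
    """Estimate missing years from direct citation links only.

    years maps paper -> year, or None when the year is missing.
    cites is a list of (citing_paper, cited_paper) pairs.
    """
    estimates = {}
    for paper, year in years.items():
        if year is not None:
            continue
        # Assumed rule: a paper cannot predate the papers it cites ...
        lower = [years[q] for p, q in cites if p == paper and years.get(q) is not None]
        # ... and cannot postdate the papers that cite it.
        upper = [years[p] for p, q in cites if q == paper and years.get(p) is not None]
        if lower and upper:
            estimates[paper] = (max(lower) + min(upper)) // 2
        elif lower:
            estimates[paper] = max(lower)
        elif upper:
            estimates[paper] = min(upper)
        # Papers with no dated neighbours stay unrecovered.
    return estimates


def coverage_and_mae(estimates, truth):
    """Coverage = share of missing-year papers recovered; MAE over recovered ones."""
    covered = [p for p in truth if p in estimates]
    coverage = len(covered) / len(truth)
    mae = sum(abs(estimates[p] - truth[p]) for p in covered) / len(covered)
    return coverage, mae


# Hypothetical toy data: B, D and E have missing years.
years = {"A": 2000, "B": None, "C": 2004, "D": None, "E": None}
cites = [("B", "A"), ("C", "B"), ("D", "C")]
estimates = estimate_missing_years(years, cites)
coverage, mae = coverage_and_mae(estimates, {"B": 2001, "D": 2005, "E": 1999})
```

Here E has no citation links at all, so it cannot be recovered from direct information; cases like this are why propagating estimates through the network could raise coverage.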
-
Exploring Network Economics
Authors:
Dah Ming Chiu,
Wai Yin Ng
Abstract:
In this paper, we explore what network economics is all about, focusing on the interesting topics brought about by the Internet. Our intent is to make this a brief survey, useful as an outline for a course on this topic, with an extended list of references. We try to make it as intuitive and readable as possible. We also deliberately try to be critical at times, and hope our interpretation of the topic will prompt further discussion among those doing research in the same field.
Submitted 7 June, 2011;
originally announced June 2011.
-
Reciprocating Preferences Stablize Matching: College Admissions Revisited
Authors:
Jian Liu,
Dah Ming Chiu
Abstract:
In considering the college admissions problem, almost fifty years ago, Gale and Shapley came up with a simple abstraction based on preferences of students and colleges. They introduced the concepts of stability and optimality, and proposed the deferred acceptance (DA) algorithm that is proven to lead to a stable and optimal solution. This algorithm is simple and computationally efficient. Furthermore, in subsequent studies it is shown that the DA algorithm is also strategy-proof, which means, when the algorithm is played out as a mechanism for matching two sides (e.g. colleges and students), the parties (colleges or students) have no incentives to act other than according to their true preferences. Yet, in practical college admission systems, the DA algorithm is often not adopted. Instead, an algorithm known as the Boston Mechanism (BM), or one of its variants, is widely adopted. In BM, colleges accept students without deferral (considering other colleges' decisions), which is exactly the opposite of Gale-Shapley's DA algorithm. To explain and rationalize this reality, we introduce the notion of reciprocating preference to capture the influence of a student's interest on a college's decision. This model is inspired by the actual mechanism used to match students to universities in Hong Kong. The notion of reciprocating preference defines a class of matching algorithms, allowing different degrees of reciprocating preferences by the students and colleges. DA and BM are but two extreme cases (with zero and a hundred percent reciprocation) of this set. This model extends the notions of stability and optimality as well. As in Gale-Shapley's original paper, we discuss how the analogy can be carried over to the stable marriage problem, thus demonstrating the model's general applicability.
Submitted 3 May, 2011; v1 submitted 4 November, 2010;
originally announced November 2010.
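For readers unfamiliar with the deferred acceptance (DA) algorithm that the abstract above takes as its starting point, here is a minimal sketch of the student-proposing variant with college quotas. It covers only the classic Gale-Shapley procedure, not the reciprocating-preference model proposed in the paper; the function name, data layout and toy preferences are assumptions chosen for illustration.

```python
from collections import deque


def deferred_acceptance(student_prefs, college_prefs, capacity):
    """Student-proposing deferred acceptance with college quotas.

    student_prefs / college_prefs map each side to an ordered preference list.
    capacity maps each college to its quota.
    """
    # rank[c][s] is the position of student s in college c's list (lower is better).
    rank = {c: {s: i for i, s in enumerate(prefs)} for c, prefs in college_prefs.items()}
    next_choice = {s: 0 for s in student_prefs}
    held = {c: [] for c in college_prefs}
    free = deque(student_prefs)
    while free:
        s = free.popleft()
        if next_choice[s] >= len(student_prefs[s]):
            continue  # s has been rejected everywhere and stays unmatched
        c = student_prefs[s][next_choice[s]]
        next_choice[s] += 1
        if s not in rank[c]:
            free.append(s)  # c does not list s, so s moves on
            continue
        # Acceptance is deferred: c only holds s tentatively.
        held[c].append(s)
        if len(held[c]) > capacity[c]:
            held[c].sort(key=rank[c].__getitem__)
            free.append(held[c].pop())  # reject the least preferred held student
    return held


# Hypothetical toy instance: every student prefers X, which has one seat.
matching = deferred_acceptance(
    {"a": ["X", "Y"], "b": ["X", "Y"], "c": ["X", "Y"]},
    {"X": ["b", "a", "c"], "Y": ["a", "c", "b"]},
    {"X": 1, "Y": 2},
)
```

The key point is that a college never commits until the process ends: X first holds a, then releases a when the preferred b applies. In the Boston Mechanism, by contrast, acceptances are final in the round they are made.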
-
Mathematical Modeling of Competition in Sponsored Search Market
Authors:
Jian Liu,
Dah Ming Chiu
Abstract:
Sponsored search mechanisms have drawn much attention from both the academic community and industry in recent years since the seminal papers of [13] and [14]. However, most of the existing literature concentrates on the mechanism design and analysis within the scope of only one search engine in the market. In this paper we propose a mathematical framework for modeling the interaction of publishers, advertisers and end users in a competitive market. We first consider the monopoly market model and provide optimal solutions for both ex ante and ex post cases, which represent the long-term and short-term revenues of search engines respectively. We then analyze the strategic behaviors of end users and advertisers under duopoly and prove the existence of an equilibrium in which both search engines co-exist from an ex post perspective. To show the more general ex ante results, we carry out extensive simulations under different parameter settings. Our analysis and observations in this work can provide useful insight into regulating the sponsored search market and protecting the interests of advertisers and end users.
Submitted 31 August, 2010; v1 submitted 5 June, 2010;
originally announced June 2010.
-
Club Formation by Rational Sharing : Content, Viability and Community Structure
Authors:
W. -Y. Ng,
D. M. Chiu,
W. K. Lin
Abstract:
A sharing community prospers when participation and contribution are both high. We suggest the two, while being related decisions every peer makes, should be given separate rational bases. Considered as such, a basic issue is the viability of club formation, which necessitates the modelling of two major sources of heterogeneity, namely, peers and shared content. This viability perspective clearly explains why rational peers contribute (or free-ride when they don't) and how their collective action determines viability as well as the size of the club formed. It also exposes another fundamental source of limitation to club formation apart from free-riding, in the community structure in terms of the relation between peers' interest (demand) and sharing (supply).
Submitted 18 September, 2005;
originally announced September 2005.
-
Statistical Modelling of Information Sharing: Community, Membership and Content
Authors:
W. -Y. Ng,
W. K. Lin,
D. M. Chiu
Abstract:
File-sharing systems, like many online and traditional information sharing communities (e.g. newsgroups, BBS, forums, interest clubs), are dynamical systems in nature. As peers get in and out of the system, the information content made available by the prevailing membership varies continually in amount as well as composition, which in turn affects all peers' join/leave decisions. As a result, the dynamics of membership and information content are strongly coupled, suggesting interesting issues about growth, sustenance and stability.
In this paper, we propose to study such communities with a simple statistical model of an information sharing club. Carrying their private payloads of information goods as potential supply to the club, peers join or leave on the basis of whether the information they demand is currently available. Information goods are chunked and typed, as in a file sharing system where peers contribute different files, or a forum where messages are grouped by topics or threads. Peers' demand and supply are then characterized by statistical distributions over the type domain.
This model reveals interesting critical behaviour with multiple equilibria. A sharp growth threshold is derived: the club may grow towards a sustainable equilibrium only if the value of an order parameter is above the threshold, or shrink to emptiness otherwise. The order parameter is composite and comprises the peer population size, the level of their contributed supply, the club's efficiency in information search, the spread of supply and demand over the type domain, as well as the goodness of match between them.
Submitted 1 July, 2005; v1 submitted 28 March, 2005;
originally announced March 2005.