Understanding Online Behaviors through a Temporal Lens

Authors: Tai-Quan Peng, Jonathan J. H. Zhu

Abstract: Timestamps in digital traces include significant detailed information on when human behaviors occur, which is universally available and standardized in all types of digital traces. Nevertheless, the concept of time is under-explicated in empirical studies of online behaviors. This paper discusses the (un)desirable properties of timestamps in digital traces and summarizes how timestamps in digital traces have been utilized in existing studies of online behaviors. The paper argues that time-in-behaviors perspective can provide a microscope with a renovated temporal lens to observe and understand online behaviors. Going beyond the traditional behaviors-in-time perspective, time-in-behaviors perspective enables empirical examination of online behaviors from multiple units of analysis (e.g., discrete behaviors, behavioral sessions, and behavioral trajectories) and from multiple dimensions (e.g., duration, order, transition, rhythm). The paper shows the potentials of the time-in-behaviors perspective with several empirical cases and proposes future directions in explicating the concept of time in computational social science.

Submitted 17 January, 2023; v1 submitted 14 January, 2023; originally announced January 2023.

arXiv:2212.13139 [pdf, ps, other]

Universality of preference behaviors in online music-listener bipartite networks: A Big Data analysis

Authors: Xiao-Pu Han, Fen Lin, Jonathan J. H. Zhu, Tarik Hadzibeganovic

Abstract: We investigate the formation of musical preferences of millions of users of the NetEase Cloud Music (NCM), one of the largest online music platforms in China. We combine the methods from complex networks theory and information sciences within the context of Big Data analysis to unveil statistical patterns and community structures underlying the formation and evolution of musical preference behaviors. Our analyses address the decay patterns of music influence, users' sensitivity to music, age and gender differences, and their relationship to regional economic indicators. Employing community detection in user-music bipartite networks, we identified eight major cultural communities in the population of NCM users. Female users exhibited higher within-group variability in preference behavior than males, with a major transition occurring around the age of 25. Moreover, the musical tastes and the preference diversity measures of women were also more strongly associated with economic factors. However, in spite of the highly variable popularity of music tracks and the identified cultural and demographic differences, we observed that the evolution of musical preferences over time followed a power-law-like decaying function, and that NCM listeners showed the highest sensitivity to music released in their adolescence, peaking at the age of 13. Our findings suggest the existence of universal properties in the formation of musical tastes but also their culture-specific relationship to demographic factors, with wide-ranging implications for community detection and recommendation system design in online music platforms.

Submitted 26 December, 2022; originally announced December 2022.

Comments: 23 pages, 15 Figures, 4 Tables

arXiv:2010.09905 [pdf, other]

SmartTriage: A system for personalized patient data capture, documentation generation, and decision support

Authors: Ilya Valmianski, Nave Frost, Navdeep Sood, Yang Wang, Baodong Liu, James J. Zhu, Sunil Karumuri, Ian M. Finn, Daniel S. Zisook

Abstract: Symptom checkers have emerged as an important tool for collecting symptoms and diagnosing patients, minimizing the involvement of clinical personnel. We developed a machine-learning-backed system, SmartTriage, which goes beyond conventional symptom checking through a tight bi-directional integration with the electronic medical record (EMR). Conditioned on EMR-derived patient history, our system identifies the patient's chief complaint from a free-text entry and then asks a series of discrete questions to obtain relevant symptomatology. The patient-specific data are used to predict detailed ICD-10-CM codes as well as medication, laboratory, and imaging orders. Patient responses and clinical decision support (CDS) predictions are then inserted back into the EMR. To train the machine learning components of SmartTriage, we employed novel data sets of over 25 million primary care encounters and 1 million patient free-text reason-for-visit entries. These data sets were used to construct: (1) a long short-term memory (LSTM) based patient history representation, (2) a fine-tuned transformer model for chief complaint extraction, (3) a random forest model for question sequencing, and (4) a feed-forward network for CDS predictions. In total, our system supports 337 patient chief complaints, which together make up $>90\%$ of all primary care encounters at Kaiser Permanente.

Submitted 11 November, 2021; v1 submitted 19 October, 2020; originally announced October 2020.

Comments: Accepted as a proceeding for ML4H 2021

ACM Class: J.3; I.2.7

arXiv:2006.01027 [pdf, other]

doi 10.1209/0295-5075/131/28002

Go viral or go broadcast? Characterizing the virality and growth of cascades

Authors: Yafei Zhang, Lin Wang, Jonathan J. H. Zhu, Xiaofan Wang

Abstract: Quantifying the virality of cascades is an important question across disciplines such as the transmission of disease, the spread of information and the diffusion of innovations. An appropriate virality metric should be able to disambiguate between a shallow, broadcast-like diffusion process and a deep, multi-generational branching process. Although several valuable works have been dedicated to this field, most of them fail to take the position of the diffusion source into consideration, which makes them fall into the trap of graph isomorphism and would result in imprecise estimation of cascade virality inevitably under certain circumstances. In this paper, we propose a root-aware approach to quantifying the virality of cascades with proper consideration of the root node in a diffusion tree. With applications on synthetic and empirical cascades, we show the properties and potential utility of the proposed virality measure. Based on preferential attachment mechanisms, we further introduce a model to mimic the growth of cascades. The proposed model enables the interpolation between broadcast and viral spreading during the growth of cascades. Through numerical simulations, we demonstrate the effectiveness of the proposed model in characterizing the virality of growing cascades. Our work contributes to the understanding of cascade virality and growth, and could offer practical implications in a range of policy domains including viral marketing, infectious disease and information diffusion.

Submitted 28 June, 2022; v1 submitted 1 June, 2020; originally announced June 2020.

Comments: 10 pages, 15 figures, 2 tables

Journal ref: EPL, 131 (2020) 28002

arXiv:2006.00765 [pdf, other]

doi 10.1007/s11280-021-00862-x

Conspiracy vs science: A large-scale analysis of online discussion cascades

Authors: Yafei Zhang, Lin Wang, Jonathan J. H. Zhu, Xiaofan Wang

Abstract: With the emergence and rapid proliferation of social media platforms and social networking sites, recent years have witnessed a surge of misinformation spreading in our daily life. Drawing on a large-scale dataset which covers more than 1.4M posts and 18M comments, we investigate the propagation of two distinct narratives--(i) conspiracy information, whose claims are generally unsubstantiated and thus referred as misinformation to some extent, and (ii) scientific information, whose origins are generally readily identifiable and verifiable--in an online social media platform. We find that conspiracy cascades tend to propagate in a multigenerational branching process while science cascades are more likely to grow in a breadth-first manner. Specifically, conspiracy information triggers larger cascades, involves more users and generations, persists longer, is more viral and bursty than science information. Content analysis reveals that conspiracy cascades contain more negative words and emotional words which convey anger, fear, disgust, surprise and trust. We also find that conspiracy cascades are more concerned with political and controversial topics. After applying machine learning models, we achieve an AUC score of nearly 90% in discriminating conspiracy from science narratives. We find that conspiracy cascades are more likely to be controlled by a broader set of users than science cascades, imposing new challenges on the management of misinformation. Although political affinity is thought to affect the consumption of misinformation, there is very little evidence that political orientation of the information source plays a role during the propagation of conspiracy information. Our study provides complementing evidence to current misinformation research and has practical policy implications to stem the propagation and mitigate the influence of misinformation online.

Submitted 6 April, 2021; v1 submitted 1 June, 2020; originally announced June 2020.

Comments: 24 pages, 9 figures, 3 tables

Journal ref: World Wide Web 24, 585-606 (2021)

arXiv:1911.01264 [pdf]

Global Regularity and Individual Variability in Dynamic Behaviors of Human Communication

Authors: Jonathan J. H. Zhu, Tai-Quan Peng

Abstract: A new model, called "Human Dynamics", has been recently proposed that individuals execute activities based on a perceived priority of tasks, which can be characterized by a power-law distribution of waiting time between consecutive tasks (Barabasi, 2005). This power-law distribution has been found to exist in diverse human behaviors, such as mail correspondence, e-mail communication, webpage browsing, video-on-demand, and mobile phone calls. However, the pattern has been observed at the global (i.e., aggregated) level without considering individual differences. To guard against ecological fallacy, it is necessary to test the model at the individual level. The current study aims to address the following questions: Is the power-law uniform across individuals? What distribution do individual behaviors follow? We examine these questions with a client log file of nearly 4,000 Internet users' web browsing behavior and a server log file of 2,300,000 users' file-sharing behaviors in a P2P system. The results confirm the human dynamic model at the aggregate-level both in webpage browsing and P2P usage behavior. We have also found that there is detectable variability across the individuals in the decaying rate (i.e., the exponent gamma) of the power-law distribution, which follows well-known distributions (i.e., Gaussian, Weibull, and log-normal).

Submitted 4 November, 2019; originally announced November 2019.

Comments: Paper presented at the Fifth Chinese Conference of Complex Networks (CCCN09), Qingdao, China

arXiv:1910.01290 [pdf]

doi 10.1093/jcmc/zmz029

Mobile Phone Use as Sequential Processes: From Discrete Behaviors to Sessions of Behaviors and Trajectories of Sessions

Authors: Tai-Quan Peng, Jonathan J. H. Zhu

Abstract: Mobile phone use is an unfolding process by nature. In this study, it is explicated as two sequential processes: mobile sessions composed of an uninterrupted set of behaviors and mobile trajectories composed of mobile sessions and mobile-off time. A dataset of a five-month behavioral logfile of mobile application use by approximately 2,500 users in Hong Kong is used. Mobile sessions are constructed and mined to uncover sequential characteristics and patterns in mobile phone use. Mobile trajectories are analyzed to examine intraindividual change and interindividual differences on mobile re-engagement as indicators of behavioral dynamics in mobile phone use. The study provides empirical support for and expands the boundaries of existing theories about combinatorial use of information and communication technologies (ICTs). Finally, the understanding on mobile temporality is enhanced, that is, mobile temporality is homogeneous across social sectors. Furthermore, mobile phones redefine, rather than blur, the boundary between private and public time.

Submitted 2 October, 2019; originally announced October 2019.

Comments: 36 pages, 4 figures

arXiv:1906.00756 [pdf, ps, other]

doi 10.34133/2021/9831621

The Strength of Structural Diversity in Online Social Networks

Authors: Yafei Zhang, Lin Wang, Jonathan J. H. Zhu, Xiaofan Wang, Alex 'Sandy' Pentland

Abstract: Understanding the way individuals are interconnected in social networks is of prime significance to predict their collective outcomes. Leveraging a large-scale dataset from a knowledge-sharing website, this paper presents an exploratory investigation of the way to depict structural diversity in directed networks and how it can be utilized to predict one's online social reputation. To capture the structural diversity of an individual, we first consider the number of weakly and strongly connected components in one's contact neighborhood and further take the coexposure network of social neighbors into consideration. We show empirical evidence that the structural diversity of an individual is able to provide valuable insights to predict personal online social reputation, and the inclusion of a coexposure network provides an additional ingredient to achieve that goal. After synthetically controlling several possible confounding factors through matching experiments, structural diversity still plays a nonnegligible role in the prediction of personal online social reputation. Our work constitutes one of the first attempts to empirically study structural diversity in directed networks and has practical implications for a range of domains, such as social influence and collective intelligence studies.

Submitted 28 June, 2022; v1 submitted 3 June, 2019; originally announced June 2019.

Comments: 16 pages, 6 figures

Journal ref: Research (2021) 9831621

arXiv:1711.09408 [pdf]

How to Measure Sessions of Mobile Device Use? Quantification, Evaluation, and Applications

Authors: Jonathan J. H. Zhu, Hexin Chen, Tai-Quan Peng, Xiao Fan Liu, Haixing Dai

Abstract: Research on mobile phone use often starts with a question of "How much time users spend on using their phones?". The question involves an equal-length measure that captures the duration of mobile phone use but does not tackle the other temporal characteristics of user behavior, such as frequency, timing, and sequence. In the study, we proposed a variable-length measure called "session" to uncover the unmeasured temporal characteristics. We use an open source data to demonstrate how to quantify sessions, aggregate the sessions to higher units of analysis within and across users, evaluate the results, and apply the measure for theoretical or practical purposes.

Submitted 26 November, 2017; originally announced November 2017.

Comments: Preprint of forthcoming article in Mobile Media & Communication

arXiv:1601.03094 [pdf, other]

A metric for sets of trajectories that is practical and mathematically consistent

Authors: José Bento, Jia Jie Zhu

Abstract: Metrics on the space of sets of trajectories are important for scientists in the field of computer vision, machine learning, robotics, and general artificial intelligence. However, existing notions of closeness between sets of trajectories are either mathematically inconsistent or of limited practical use. In this paper, we outline the limitations in the current mathematically-consistent metrics, which are based on OSPA (Schuhmacher et al. 2008); and the inconsistencies in the heuristic notions of closeness used in practice, whose main ideas are common to the CLEAR MOT measures (Keni and Rainer 2008) widely used in computer vision. In two steps, we then propose a new intuitive metric between sets of trajectories and address these limitations. First, we explain a solution that leads to a metric that is hard to compute. Then we modify this formulation to obtain a metric that is easy to compute while keeping the useful properties of the previous metric. Our notion of closeness is the first demonstrating the following three features: the metric 1) can be quickly computed, 2) incorporates confusion of trajectories' identity in an optimal way, and 3) is a metric in the mathematical sense.

Submitted 14 November, 2020; v1 submitted 12 January, 2016; originally announced January 2016.

Comments: Submitted to IEEE Transactions on Signal Processing

arXiv:1506.03932 [pdf, ps, other]

Mutual Feedback Between Epidemic Spreading and Information Diffusion

Authors: Xiu-Xiu Zhan, Chuang Liu, Ge Zhou, Zi-Ke Zhang, Gui-Quan Sun, Jonathan J. H. Zhu

Abstract: The impact that information diffusion has on epidemic spreading has recently attracted much attention. As a disease begins to spread in the population, information about the disease is transmitted to others, which in turn has an effect on the spread of disease. In this paper, using empirical results of the propagation of H7N9 and information about the disease, we clearly show that the spreading dynamics of the two-types of processes influence each other. We build a mathematical model in which both types of spreading dynamics are described using the SIS process in order to illustrate the influence of information diffusion on epidemic spreading. Both the simulation results and the pairwise analysis reveal that information diffusion can increase the threshold of an epidemic outbreak, decrease the final fraction of infected individuals and significantly decrease the rate at which the epidemic propagates. Additionally, we find that the multi-outbreak phenomena of epidemic spreading, along with the impact of information diffusion, is consistent with the empirical results. These findings highlight the requirement to maintain social awareness of diseases even when the epidemics seem to be under control in order to prevent a subsequent outbreak. These results may shed light on the in-depth understanding of the interplay between the dynamics of epidemic spreading and information diffusion.

Submitted 12 June, 2015; originally announced June 2015.

Showing 1–11 of 11 results for author: Zhu, J J