
arXiv:2509.20623

Latent Activation Editing: Inference-Time Refinement of Learned Policies for Safer Multirobot Navigation

Authors: Satyajeet Das, Darren Chiu, Zhehui Huang, Lars Lindemann, Gaurav S. Sukhatme

Abstract: Reinforcement learning has enabled significant progress in complex domains such as coordinating and navigating multiple quadrotors. However, even well-trained policies remain vulnerable to collisions in obstacle-rich environments. Addressing these infrequent but critical safety failures through retraining or fine-tuning is costly and risks degrading previously learned skills. Inspired by activation steering in large language models and latent editing in computer vision, we introduce a framework for inference-time Latent Activation Editing (LAE) that refines the behavior of pre-trained policies without modifying their weights or architecture. The framework operates in two stages: (i) an online classifier monitors intermediate activations to detect states associated with undesired behaviors, and (ii) an activation editing module selectively modifies flagged activations to shift the policy towards safer regimes. In this work, we focus on improving safety in multi-quadrotor navigation. We hypothesize that amplifying a policy's internal perception of risk can induce safer behaviors. We instantiate this idea through a latent collision world model trained to predict future pre-collision activations, thereby prompting earlier and more cautious avoidance responses.
Extensive simulations and real-world Crazyflie experiments demonstrate that LAE achieves a statistically significant reduction in collisions (nearly 90% fewer cumulative collisions compared to the unedited baseline) and substantially increases the fraction of collision-free trajectories, while preserving task completion. More broadly, our results establish LAE as a lightweight paradigm, feasible on resource-constrained hardware, for post-deployment refinement of learned robot policies.

Submitted 24 September, 2025; originally announced September 2025.
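The two-stage mechanism in the LAE abstract (detect a risky activation, then edit it) can be illustrated with a minimal NumPy sketch. Everything below is an assumption for illustration, not the authors' implementation: a random two-layer network stands in for the frozen policy, `probe_w`/`probe_b` form a hypothetical logistic probe for stage (i), and `A_future` is a hypothetical linear stand-in for the latent collision world model that predicts a future pre-collision activation in stage (ii).

```python
import numpy as np

rng = np.random.default_rng(0)
OBS, HID, ACT = 8, 16, 4

# Frozen stand-in policy (hypothetical): weights are read but never modified.
W1 = rng.normal(size=(HID, OBS)) / np.sqrt(OBS)
W2 = rng.normal(size=(ACT, HID)) / np.sqrt(HID)

# Stage (i), hypothetical: logistic probe scoring hidden activations for risk.
probe_w = rng.normal(size=HID)
probe_b = 0.0

# Stage (ii), hypothetical: linear "latent world model" mapping the current
# activation to a predicted future (pre-collision) activation.
A_future = np.eye(HID) + 0.1 * rng.normal(size=(HID, HID))


def risk_prob(h: np.ndarray) -> float:
    """Probability that activation h belongs to an undesired (risky) state."""
    return float(1.0 / (1.0 + np.exp(-(probe_w @ h + probe_b))))


def act_with_lae(obs: np.ndarray, threshold: float = 0.5):
    """Run the frozen policy, editing the hidden activation only when flagged."""
    h = np.tanh(W1 @ obs)                 # intermediate activation of the policy
    flagged = risk_prob(h) > threshold
    if flagged:
        # Swap in the predicted future pre-collision activation so the policy
        # "perceives" the risk earlier than it otherwise would.
        h = np.tanh(A_future @ h)
    return W2 @ h, flagged


obs = rng.normal(size=OBS)
action, flagged = act_with_lae(obs)
```

Because the edit acts only on activations at inference time, `W1` and `W2` stay untouched, which is the property the abstract emphasizes; a real probe and world model would be trained on logged pre-collision activations rather than drawn at random.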

arXiv:2404.10166

Self-Supervised Learning Featuring Small-Scale Image Dataset for Treatable Retinal Diseases Classification

Authors: Luffina C. Huang, Darren J. Chiu, Manish Mehta

Abstract: Automated medical diagnosis through image-based neural networks has grown in popularity and matured over the years. Nevertheless, it is constrained by the scarcity of medical images and the high cost of manual annotation. Self-Supervised Learning (SSL) is a good alternative to Transfer Learning (TL) and is suitable for imbalanced image datasets. In this study, we assess four pretrained SSL models and two TL models on treatable retinal disease classification using small-scale Optical Coherence Tomography (OCT) training sets ranging from 125 to 4,000 images, with balanced or imbalanced class distributions. The proposed SSL model achieves a state-of-the-art accuracy of 98.84% using only 4,000 training images. Our results suggest that the SSL models deliver superior performance under both the balanced and imbalanced training scenarios. The SSL model with the MoCo-v2 scheme performs consistently well under the imbalanced scenario and, in particular, surpasses the other models when the training set contains fewer than 500 images.

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.08390

doi 10.1007/978-3-031-70932-6_5

Collective Bayesian Decision-Making in a Swarm of Miniaturized Robots for Surface Inspection

Authors: Thiemen Siemensma, Darren Chiu, Sneha Ramshanker, Radhika Nagpal, Bahar Haghighat

Abstract: Robot swarms can effectively serve a variety of sensing and inspection applications. Certain inspection tasks require a binary classification decision. This work presents an experimental setup for a surface inspection task based on vibration sensing and studies a Bayesian two-outcome decision-making algorithm in a swarm of miniaturized wheeled robots. The robots are tasked with individually inspecting and collectively classifying a 1 m × 1 m tiled surface, consisting of vibrating and non-vibrating tiles, according to the majority tile type. The robots sense vibrations using onboard IMUs and perform collision avoidance using a set of IR sensors. We develop a simulation and optimization framework leveraging the Webots robotic simulator and a Particle Swarm Optimization (PSO) method. We consider two existing information-sharing strategies and propose a new one that allows the swarm to rapidly reach accurate classification decisions. We first find optimal parameters that allow efficient sampling in simulation and then evaluate our proposed strategy against the two existing ones using 100 randomized simulations and 10 real-world experiments. We find that our proposed method leads the swarm to decide faster, with an improvement of up to 20.52% in mean decision time at a cost of only 0.78% in accuracy.

Submitted 10 July, 2025; v1 submitted 12 April, 2024; originally announced April 2024.

Journal ref: In: Hamann, H., et al. (eds) Swarm Intelligence. ANTS 2024, Lecture Notes in Computer Science, vol. 14987, Springer, Cham, 2024, pp. 57-70
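The Bayesian two-outcome decision in the abstract above can be sketched with a Beta-Bernoulli model: each robot treats the unknown fraction f of vibrating tiles as Beta-distributed, updates on each binary vibration reading, and commits once the posterior mass on one side of 0.5 exceeds a credibility threshold. The class name, the uniform Beta(1, 1) prior, and the 0.95 threshold are illustrative assumptions, and the information-sharing strategies compared in the paper are not modeled here.

```python
from math import comb


def beta_cdf_half(a: int, b: int) -> float:
    """P(f < 0.5) for f ~ Beta(a, b) with integer a, b >= 1.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a) at x = 0.5.
    """
    n = a + b - 1
    return sum(comb(n, j) for j in range(a, n + 1)) / 2 ** n


class BayesianInspector:
    """One robot's belief about f, the fraction of vibrating tiles (hypothetical API)."""

    def __init__(self, credibility: float = 0.95, prior=(1, 1)):
        self.a, self.b = prior          # Beta(1, 1) is a uniform prior over f
        self.credibility = credibility
        self.decision = None            # None until committed, then 0 or 1

    def observe(self, vibrating: bool) -> None:
        if vibrating:
            self.a += 1                 # evidence for a vibrating-majority surface
        else:
            self.b += 1
        if self.decision is None:
            p_below = beta_cdf_half(self.a, self.b)
            if p_below > self.credibility:
                self.decision = 0       # majority of tiles non-vibrating
            elif 1.0 - p_below > self.credibility:
                self.decision = 1       # majority of tiles vibrating
```

With the uniform prior, a robot that reads four vibrating tiles in a row has P(f > 0.5) = 1 - 0.5^5 ≈ 0.969 and commits to "vibrating", whereas an even split of readings leaves the posterior symmetric at 0.5 and the robot keeps sampling.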

arXiv:2404.05249

SAFE-GIL: SAFEty Guided Imitation Learning for Robotic Systems

Authors: Yusuf Umut Ciftci, Darren Chiu, Zeyuan Feng, Gaurav S. Sukhatme, Somil Bansal

Abstract: Behavior cloning (BC) is a widely used approach in imitation learning, where a robot learns a control policy by observing an expert supervisor. However, the learned policy can make errors that might lead to safety violations, which limits its utility in safety-critical robotics applications. While prior works have tried to improve a BC policy via additional real or synthetic action labels, adversarial training, or runtime filtering, none of them explicitly focus on reducing the BC policy's safety violations at training time. We propose SAFE-GIL, a design-time method to learn safety-aware behavior cloning policies. SAFE-GIL deliberately injects an adversarial disturbance into the system during data collection to guide the expert towards safety-critical states. This disturbance injection simulates potential policy errors that the system might encounter at test time. By ensuring that training more closely replicates expert behavior in safety-critical states, our approach yields safer policies despite policy errors at test time. We further develop a reachability-based method to compute this adversarial disturbance. We compare SAFE-GIL with various behavior cloning techniques and online safety-filtering methods in three domains: autonomous ground navigation, aircraft taxiing, and aerial navigation on a quadrotor testbed. Our method demonstrates a significant reduction in safety failures, particularly in low-data regimes where the likelihood of learning errors, and therefore of safety violations, is higher.
See our website: https://y-u-c.github.io/safegil/

Submitted 18 November, 2024; v1 submitted 8 April, 2024; originally announced April 2024.
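The core data-collection idea in SAFE-GIL (inject a worst-case disturbance while the expert drives, but record the expert's own action as the label) can be sketched on a toy 1D system. This is a hedged illustration under loudly stated assumptions: the dynamics, the expert, the bound `D_MAX`, and the safety value V(x) = 1 - |x| are all hypothetical, and the closed-form worst-case disturbance below stands in for the reachability-based computation the paper uses.

```python
D_MAX = 0.3   # hypothetical disturbance bound
U_MAX = 1.0   # hypothetical actuator limit
DT = 0.1      # integration step


def expert(x: float) -> float:
    """Hypothetical expert: saturated proportional control back to the origin."""
    return max(-U_MAX, min(U_MAX, -2.0 * x))


def worst_case_disturbance(x: float) -> float:
    """Worst case for the toy safety value V(x) = 1 - |x| (unsafe when |x| >= 1).

    dV/dt = -sign(x) * (u + d), so the disturbance that decreases V fastest is
    d = D_MAX * sign(x): it pushes the state toward the unsafe boundary.
    """
    return D_MAX if x >= 0 else -D_MAX


def collect_demo(x0: float, steps: int, inject: bool = True):
    """Roll out the expert and record (state, expert action) pairs for cloning."""
    data, x = [], x0
    for _ in range(steps):
        u = expert(x)
        data.append((x, u))     # the label is always the undisturbed expert action
        d = worst_case_disturbance(x) if inject else 0.0
        x += DT * (u + d)       # but the system evolves under u + d
    return data
```

Starting at x = 0, a clean rollout never leaves the origin, while the disturbed rollout settles near x = 0.15, so the dataset contains expert corrections in states displaced toward the boundary, the kind of states a slightly erroneous learned policy might drift into.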

arXiv:2310.03172

Optimization and Evaluation of Multi Robot Surface Inspection Through Particle Swarm Optimization

Authors: Darren Chiu, Radhika Nagpal, Bahar Haghighat

Abstract: Robot swarms can be tasked with a variety of automated sensing and inspection applications in aerial, aquatic, and surface environments. In this paper, we study a simplified two-outcome surface inspection task. We task a group of robots to inspect and collectively classify a 2D surface section based on a binary pattern projected on the surface. We use a decentralized Bayesian decision-making algorithm and deploy a swarm of miniature 3-cm wheeled robots to inspect a $1\,\mathrm{m}\times 1\,\mathrm{m}$ surface of randomized black and white tiles. We first describe the model parameters that characterize our simulated environment, the robot swarm, and the inspection algorithm. We then employ a noise-resistant heuristic optimization scheme based on Particle Swarm Optimization (PSO), using a fitness evaluation that combines decision accuracy and decision time. We use this fitness measure to assess the optimized parameters through 100 randomized simulations that vary the surface pattern and initial robot poses. The optimized algorithm parameters show up to a 55% improvement in median fitness over an empirically chosen parameter set.

Submitted 4 October, 2023; originally announced October 2023.

Comments: 6 pages, 8 figures
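The noise-resistant PSO step in the abstract above can be sketched as follows. Assumptions: the noise-handling rule (re-evaluating every personal best each iteration and comparing running means rather than single samples) is one common variant and may differ from the paper's; the inertia and acceleration coefficients are typical defaults; and `toy_fitness` is a hypothetical quadratic-plus-noise stand-in for the simulation-based fitness that combines decision accuracy and decision time.

```python
import random


def noisy_pso(fitness, dim, bounds, n_particles=20, iters=60,
              w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize a noisy fitness with PSO, re-evaluating personal bests.

    Each personal best is re-evaluated every iteration, and its running mean,
    not a single lucky sample, is used for comparisons and for the global best.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_sum = [fitness(p, rng) for p in pos]
    pbest_cnt = [1] * n_particles

    for _ in range(iters):
        for i in range(n_particles):            # re-evaluate personal bests
            pbest_sum[i] += fitness(pbest[i], rng)
            pbest_cnt[i] += 1
        means = [s / c for s, c in zip(pbest_sum, pbest_cnt)]
        gbest = pbest[min(range(n_particles), key=means.__getitem__)][:]
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            value = fitness(pos[i], rng)
            if value < means[i]:                # new personal best: reset its average
                pbest[i] = pos[i][:]
                pbest_sum[i], pbest_cnt[i] = value, 1

    means = [s / c for s, c in zip(pbest_sum, pbest_cnt)]
    best = min(range(n_particles), key=means.__getitem__)
    return pbest[best], means[best]


def toy_fitness(x, rng):
    """Hypothetical stand-in for a simulated run: quadratic cost plus noise."""
    return sum(v * v for v in x) + rng.gauss(0.0, 0.01)
```

In the paper's setting each `fitness` call would be a full randomized simulation, so re-evaluation trades extra simulations for protection against promoting a parameter set that only looked good once.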

arXiv:2009.05266

GTEA: Inductive Representation Learning on Temporal Interaction Graphs via Temporal Edge Aggregation

Authors: Siyue Xie, Yiming Li, Da Sun Handason Tam, Xiaxin Liu, Qiu Fang Ying, Wing Cheong Lau, Dah Ming Chiu, Shou Zhi Chen

Abstract: In this paper, we propose the Graph Temporal Edge Aggregation (GTEA) framework for inductive learning on Temporal Interaction Graphs (TIGs). Different from previous works, GTEA models the temporal dynamics of interaction sequences in continuous time and simultaneously takes advantage of both rich node and edge/interaction attributes in the graph. Concretely, we integrate a sequence model with a time encoder to learn pairwise interactional dynamics between two adjacent nodes. This helps capture complex temporal interaction patterns of a node pair over its history and generates edge embeddings that can be fed into a GNN backbone. By aggregating features of neighboring nodes and the corresponding edge embeddings, GTEA jointly learns both the topological and temporal dependencies of a TIG. In addition, a sparsity-inducing self-attention scheme is incorporated for neighbor aggregation, which highlights more important neighbors and suppresses trivial noise. By jointly optimizing the sequence model and the GNN backbone, GTEA learns more comprehensive node representations that capture both temporal and graph-structural characteristics. Extensive experiments on five large-scale real-world datasets demonstrate the superiority of GTEA over other inductive models.

Submitted 3 May, 2023; v1 submitted 11 September, 2020; originally announced September 2020.

Comments: Accepted at PAKDD 2023
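The sparsity-inducing neighbor attention mentioned in the GTEA abstract can be illustrated with sparsemax, which, unlike softmax, assigns exactly zero weight to low-scoring neighbors. The abstract does not say which sparse attention GTEA uses, so sparsemax, the bilinear scoring, and the `aggregate_neighbors` helper are assumptions; the edge embeddings, which GTEA obtains from its sequence model and time encoder, are simply taken as given inputs here.

```python
import numpy as np


def sparsemax(z):
    """Euclidean projection of scores z onto the probability simplex.

    Unlike softmax, entries below the threshold tau receive exactly zero weight.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]
    k = np.arange(1, z.size + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cumsum        # sorted entries that stay non-zero
    k_max = k[support][-1]
    tau = (cumsum[k_max - 1] - 1) / k_max
    return np.maximum(z - tau, 0.0)


def aggregate_neighbors(h_self, neighbor_feats, edge_embs, W_q, W_k):
    """Sparse attention over neighbors (hypothetical helper).

    neighbor_feats: (N, Dn); edge_embs: (N, De); W_k: (Dn + De, d); W_q: (d, Dn).
    """
    messages = np.concatenate([neighbor_feats, edge_embs], axis=1)  # (N, Dn + De)
    scores = (messages @ W_k) @ (W_q @ h_self)                      # (N,)
    alpha = sparsemax(scores)                                       # sparse weights
    return alpha @ messages, alpha
```

For scores `[1, 2, 3]`, sparsemax puts all weight on the last neighbor, whereas softmax would still give the others about 0.09 and 0.24; this exact-zero behavior is what "suppresses trivial noise" refers to in spirit.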

arXiv:2006.07737

Generalization by Recognizing Confusion

Authors: Daniel Chiu, Franklyn Wang, Scott Duke Kominers

Abstract: A recently proposed technique called self-adaptive training augments modern neural networks by allowing them to adjust training labels on the fly, to avoid overfitting to samples that may be mislabeled or otherwise non-representative. By combining the self-adaptive objective with mixup, we further improve the accuracy of self-adaptive models for image recognition; the resulting classifier obtains state-of-the-art accuracies on datasets corrupted with label noise. Robustness to label noise implies a lower generalization gap; thus, our approach also leads to improved generalizability. We find evidence that the Rademacher complexity of these algorithms is low, suggesting a new path towards provable generalization for this type of deep learning model. Last, we highlight a novel connection between difficulties accounting for rare classes and robustness under noise, as rare classes are in a sense indistinguishable from label noise. Our code can be found at https://github.com/Tuxianeer/generalizationconfusion.

Submitted 13 June, 2020; originally announced June 2020.

Comments: 12 pages, 3 tables, 2 figures
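The combination of self-adaptive training and mixup described in the abstract can be sketched in NumPy. This is a hedged illustration, not the authors' code (which is at the repository linked above): it assumes the usual self-adaptive update, in which soft targets are an exponential moving average of model predictions and each sample is weighted by its largest target entry, and it assumes mixup is applied to inputs, soft targets, and weights with a Beta(alpha, alpha) mixing coefficient. The function names and the momentum of 0.9 are illustrative.

```python
import numpy as np


def self_adaptive_update(targets, probs, momentum=0.9):
    """Move soft targets toward the model's predictions (moving average).

    Returns the new targets and per-sample weights max_j t_ij, so samples whose
    targets have become uncertain ("confusing" samples) are down-weighted.
    """
    new_targets = momentum * targets + (1.0 - momentum) * probs
    return new_targets, new_targets.max(axis=1)


def mixup(x, targets, weights, alpha, rng):
    """Convexly combine random pairs of inputs, soft targets, and sample weights."""
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))

    def mix(a):
        return lam * a + (1.0 - lam) * a[perm]

    return mix(x), mix(targets), mix(weights)


def weighted_cross_entropy(probs, targets, weights, eps=1e-12):
    """Sample-weighted cross-entropy against soft targets."""
    per_sample = -(targets * np.log(probs + eps)).sum(axis=1)
    return float((weights * per_sample).sum() / weights.sum())
```

In a training loop, the model's predictions on the clean batch would feed `self_adaptive_update`, and the loss would be computed on its predictions for the mixed inputs returned by `mixup`.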

arXiv:1906.05546

Identifying Illicit Accounts in Large Scale E-payment Networks -- A Graph Representation Learning Approach

Authors: Da Sun Handason Tam, Wing Cheong Lau, Bin Hu, Qiu Fang Ying, Dah Ming Chiu, Hong Liu

Abstract: Rapid and massive adoption of mobile/online payment services has brought new challenges to service providers as well as regulators in safeguarding the proper use of such services/systems. In this paper, we leverage recent advances in deep-neural-network-based graph representation learning to detect abnormal/suspicious financial transactions in real-world e-payment networks. In particular, we propose an end-to-end Graph Convolution Network (GCN)-based algorithm to learn the embeddings of the nodes and edges of a large-scale time-evolving graph. In the context of e-payment transaction graphs, the resulting node and edge embeddings can effectively characterize the user background as well as the financial transaction patterns of individual account holders. As such, we can use the graph embedding results to drive downstream graph mining tasks such as node classification to identify illicit accounts within the payment networks. Our algorithm outperforms state-of-the-art schemes including GraphSAGE, Gradient Boosting Decision Tree and Random Forest, delivering considerably higher accuracy (94.62% and 86.98%, respectively) in classifying user accounts on two practical e-payment transaction datasets. It also achieves outstanding accuracy (97.43%) on another biomedical entity identification task while using only edge-related information.

Submitted 13 June, 2019; originally announced June 2019.

arXiv:1709.07130

Modeling and Quantifying the Forces Driving Online Video Popularity Evolution

Authors: Jiqiang Wu, Yipeng Zhou, Dah Ming Chiu

Abstract: Video popularity is an essential reference for optimizing resource allocation and video recommendation in online video services. However, there is still no convincing model that can accurately depict a video's popularity evolution. In this paper, we propose a dynamic popularity model by modeling the video information diffusion process driven by various forms of recommendation. By fitting the model to real traces collected from a practical system, we can quantify the strengths of the recommendation forces. Such quantification can help characterize video popularity patterns, user behaviors and recommendation strategies, as illustrated by a case study of TV episodes.

Submitted 20 September, 2017; originally announced September 2017.

Comments: 6 pages, 3 figures

arXiv:1603.02175

Who are Like-minded: Mining User Interest Similarity in Online Social Networks

Authors: Chunfeng Yang, Yipeng Zhou, Dah Ming Chiu

Abstract: In this paper, we mine and learn to predict how similar a pair of users' interests in videos are, based on demographic (age, gender and location) and social (friendship, interaction and group membership) information about these users. We use the video access patterns of active users as ground truth (a form of benchmark). We adopt tag-based user profiling to establish this ground truth, and justify why it is used instead of video-based methods, latent topic models such as LDA, or Collaborative Filtering approaches. We then show the effectiveness of the different demographic and social features, and their combinations and derivatives, in predicting user interest similarity, based on different machine-learning methods for combining multiple features. We propose a hybrid tree-encoded linear model for combining the features, and show that it outperforms other linear and tree-based models. Our methods can be used to predict user interest similarity when the ground truth is not available, e.g. for new users, or for inactive users whose interests may have changed since their old access data, and are useful for video recommendation. Our study is based on a rich dataset from Tencent, a popular provider of social networks, video services, and various other services in China.

Submitted 7 March, 2016; originally announced March 2016.

arXiv:1507.02132

doi 10.1145/2663492

Economic Viability of Paris Metro Pricing for Digital Services

Authors: Chi-Kin Chau, Qian Wang, Dah-Ming Chiu

Abstract: Digital services such as cloud computing and network access services now allow dynamic resource allocation and virtual resource isolation. This trend can create a new paradigm of flexible pricing schemes. One simple scheme, known as Paris Metro Pricing (PMP), allocates multiple isolated service classes with differentiated prices. The benefits of PMP are its simplicity and its applicability to a wide variety of general digital services, without considering specific performance guarantees for different service classes. The central issue of our study is whether PMP is economically viable, namely whether it will produce more profit for the service provider and whether it will achieve more social welfare. Prior studies have only considered specific models and arrived at conflicting conclusions. In this article, we identify unifying principles in a general setting and derive general sufficient conditions that can guarantee the viability of PMP. We further apply the results to analyze various examples of digital services.

Submitted 7 July, 2015; originally announced July 2015.

Comments: This paper appears in ACM Transactions on Internet Technology (ToIT), Special Issue on Pricing and Incentives in Networks and Systems, Vol. 14, No. 12, Issue 2-3, pp12:1-12:21, Oct 2014. A preliminary version has been presented at IEEE INFOCOM 2010. in C-K Chau (2014)

Journal ref: ACM Transactions on Internet Technology, Special Issue on Pricing and Incentives in Networks and Systems, Vol. 14, No. 12, Issue 2-3, pp12:1-12:21, Oct 2014

arXiv:1503.08312

A Population Model for the Academic Ecosystem

Authors: Yan Wu, Srinivasan Venkatramanan, Dah Ming Chiu

Abstract: In recent times, the academic ecosystem has seen tremendous growth in the number of authors and publications. While most temporal studies in this area focus on the evolution of co-author and citation network structure, this systemic inflation has received very little attention. In this paper, we address this issue by proposing a population model for academia, derived from publication records in the Computer Science domain. We use a generalized branching process as an overarching framework, which enables us to describe the evolution and composition of the research community in a systematic manner. Further, the observed patterns allow us to shed light on researchers' life cycle, encompassing arrival, academic life expectancy, activity, productivity and offspring distribution in the ecosystem. We believe such a study will help develop better bibliometric indices that account for the inflation, and also provide insights into sustainable and efficient resource management for academia.

Submitted 28 March, 2015; originally announced March 2015.

arXiv:1502.00523

Modeling and Analysis of Scholar Mobility on Scientific Landscape

Authors: Qiu Fang Ying, Srinivasan Venkatramanan, Dah Ming Chiu

Abstract: Scientific literature to date can be thought of as a partially revealed landscape, where scholars continue to unveil hidden knowledge by exploring novel research topics. How do scholars explore the scientific landscape, i.e., choose research topics to work on? We propose an agent-based model of topic mobility behavior in which scholars migrate across research topics in the space of science, following different strategies and seeking different utilities. We use this model to study whether strategies widely used in the current scientific community can provide a balance between individual scientific success and the efficiency and diversity of the academic society as a whole. Through extensive simulations, we provide insights into the roles of different strategies, such as choosing topics according to research potential or popularity. Our model provides a conceptual framework and a computational approach to analyze scholars' behavior and its impact on scientific production. We also discuss how such an agent-based modeling approach can be integrated with big real-world scholarly data.

Submitted 10 March, 2015; v1 submitted 2 February, 2015; originally announced February 2015.

Comments: To appear in BigScholar, WWW 2015

arXiv:1501.04038

A Backend Framework for the Efficient Management of Power System Measurements

Authors: Ben McCamish, Rich Meier, Jordan Landford, Robert Bass, Eduardo Cotilla-Sanchez, David Chiu

Abstract: Increased adoption and deployment of phasor measurement units (PMUs) has provided valuable fine-grained data across the grid. Analysis of these data can provide insight into the health of the grid, thereby improving control over operations. Realizing this data-driven control, however, requires validating, processing and storing massive amounts of PMU data. This paper describes a PMU data management system that supports input from multiple PMU data streams, features an event-detection algorithm, and provides an efficient method for retrieving archival data. The event-detection algorithm rapidly correlates multiple PMU data streams, providing details on events occurring within the power system, and feeds into a visualization component that allows operators to recognize events as they occur. The indexing and data retrieval mechanism facilitates fast access to archived PMU data; using this method, we achieved over 30x speedup for queries with high selectivity. With these two components, we have developed a system that allows efficient analysis of multiple time-aligned PMU data streams.

Submitted 25 May, 2016; v1 submitted 16 December, 2014; originally announced January 2015.

Comments: Published in Electric Power Systems Research (2016), not available yet

arXiv:1412.2326

Modeling Dynamics of Online Video Popularity

Authors: Jiqiang Wu, Yipeng Zhou, Dah Ming Chiu, Youwei Hua, Zirong Zhu

Abstract: Large Internet video delivery systems serve millions of videos to tens of millions of users on a daily basis, via Video-on-Demand and live streaming. Video popularity evolves over time. It represents the workload, as well as the business value, of a video to the overall system. The ability to predict video popularity is very helpful for improving service quality and operating efficiency. Previous studies adopted simple models for video popularity, or directly adopted patterns from measurement studies. In this paper, we develop a stochastic fluid model that tries to capture two hidden processes that give rise to different patterns of a given video's popularity evolution: the information spreading process and the user reaction process. Specifically, these processes model how the video is recommended to users, the video's inherent attractiveness, and users' reaction rate, and yield specific popularity evolution patterns. We then validate our model by matching its predictions with observed patterns from our collaborator, a large content provider in China. The model thus gives us insight into which video popularity evolution patterns are common, which differ, and why.

Submitted 7 December, 2014; originally announced December 2014.

Comments: 9 pages, technical report

arXiv:1411.0778

Detecting Suicidal Ideation in Chinese Microblogs with Psychological Lexicons

Authors: Xiaolei Huang, Lei Zhang, Tianli Liu, David Chiu, Tingshao Zhu, Xin Li

Abstract: Suicide is among the leading causes of death in China. However, technical approaches toward preventing suicide are challenging and still under development. Recently, several actual suicide cases were preceded by users posting microblogs with suicidal ideation to Sina Weibo, a Chinese social media network akin to Twitter. It would therefore be desirable to detect suicidal ideation in microblogs in real time and immediately alert appropriate support groups, which may lead to successful prevention. In this paper, we propose a real-time suicidal ideation detection system deployed over Weibo, using machine learning and known psychological techniques. Currently, we have identified 53 known suicide cases in which the users posted suicide notes on Weibo prior to their deaths. We explore linguistic features of these known cases using a psychological lexicon dictionary, and train an effective suicidal Weibo post detection model. A total of 6,714 tagged posts and several classifiers are used to verify the model. By combining machine learning and psychological knowledge, the SVM classifier performs best among the classifiers tested, yielding an F-measure of 68.3%, a precision of 78.9%, and a recall of 60.3%.

Submitted 3 November, 2014; originally announced November 2014.

Comments: 6 pages
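Assuming the reported F-measure is the balanced F1 score (the harmonic mean of precision and recall), the three figures in the abstract above can be cross-checked:

$$F_1 = \frac{2PR}{P+R} = \frac{2 \times 0.789 \times 0.603}{0.789 + 0.603} = \frac{0.9515}{1.392} \approx 0.684$$

The small gap from the reported 68.3% is consistent with P and R having been rounded to three digits: for example, P = 0.7885 and R = 0.6025 give F1 ≈ 0.683.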

arXiv:1312.5050

Fake View Analytics in Online Video Services

Authors: Liang Chen, Yipeng Zhou, Dah Ming Chiu

Abstract: Online video-on-demand (VoD) services invariably maintain a view count for each video they serve, and it has become an important currency for various stakeholders, from viewers to content owners, advertisers, and the online service providers themselves. There is often significant financial incentive to use a robot (or a botnet) to artificially create fake views. How can we detect the fake views? Can we detect them (and stop them) using online algorithms as they occur? What is the extent of fake views with current VoD service providers? These are the questions we study in this paper. We develop several algorithms and show that they are quite effective for this problem.

Submitted 18 December, 2013; originally announced December 2013.

Comments: 25 pages, 15 figures

arXiv:1307.4581 [pdf, ps, other]

Smart Streaming for Online Video Services

Authors: Liang Chen, Yipeng Zhou, Dah Ming Chiu

Abstract: Bandwidth consumption is a significant concern for online video service providers. Practical video streaming systems usually use some form of HTTP streaming (progressive download) to let users download the video at a faster rate than the video bitrate. Since users may quit before viewing the complete video, however, much of the downloaded video will be "wasted". To the extent that users' departure behavior can be predicted, we develop smart streaming, which can be used to improve user QoE under limited server bandwidth or to save bandwidth cost under unlimited server bandwidth. Through measurement, we extract certain user behavior properties for implementing such smart streaming, and demonstrate its advantage using a prototype implementation as well as simulations.

Submitted 9 April, 2014; v1 submitted 17 July, 2013; originally announced July 2013.

Comments: This paper has been updated after checking the possible issues

arXiv:1306.4623 [pdf, ps, other]

The Academic Social Network

Authors: Tom Z. J. Fu, Qianqian Song, Dah Ming Chiu

Abstract: Through academic publications, the authors of these publications form a social network. Instead of sharing casual thoughts and photos (as on Facebook), authors pick co-authors and cite papers written by other authors. Thanks to various efforts (such as Microsoft Libra and DBLP), the data necessary for analyzing the academic social network is becoming more available on the Internet. What types of information and queries would be useful for users, beyond the search queries already available from services such as Google Scholar? In this paper, we explore this question by defining a variety of ranking metrics on different entities: authors, publication venues and institutions. We go beyond traditional metrics such as paper counts, citations and h-index. Specifically, we define metrics such as influence, connections and exposure for authors. An author gains influence by receiving more citations, but also by receiving citations from influential authors. An author increases his/her connections by co-authoring with other authors, especially with authors who have many connections. An author receives exposure by publishing in selective venues whose publications have received many citations in the past, and the selectivity of these venues also depends on the influence of the authors who publish there. We discuss the computational aspects of these metrics and the similarity between different metrics.
With additional information on author-institution relationships, we are able to study institution rankings based on the corresponding authors' rankings for each type of metric as well as different domains. We are prepared to demonstrate these ideas with a website (http://pubstat.org) built from millions of publications and authors.
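The influence metric described above is recursive: citations from influential authors count for more. One common way to compute such a score is a PageRank-style power iteration over an author citation graph, sketched minimally below. This is an assumption for illustration only; the abstract does not give the paper's formula, and the function name, damping factor and input format are hypothetical.

```python
def influence_scores(citations, damping=0.85, iterations=200, tol=1e-12):
    """PageRank-style influence (hypothetical illustration, not the paper's metric).

    citations: dict mapping a citing author -> list of cited authors
               (repeated entries count as repeated citations).
    Returns a dict author -> score; scores sum to 1.
    """
    authors = set(citations)
    for cited in citations.values():
        authors.update(cited)
    authors = sorted(authors)
    n = len(authors)
    score = {a: 1.0 / n for a in authors}
    for _ in range(iterations):
        new = {a: (1 - damping) / n for a in authors}
        for citer in authors:
            cited = citations.get(citer, [])
            if cited:
                # An author passes its influence on, split across its citations.
                share = damping * score[citer] / len(cited)
                for c in cited:
                    new[c] += share
            else:
                # Author who cites nobody: spread its influence uniformly.
                for a in authors:
                    new[a] += damping * score[citer] / n
        delta = sum(abs(new[a] - score[a]) for a in authors)
        score = new
        if delta < tol:
            break
    return score


scores = influence_scores({"A": ["C"], "B": ["C"], "C": ["A"]})
print(scores)
```

In this toy graph C is cited twice and ranks highest, and A outranks B because its only citation comes from the influential author C.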

Submitted 17 February, 2014; v1 submitted 19 June, 2013; originally announced June 2013.

Comments: A number of modifications have been made according to the reviewer's comments

arXiv:1306.4066 [pdf, ps, other]

MYE: Missing Year Estimation in Academic Social Networks

Authors: Tom Z. J. Fu, Qiufang Ying, Dah Ming Chiu

Abstract: In bibliometric studies, a common challenge is how to deal with incorrect or incomplete data. However, given a large volume of data, there often exist certain relationships between data items that allow us to recover missing items and correct erroneous ones. In this paper, we study a particular problem of this sort: estimating the missing year information associated with publications (and hence authors' years of active publication). We first propose a simple algorithm that only makes use of "direct" information, such as paper citation/reference relationships or paper-author relationships. The result of this simple algorithm is used as a benchmark for comparison. Our goal is to develop algorithms that increase both coverage (the percentage of missing-year papers recovered) and accuracy (the mean absolute error of the estimated year relative to the real year). We propose some advanced algorithms that extend inference by information propagation. For each algorithm, we propose three versions according to the given academic social network type: a) Homogeneous (contains only paper citation links), b) Bipartite (contains only paper-author relations), and c) Heterogeneous (contains both paper citation and paper-author relations). We carry out experiments on three public data sets (MSR Libra, DBLP and APS), evaluated with K-fold cross-validation. We show that the advanced algorithms can improve both coverage and accuracy.
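The "direct" baseline and the propagation idea can be illustrated with a hypothetical rule that is not taken from the paper: a paper is assumed to be published no earlier than the latest paper it cites and no later than the earliest paper that cites it, and its missing year is estimated from these bounds. Repeating the estimate until nothing changes lets newly recovered years propagate along citation links. Real data violate the ordering assumption (for example, preprints citing each other), so treat this only as a sketch of the homogeneous (citation-only) case.

```python
def estimate_year(paper, refs, cited_by, years):
    """Estimate a missing year from neighbours whose years are known.

    refs[p]     -> papers that p cites (assumed published no later than p)
    cited_by[p] -> papers that cite p (assumed published no earlier than p)
    years       -> dict of known or already-estimated years
    Returns None when no neighbour has a known year.
    """
    lower = [years[r] for r in refs.get(paper, []) if r in years]
    upper = [years[c] for c in cited_by.get(paper, []) if c in years]
    lo = max(lower) if lower else None
    hi = min(upper) if upper else None
    if lo is not None and hi is not None:
        return (lo + hi) // 2  # midpoint of the feasible interval
    if lo is not None:
        return lo
    return hi  # may be None


def propagate_years(papers, refs, cited_by, known):
    """Repeat the direct estimate until no further paper can be dated."""
    years = dict(known)
    changed = True
    while changed:
        changed = False
        for p in papers:
            if p in years:
                continue
            estimate = estimate_year(p, refs, cited_by, years)
            if estimate is not None:
                years[p] = estimate
                changed = True
    return years


refs = {"x": ["a"], "b": ["x"], "y": ["x"]}
cited_by = {"a": ["x"], "x": ["b", "y"]}
print(propagate_years(["y", "x", "a", "b"], refs, cited_by, {"a": 2000, "b": 2004}))
```

Here paper x is bracketed by 2000 and 2004 and is dated 2002; paper y, which only cites x, can be dated only on the second pass, once x's estimate is available.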

Submitted 22 July, 2014; v1 submitted 18 June, 2013; originally announced June 2013.

Comments: Some typos are corrected

arXiv:1108.6293 [pdf]

Buffer Map Message Compression Based on Relevant Window in P2P Streaming Media System

Authors: Chunxi Li, Changjia Chen, DahMing Chiu

Abstract: Popular peer-to-peer streaming media systems such as PPLive and UUSee rely on periodic buffer-map exchange between peers for proper operation. The buffer-map exchange contains redundant information, which causes non-negligible overhead. In this paper we present a theoretical framework to study how this overhead can be lowered. Unlike the traditional data compression approach, we do not treat each buffer-map as an isolated data block, but consider the correlations between sequentially exchanged buffer-maps. Under this framework, two buffer-map compression schemes are proposed and their correctness is proved mathematically. Moreover, we derive the theoretical limit of compression gain based on probability theory and information theory. Based on the system parameters of UUSee (a popular P2P streaming platform), our simulations show that buffer-map sizes are reduced by 86% and 90% (from 456 bits down to only 66 bits and 46 bits), respectively, after applying our schemes. Furthermore, by combining them with traditional compression methods (on individual blocks), the sizes are decreased by 91% and 95% (to 42 bits and 24 bits), respectively. Our study provides a guideline for developing practical compression algorithms.
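To illustrate why sequential buffer maps are correlated, consider a hypothetical delta scheme (not one of the paper's two schemes): between exchanges the playback window slides forward by some number of chunks and only a few chunks change state, so a peer can send the window shift plus the positions whose bit differs from the previous, shifted map. The function names and the bit-list representation are assumptions for illustration.

```python
def shift_window(bits, shift):
    """Slide the window forward by `shift` chunks; newly exposed chunks are marked missing (0)."""
    shift = min(shift, len(bits))
    return bits[shift:] + [0] * shift


def encode_delta(prev, curr, shift):
    """Encode `curr` as (shift, positions that differ from the shifted `prev`).

    Assumes both maps cover windows of the same length.
    """
    base = shift_window(prev, shift)
    flips = [i for i, (a, b) in enumerate(zip(base, curr)) if a != b]
    return shift, flips


def decode_delta(prev, shift, flips):
    """Rebuild the current buffer map from the previous one and the delta."""
    bits = shift_window(prev, shift)
    for i in flips:
        bits[i] ^= 1
    return bits


prev = [1, 1, 1, 0, 1, 0, 0, 0]
curr = [1, 1, 1, 1, 0, 0, 0, 0]  # window moved 2 chunks, 2 chunks newly arrived
delta = encode_delta(prev, curr, 2)
print(delta)
```

For this example `encode_delta` returns `(2, [1, 3])`: the 8-bit map is described by a shift and two positions. When windows are hundreds of bits long and only a few chunks change per exchange, such a delta is much shorter than the raw map; a real scheme would also need to encode the shift and positions compactly.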

Submitted 28 September, 2011; v1 submitted 31 August, 2011; originally announced August 2011.

Comments: 12 pages, 5 figures

arXiv:1108.6290 [pdf]

Compression and Quantitative Analysis of Buffer Map Message in P2P Streaming System

Authors: Chunxi Li, Changjia Chen, DahMing Chiu

Abstract: BM compression is a straightforward and practical way to reduce buffer message length and improve system performance. In this paper, we thoroughly discuss the principles and protocol procedures of different compression schemes, and present, for the first time, an original compression scheme that removes nearly all redundant information from buffer messages. Theoretical limits of the compression rate are derived using information theory. Through analysis of information content and simulation with our measured BM trace of UUSee, we validate the effectiveness and superiority of our compression scheme in terms of compression ratio.

Submitted 1 September, 2011; v1 submitted 31 August, 2011; originally announced August 2011.

Comments: 13 pages, 12 figures

arXiv:1106.1282 [pdf, ps, other]

Exploring Network Economics

Authors: Dah Ming Chiu, Wai Yin Ng

Abstract: In this paper, we explore what network economics is all about, focusing on the interesting topics brought about by the Internet. Our intent is to make this a brief survey, useful as an outline for a course on this topic, with an extended list of references. We try to make it as intuitive and readable as possible. We also deliberately try to be critical at times, and hope our interpretation of the topic will stimulate further discussion among those doing research in the same field.

Submitted 7 June, 2011; originally announced June 2011.

Comments: This is a position paper about what we might teach in a Network Economics course and the type of research we have found useful; it is therefore not an extensive survey paper.

arXiv:1011.1135 [pdf, ps, other]

Reciprocating Preferences Stablize Matching: College Admissions Revisited

Authors: Jian Liu, Dah Ming Chiu

Abstract: In considering the college admissions problem almost fifty years ago, Gale and Shapley came up with a simple abstraction based on the preferences of students and colleges. They introduced the concepts of stability and optimality, and proposed the deferred acceptance (DA) algorithm, which is proven to lead to a stable and optimal solution. This algorithm is simple and computationally efficient. Furthermore, subsequent studies showed that the DA algorithm is also strategy-proof, which means that when the algorithm is played out as a mechanism for matching two sides (e.g. colleges and students), the parties (colleges or students) have no incentive to act other than according to their true preferences. Yet in practical college admission systems, the DA algorithm is often not adopted. Instead, an algorithm known as the Boston Mechanism (BM), or one of its variants, is widely adopted. In BM, colleges accept students without deferral (that is, without considering other colleges' decisions), which is exactly the opposite of Gale-Shapley's DA algorithm. To explain and rationalize this reality, we introduce the notion of reciprocating preference to capture the influence of a student's interest on a college's decision. This model is inspired by the actual mechanism used to match students to universities in Hong Kong. The notion of reciprocating preference defines a class of matching algorithms, allowing different degrees of reciprocating preference by students and colleges. DA and BM are but two extreme cases (with zero and one hundred percent reciprocation) of this class.
This model extends the notions of stability and optimality as well. As in Gale-Shapley's original paper, we discuss how the analogy can be carried over to the stable marriage problem, thus demonstrating the model's general applicability.
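For reference, the textbook student-proposing form of Gale and Shapley's deferred acceptance algorithm, with college capacities, can be sketched as follows. This is the standard DA algorithm the abstract refers to, not the paper's reciprocating-preference model; the data layout (dictionaries of ordered preference lists) is an assumption. The key property is that a college only tentatively holds applicants and may reject them later when a better applicant arrives, whereas under BM acceptances are final.

```python
def deferred_acceptance(student_prefs, college_prefs, capacity):
    """Student-proposing deferred acceptance (Gale-Shapley).

    student_prefs: student -> list of colleges, most preferred first
    college_prefs: college -> list of acceptable students, most preferred first
    capacity:      college -> number of seats
    Returns college -> list of admitted students (best-ranked first).
    """
    rank = {c: {s: i for i, s in enumerate(prefs)} for c, prefs in college_prefs.items()}
    next_choice = {s: 0 for s in student_prefs}
    held = {c: [] for c in college_prefs}
    free = list(student_prefs)
    while free:
        s = free.pop()
        prefs = student_prefs[s]
        if next_choice[s] >= len(prefs):
            continue  # list exhausted: student stays unmatched
        c = prefs[next_choice[s]]
        next_choice[s] += 1
        if s not in rank[c]:
            free.append(s)  # college finds s unacceptable; s proposes again later
            continue
        held[c].append(s)  # acceptance is only tentative (deferred)
        held[c].sort(key=lambda x: rank[c][x])
        if len(held[c]) > capacity[c]:
            free.append(held[c].pop())  # reject the worst-ranked held student
    return held


students = {"s1": ["A", "B"], "s2": ["A", "B"], "s3": ["B", "A"]}
colleges = {"A": ["s2", "s1", "s3"], "B": ["s1", "s3", "s2"]}
print(deferred_acceptance(students, colleges, {"A": 1, "B": 1}))
```

In this example s3 is first held by B but is displaced when s1, rejected by A, applies to B; the result admits s2 to A and s1 to B and leaves s3 unmatched, which is stable because both colleges prefer their admitted student to s3.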

Submitted 3 May, 2011; v1 submitted 4 November, 2010; originally announced November 2010.

arXiv:1006.1019 [pdf, ps, other]

Mathematical Modeling of Competition in Sponsored Search Market

Authors: Jian Liu, Dah Ming Chiu

Abstract: Sponsored search mechanisms have drawn much attention from both the academic community and industry in recent years since the seminal papers of [13] and [14]. However, most of the existing literature concentrates on mechanism design and analysis within the scope of only one search engine in the market. In this paper we propose a mathematical framework for modeling the interaction of publishers, advertisers and end users in a competitive market. We first consider the monopoly market model and provide optimal solutions for both the ex ante and ex post cases, which represent the long-term and short-term revenues of search engines, respectively. We then analyze the strategic behaviors of end users and advertisers under duopoly and prove the existence of an equilibrium in which both search engines co-exist from the ex post perspective. To show the more general ex ante results, we carry out extensive simulations under different parameter settings. Our analysis and observations in this work can provide useful insight into regulating the sponsored search market and protecting the interests of advertisers and end users.

Submitted 31 August, 2010; v1 submitted 5 June, 2010; originally announced June 2010.

Comments: A short version would appear at 2010 Workshop on the Economics of Networks, Systems, and Computation (NetEcon '10)

arXiv:0903.3278 [pdf, ps, other]

doi 10.1016/j.comnet.2009.11.018

On Oligopoly Spectrum Allocation Game in Cognitive Radio Networks with Capacity Constraints

Authors: Yuedong Xu, John C. S. Lui, Dah-Ming Chiu

Abstract: Dynamic spectrum sharing is a promising technology to improve spectrum utilization in future wireless networks. Flexible spectrum management provides new opportunities for licensed primary users and unlicensed secondary users to reallocate spectrum resources efficiently. In this paper, we present an oligopoly pricing framework for dynamic spectrum allocation in which the primary users sell excess spectrum to the secondary users for monetary return. We present two approaches, strict constraints (type-I) and QoS penalty (type-II), to model the realistic situation in which the primary users have limited capacities. In the oligopoly model with strict constraints, we propose a low-complexity search method to obtain the Nash Equilibrium and prove its uniqueness. When the model is reduced to a duopoly game, we analytically show interesting gaps in the leader-follower pricing strategy. In the QoS-penalty-based oligopoly model, a novel variable transformation method is developed to derive the unique Nash Equilibrium. When market information is limited, we provide three myopically optimal algorithms, "StrictBEST", "StrictBR" and "QoSBEST", that enable price adjustment for duopoly primary users based on the Best Response Function (BRF) and bounded rationality (BR) principles. Numerical results validate the effectiveness of our analysis and demonstrate the fast convergence of "StrictBEST" as well as "QoSBEST" to the Nash Equilibrium.
For the "StrictBR" algorithm, we reveal chaotic behavior of dynamic price adaptation depending on the learning rates.

Submitted 15 June, 2009; v1 submitted 19 March, 2009; originally announced March 2009.

Comments: 40 pages, 22 figures

Journal ref: Elsevier, Computer Networks, 2010

arXiv:0806.3215 [pdf]

MOHCS: Towards Mining Overlapping Highly Connected Subgraphs

Authors: Xiahong Lin, Lin Gao, Kefei Chen, David K. Y. Chiu

Abstract: Many real-life networks contain parts in which some nodes are more highly connected to each other than to the other nodes of the network. Such collections of nodes are usually called clusters, communities, cohesive groups or modules. In graph terminology, such a group is called a highly connected graph. In this paper, we first prove some properties related to highly connected graphs. Based on these properties, we then redefine the highly connected subgraph, which results in an algorithm that determines whether a given graph is highly connected in linear time. We then present a computationally efficient algorithm, called MOHCS, for mining overlapping highly connected subgraphs. We have experimentally evaluated the performance of MOHCS using real and synthetic data sets, including computer-generated graphs and a yeast protein network. Our results show that MOHCS is effective and reliable in finding overlapping highly connected subgraphs. Keywords: highly connected subgraph, clustering algorithms, minimum cut, minimum degree
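A linear-time test for being highly connected follows from a standard fact; we assume it is in the spirit of the properties the abstract mentions, though the paper's own redefinition may differ. In HCS clustering, a graph on n vertices is highly connected if its edge connectivity (minimum cut) exceeds n/2. Edge connectivity never exceeds the minimum degree δ, and by Chartrand's theorem it equals δ whenever δ ≥ ⌊n/2⌋, so the condition is equivalent to δ > n/2, which needs only one pass over the degrees. The adjacency-dictionary input is an assumption.

```python
def is_highly_connected(adj):
    """Return True if a simple undirected graph is highly connected.

    adj: dict vertex -> set of neighbours.
    Uses the equivalence: edge connectivity > n/2  <=>  minimum degree > n/2.
    """
    n = len(adj)
    if n == 0:
        return False  # convention for the empty graph
    min_degree = min(len(neighbours) for neighbours in adj.values())
    return 2 * min_degree > n  # integer form of min_degree > n/2


k4 = {v: {u for u in range(4) if u != v} for v in range(4)}
c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(is_highly_connected(k4), is_highly_connected(c4))
```

The complete graph K4 (δ = 3 > 2) passes, while the 4-cycle (δ = 2, minimum cut 2, not greater than n/2 = 2) does not.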

Submitted 19 June, 2008; originally announced June 2008.

arXiv:0801.4592 [pdf, ps, other]

doi 10.1109/T-WC.2009.080142

Understanding the Paradoxical Effects of Power Control on the Capacity of Wireless Networks

Authors: Yue Wang, John C. S. Lui, Dah-Ming Chiu

Abstract: Recent works show conflicting results: network capacity may increase or decrease with higher transmission power under different scenarios. In this work, we seek to understand this paradox. Specifically, we address the following questions: (1) Theoretically, should we increase or decrease transmission power to maximize network capacity? (2) Theoretically, how much network capacity gain can we achieve by power control? (3) Under realistic situations, how do power control, link scheduling and routing interact with each other? Under which scenarios can we expect a large capacity gain by using higher transmission power? To answer these questions, we first prove that the optimal network capacity is a non-decreasing function of transmission power. Second, we prove that the optimal network capacity can be increased without bound by higher transmission power in some network configurations. However, when nodes are distributed uniformly, the gain in optimal network capacity from higher transmission power is upper-bounded by a positive constant. Third, we discuss why network capacity in practice may increase or decrease with higher transmission power under different scenarios using carrier sensing and minimum hop-count routing. Extensive simulations are carried out to verify our analysis.

Submitted 25 September, 2008; v1 submitted 29 January, 2008; originally announced January 2008.

Comments: I refined the previous version in many places, including the title. To appear in IEEE Transactions on Wireless Communications

arXiv:cs/0509052 [pdf, ps, other]

Club Formation by Rational Sharing : Content, Viability and Community Structure

Authors: W. -Y. Ng, D. M. Chiu, W. K. Lin

Abstract: A sharing community prospers when participation and contribution are both high. We suggest that the two, although related decisions that every peer makes, should be given separate rational bases. Considered as such, a basic issue is the viability of club formation, which necessitates modelling two major sources of heterogeneity, namely peers and shared content. This viability perspective clearly explains why rational peers contribute (or free-ride when they don't) and how their collective action determines viability as well as the size of the club formed. It also exposes another fundamental limitation on club formation apart from free-riding: the community structure, in terms of the relation between peers' interest (demand) and sharing (supply).

Submitted 18 September, 2005; originally announced September 2005.

Comments: accepted at WINE 2005, Hong Kong, December 15-17, 2005

arXiv:cs/0503075 [pdf, ps, other]

Statistical Modelling of Information Sharing: Community, Membership and Content

Authors: W. -Y. Ng, W. K. Lin, D. M. Chiu

Abstract: File-sharing systems, like many online and traditional information sharing communities (e.g. newsgroups, BBS, forums, interest clubs), are dynamical systems in nature. As peers join and leave the system, the information content made available by the prevailing membership varies continually in amount as well as composition, which in turn affects all peers' join/leave decisions. As a result, the dynamics of membership and information content are strongly coupled, raising interesting issues about growth, sustenance and stability. In this paper, we propose to study such communities with a simple statistical model of an information sharing club. Carrying their private payloads of information goods as potential supply to the club, peers join or leave on the basis of whether the information they demand is currently available. Information goods are chunked and typed, as in a file sharing system where peers contribute different files, or a forum where messages are grouped by topics or threads. Peers' demand and supply are then characterized by statistical distributions over the type domain. This model reveals interesting critical behaviour with multiple equilibria. A sharp growth threshold is derived: the club may grow towards a sustainable equilibrium only if the value of an order parameter is above the threshold, and otherwise shrinks to emptiness.
The order parameter is composite and comprises the peer population size, the level of their contributed supply, the club's efficiency in information search, the spread of supply and demand over the type domain, as well as the goodness of match between them.

Submitted 1 July, 2005; v1 submitted 28 March, 2005; originally announced March 2005.

Comments: accepted in International Symposium on Computer Performance, Modeling, Measurements and Evaluation, Juan-les-Pins, France, October 2005

arXiv:cs/9809099 [pdf]

A Quantitative Measure Of Fairness And Discrimination For Resource Allocation In Shared Computer Systems

Authors: R. Jain, D. Chiu, W. Hawe

Abstract: Fairness is an important performance criterion in all resource allocation schemes, including those in distributed computer systems. However, it is often specified only qualitatively. The quantitative measures proposed in the literature are either too specific to a particular application or suffer from some undesirable characteristics. In this paper, we introduce a quantitative measure called the Index of Fairness. The index is applicable to any resource sharing or allocation problem. It is independent of the amount of the resource. The fairness index always lies between 0 and 1. This boundedness aids intuitive understanding of the fairness index. For example, a distribution algorithm with a fairness of 0.10 means that it is unfair to 90% of the users. Also, the discrimination index can be defined as 1 - fairness index.
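The index introduced in this report is usually written, for allocations x_1, ..., x_n, as J = (Σ x_i)^2 / (n Σ x_i^2). A minimal Python sketch follows, assuming non-negative allocations. It shows that equal shares give J = 1, that a single user receiving everything among 10 users gives J = 0.1 (the "unfair to 90% of the users" example), and that scaling every allocation leaves J unchanged. More generally, if k of n users share equally and the rest receive nothing, J = k/n.

```python
def jain_fairness(allocations):
    """Jain's fairness index J = (sum x)^2 / (n * sum x^2).

    Assumes non-negative allocations with at least one non-zero entry;
    the result then lies in [1/n, 1].
    """
    n = len(allocations)
    sum_sq = sum(x * x for x in allocations)
    if n == 0 or sum_sq == 0:
        raise ValueError("need at least one non-zero allocation")
    total = sum(allocations)
    return total * total / (n * sum_sq)


print(jain_fairness([5, 5, 5, 5]))        # 1.0
print(jain_fairness([10] + [0] * 9))      # 0.1
print(1 - jain_fairness([10] + [0] * 9))  # discrimination index
```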

Submitted 24 September, 1998; originally announced September 1998.

Comments: DEC Research Report TR-301, September 1984

Report number: TR-301 ACM Class: C.2.1

arXiv:cs/9809094 [pdf]

Congestion Avoidance in Computer Networks with a Connectionless Network Layer

Authors: R. Jain, K. Ramakrishnan, D. Chiu

Abstract: Widespread use of computer networks and the use of varied technology for the interconnection of computers have made congestion a significant problem. In this report, we summarize our research on congestion avoidance. We compare the concept of congestion avoidance with that of congestion control. Briefly, congestion control is a recovery mechanism, while congestion avoidance is a prevention mechanism. A congestion control scheme helps the network recover from the congested state, while a congestion avoidance scheme allows a network to operate in the region of low delay and high throughput with minimal queuing, thereby preventing it from entering the congested state in which packets are lost due to buffer shortage. A number of possible alternatives for congestion avoidance were identified. From these alternatives we selected one called the binary feedback scheme, in which the network uses a single bit in the network layer header to feed back congestion information to its users, which then increase or decrease their load to make optimal use of the resources. The concept of global optimality in a distributed system is defined in terms of efficiency and fairness such that they can be independently quantified and apply to any number of resources and users. The proposed scheme has been simulated and shown to be globally efficient, fair, responsive, convergent, robust, distributed, and configuration-independent.
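The binary feedback loop can be illustrated with a small simulation. The abstract only states that users increase or decrease their load in response to the bit; the sketch below assumes additive increase and multiplicative decrease, with illustrative constants (increase by 1, multiply by 0.875), and a hypothetical function name. Because additive increase leaves the difference between two users' loads unchanged while multiplicative decrease shrinks it, the loads drift toward equal shares while their sum oscillates just around capacity.

```python
def simulate_binary_feedback(loads, capacity, steps=200, increase=1.0, decrease=0.875):
    """Simulate users reacting to a single congestion bit (assumed AIMD policy).

    loads:    initial load of each user
    capacity: total load the shared resource can carry
    Each step the network sets the bit if total load exceeds capacity;
    every user then decreases multiplicatively or increases additively.
    """
    loads = list(loads)
    for _ in range(steps):
        congested = sum(loads) > capacity  # the single feedback bit
        if congested:
            loads = [x * decrease for x in loads]
        else:
            loads = [x + increase for x in loads]
    return loads


final = simulate_binary_feedback([1.0, 50.0], capacity=60.0)
print(final, sum(final))
```

Starting from a very unequal split (1 versus 50), the two loads end within a fraction of a unit of each other, while the total stays between roughly 52.5 and 62.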

Submitted 24 September, 1998; originally announced September 1998.

Comments: DEC-TR-506, reprinted in C. Partridge, Ed., "Innovations in Internetworking," published by Artech House, October 1988

ACM Class: C.2.1

Showing 1–32 of 32 results for author: Chiu, D