-
Causal View of Time Series Imputation: Some Identification Results on Missing Mechanism
Authors:
Ruichu Cai,
Kaitao Zheng,
Junxian Huang,
Zijian Li,
Zhengming Chen,
Boyan Xu,
Zhifeng Hao
Abstract:
Time series imputation is one of the most challenging problems and has broad applications in various fields like health care and the Internet of Things. Existing methods mainly aim to model the temporally latent dependencies and the generation process from the observed time series data. In real-world scenarios, different types of missing mechanisms, like MAR (Missing At Random) and MNAR (Missing Not At Random), can occur in time series data. However, existing methods often overlook the differences among these missing mechanisms and use a single model for time series imputation, which can easily lead to misleading results due to mechanism mismatch. In this paper, we propose a framework for the time series imputation problem by exploring Different Missing Mechanisms (DMM for short) and tailoring solutions accordingly. Specifically, we first analyze the data generation processes with temporal latent states and missing cause variables for different mechanisms. Subsequently, we model these generation processes via variational inference and estimate prior distributions of latent variables via a normalizing flow-based neural architecture. Furthermore, we establish identifiability results under the nonlinear independent component analysis framework to show that latent variables are identifiable. Experimental results show that our method surpasses existing time series imputation techniques across various datasets with different missing mechanisms, demonstrating its effectiveness in real-world applications.
Submitted 11 May, 2025;
originally announced May 2025.
-
Conformal Uncertainty Quantification of Electricity Price Predictions for Risk-Averse Storage Arbitrage
Authors:
Saud Alghumayjan,
Ming Yi,
Bolun Xu
Abstract:
This paper proposes a risk-averse approach to energy storage price arbitrage, leveraging conformal uncertainty quantification for electricity price predictions. The method addresses the significant challenges posed by the inherent volatility and uncertainty of real-time electricity prices, which create substantial risks of financial losses for energy storage participants relying on future price forecasts to plan their operations. The framework comprises a two-layer prediction model to quantify real-time price uncertainty confidence intervals with high coverage. The framework is distribution-free and can work with any underlying point prediction model. We evaluate the quantification effectiveness through a storage price arbitrage application by managing the risk of participating in the real-time market. We design a risk-averse policy for profit maximization of energy storage arbitrage to find the safest storage schedule with minimal losses. Using historical data from New York State and synthetic price predictions, our evaluations demonstrate that this framework can achieve good profit margins with less than $35\%$ purchases.
Submitted 9 December, 2024;
originally announced December 2024.
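The abstract above describes distribution-free intervals that wrap any point prediction model. As a rough illustration of that general idea, here is a minimal sketch of standard split conformal prediction. It is not the paper's two-layer model; the function name `split_conformal_interval` and the sample numbers are assumptions made for this example.

```python
import math


def split_conformal_interval(calib_preds, calib_actuals, new_pred, alpha=0.1):
    """Return a (lower, upper) interval around new_pred.

    Generic split conformal prediction with absolute-residual scores;
    an illustrative sketch, not the method from the paper.
    """
    if len(calib_preds) != len(calib_actuals) or not calib_preds:
        raise ValueError("calibration data must be non-empty and aligned")
    # Nonconformity score: absolute error of the point predictor.
    scores = sorted(abs(y - p) for p, y in zip(calib_preds, calib_actuals))
    n = len(scores)
    # Finite-sample corrected rank of the (1 - alpha) quantile.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this coverage level.
        return (-math.inf, math.inf)
    q = scores[k - 1]
    return (new_pred - q, new_pred + q)


# Hypothetical calibration set: seven past price forecasts and realized prices.
preds = [10.0] * 7
actuals = [11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
interval = split_conformal_interval(preds, actuals, new_pred=50.0, alpha=0.25)
```

Under exchangeability of calibration and test points, an interval built this way covers the true value with probability at least 1 - alpha, whatever the underlying point predictor is.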
-
Integrating Dual Prototypes for Task-Wise Adaption in Pre-Trained Model-Based Class-Incremental Learning
Authors:
Zhiming Xu,
Suorong Yang,
Baile Xu,
Jian Zhao,
Furao Shen
Abstract:
Class-incremental learning (CIL) aims to acquire new classes while conserving historical knowledge incrementally. Despite existing pre-trained model (PTM) based methods performing excellently in CIL, it is better to fine-tune them on downstream incremental tasks with massive patterns unknown to PTMs. However, using task streams for fine-tuning could lead to catastrophic forgetting that will erase the knowledge in PTMs. This paper proposes the Dual Prototype network for Task-wise Adaption (DPTA) of PTM-based CIL. For each incremental learning task, a task-wise adapter module is built to fine-tune the PTM, where the center-adapt loss forces the representation to be more centrally clustered and class separable. The dual prototype network improves the prediction process by enabling test-time adapter selection, where the raw prototypes deduce several possible task indexes of test samples to select suitable adapter modules for PTM, and the augmented prototypes that could separate highly correlated classes are utilized to determine the final result. Experiments on several benchmark datasets demonstrate the state-of-the-art performance of DPTA. The code will be open-sourced after the paper is published.
Submitted 9 March, 2025; v1 submitted 26 November, 2024;
originally announced November 2024.
-
Covariate Shift Corrected Conditional Randomization Test
Authors:
Bowen Xu,
Yiwen Huang,
Chuan Hong,
Shuangning Li,
Molei Liu
Abstract:
Conditional independence tests are crucial across various disciplines in determining the independence of an outcome variable $Y$ from a treatment variable $X$, conditioning on a set of confounders $Z$. The Conditional Randomization Test (CRT) offers a powerful framework for such testing by assuming known distributions of $X \mid Z$; it controls the Type-I error exactly, allowing for the use of flexible, black-box test statistics. In practice, testing for conditional independence often involves using data from a source population to draw conclusions about a target population. This can be challenging due to covariate shift -- differences in the distribution of $X$, $Z$, and surrogate variables, which can affect the conditional distribution of $Y \mid X, Z$ -- rendering traditional CRT approaches invalid. To address this issue, we propose a novel Covariate Shift Corrected Pearson Chi-squared Conditional Randomization (csPCR) test. This test adapts to covariate shifts by integrating importance weights and employing the control variates method to reduce variance in the test statistics and thus enhance power. Theoretically, we establish that the csPCR test controls the Type-I error asymptotically. Empirically, through simulation studies, we demonstrate that our method not only maintains control over Type-I errors but also exhibits superior power, confirming its efficacy and practical utility in real-world scenarios where covariate shifts are prevalent. Finally, we apply our methodology to a real-world dataset to assess the impact of a COVID-19 treatment on the 90-day mortality rate among patients.
Submitted 29 May, 2024;
originally announced May 2024.
-
Efficient Fraud Detection Using Deep Boosting Decision Trees
Authors:
Biao Xu,
Yao Wang,
Xiuwu Liao,
Kaidong Wang
Abstract:
Fraud detection is to identify, monitor, and prevent potentially fraudulent activities from complex data. The recent development and success in AI, especially machine learning, provides a new data-driven way to deal with fraud. From a methodological point of view, machine learning based fraud detection can be divided into two categories, i.e., conventional methods (decision tree, boosting...) and deep learning, both of which have significant limitations in terms of the lack of representation learning ability for the former and interpretability for the latter. Furthermore, due to the rarity of detected fraud cases, the associated data is usually imbalanced, which seriously degrades the performance of classification algorithms. In this paper, we propose deep boosting decision trees (DBDT), a novel approach for fraud detection based on gradient boosting and neural networks. In order to combine the advantages of both conventional methods and deep learning, we first construct soft decision tree (SDT), a decision tree structured model with neural networks as its nodes, and then ensemble SDTs using the idea of gradient boosting. In this way we embed neural networks into gradient boosting to improve its representation learning capability and meanwhile maintain the interpretability. Furthermore, aiming at the rarity of detected fraud cases, in the model training phase we propose a compositional AUC maximization approach to deal with data imbalances at algorithm level. Extensive experiments on several real-life fraud detection datasets show that DBDT can significantly improve the performance and meanwhile maintain good interpretability. Our code is available at https://github.com/freshmanXB/DBDT.
Submitted 18 May, 2023; v1 submitted 12 February, 2023;
originally announced February 2023.
-
Robust Causal Graph Representation Learning against Confounding Effects
Authors:
Hang Gao,
Jiangmeng Li,
Wenwen Qiang,
Lingyu Si,
Bing Xu,
Changwen Zheng,
Fuchun Sun
Abstract:
The prevailing graph neural network models have achieved significant progress in graph representation learning. However, in this paper, we uncover an ever-overlooked phenomenon: the pre-trained graph representation learning model tested with full graphs underperforms the model tested with well-pruned graphs. This observation reveals that there exist confounders in graphs, which may interfere with the model learning semantic information, and current graph representation learning methods have not eliminated their influence. To tackle this issue, we propose Robust Causal Graph Representation Learning (RCGRL) to learn robust graph representations against confounding effects. RCGRL introduces an active approach to generate instrumental variables under unconditional moment restrictions, which empowers the graph representation learning model to eliminate confounders, thereby capturing discriminative information that is causally related to downstream predictions. We offer theorems and proofs to guarantee the theoretical effectiveness of the proposed approach. Empirically, we conduct extensive experiments on a synthetic dataset and multiple benchmark datasets. The results demonstrate that compared with state-of-the-art methods, RCGRL achieves better prediction performance and generalization ability.
Submitted 10 February, 2023; v1 submitted 17 August, 2022;
originally announced August 2022.
-
Ensure A/B Test Quality at Scale with Automated Randomization Validation and Sample Ratio Mismatch Detection
Authors:
Keyu Nie,
Zezhong Zhang,
Bingquan Xu,
Tao Yuan
Abstract:
eBay's experimentation platform runs hundreds of A/B tests on any given day. The platform integrates with the tracking infrastructure and customer experience servers, provides the sampling service for experiments, and is responsible for monitoring the progress of each A/B test. There are many challenges, especially when experiment quality must be ensured at large scale. We discuss two automated test quality monitoring processes and methodologies, namely randomization validation using the population stability index (PSI) and sample ratio mismatch (a.k.a. sample delta) detection using sequential analysis. The automated processes help the experimentation platform run high-quality and trustworthy tests not only effectively at large scale, but also efficiently by minimizing false positive monitoring alarms to experimenters.
Submitted 8 March, 2023; v1 submitted 16 August, 2022;
originally announced August 2022.
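The population stability index named in the abstract above has a common textbook form: the sum over bins of (q - p) * ln(q / p), where p and q are the bin proportions of the two groups. The sketch below implements that form. The binning, the `eps` floor for empty bins, and the example counts are assumptions for illustration; the abstract does not state how the platform bins users or what alarm threshold it uses.

```python
import math


def population_stability_index(expected_counts, actual_counts, eps=1e-6):
    """Textbook PSI between two count vectors defined over the same bins."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both groups must use the same bins")
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    psi = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Floor proportions at eps so an empty bin does not produce log(0).
        p = max(e / expected_total, eps)
        q = max(a / actual_total, eps)
        psi += (q - p) * math.log(q / p)
    return psi


# Hypothetical example: user counts per covariate bin in control vs. treatment.
balanced = population_stability_index([50, 50], [50, 50])
skewed = population_stability_index([50, 50], [80, 20])
```

A value near zero suggests the two groups have similar covariate distributions, which is what a correct randomization should produce; larger values indicate a shift worth investigating.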
-
Towards Practical Robustness Analysis for DNNs based on PAC-Model Learning
Authors:
Renjue Li,
Pengfei Yang,
Cheng-Chao Huang,
Youcheng Sun,
Bai Xue,
Lijun Zhang
Abstract:
To analyse local robustness properties of deep neural networks (DNNs), we present a practical framework from a model learning perspective. Based on black-box model learning with scenario optimisation, we abstract the local behaviour of a DNN via an affine model with the probably approximately correct (PAC) guarantee. From the learned model, we can infer the corresponding PAC-model robustness property. The innovation of our work is the integration of model learning into PAC robustness analysis: that is, we construct a PAC guarantee on the model level instead of sample distribution, which induces a more faithful and accurate robustness evaluation. This is in contrast to existing statistical methods without model learning. We implement our method in a prototypical tool named DeepPAC. As a black-box method, DeepPAC is scalable and efficient, especially when DNNs have complex structures or high-dimensional inputs. We extensively evaluate DeepPAC, with 4 baselines (using formal verification, statistical methods, testing and adversarial attack) and 20 DNN models across 3 datasets, including MNIST, CIFAR-10, and ImageNet. It is shown that DeepPAC outperforms the state-of-the-art statistical method PROVERO, and it achieves more practical robustness analysis than the formal verification tool ERAN. Also, its results are consistent with existing DNN testing work like DeepGini.
Submitted 13 April, 2022; v1 submitted 25 January, 2021;
originally announced January 2021.
-
Seasonal association between viral causes of hospitalised acute lower respiratory infections and meteorological factors in China: a retrospective study
Authors:
Bing Xu,
Jinfeng Wang,
Zhongjie Li,
Chengdong Xu,
Yilan Liao,
Maogui Hu,
Jing Yang,
Shengjie Lai,
Liping Wang,
Weizhong Yang
Abstract:
Acute lower respiratory infections caused by respiratory viruses are common and persistent infectious diseases worldwide and in China, which have pronounced seasonal patterns. Meteorological factors have important roles in the seasonality of some major viruses. Our aim was to identify the dominant meteorological factors and to model their effects on common respiratory viruses in different regions of China. We analysed monthly virus data on patients from 81 sentinel hospitals in 22 provinces in mainland China from 2009 to 2013. The geographical detector method was used to quantify the explanatory power of each meteorological factor, individually and interacting in pairs. 28369 hospitalised patients with ALRI were tested, 10387 were positive for at least one virus, including RSV, influenza virus, PIV, ADV, hBoV, hCoV and hMPV. RSV and influenza virus had annual peaks in the north and biannual peaks in the south. PIV and hBoV had higher positive rates in the spring summer months. hMPV had an annual peak in winter spring, especially in the north. ADV and hCoV exhibited no clear annual seasonality. Temperature, atmospheric pressure, vapour pressure, and rainfall had most explanatory power on most respiratory viruses in each region. Relative humidity was only dominant in the north, but had no significant explanatory power for most viruses in the south. Hours of sunlight had significant explanatory power for RSV and influenza virus in the north, and for most viruses in the south. Wind speed was the only factor with significant explanatory power for human coronavirus in the south. For all viruses, interactions between any two of the paired factors resulted in enhanced explanatory power, either bivariately or non-linearly.
Submitted 15 April, 2021; v1 submitted 30 November, 2020;
originally announced December 2020.
-
Isolation Distributional Kernel: A New Tool for Point & Group Anomaly Detection
Authors:
Kai Ming Ting,
Bi-Cun Xu,
Takashi Washio,
Zhi-Hua Zhou
Abstract:
We introduce Isolation Distributional Kernel as a new way to measure the similarity between two distributions. Existing approaches based on kernel mean embedding, which convert a point kernel to a distributional kernel, have two key issues: the point kernel employed has a feature map with intractable dimensionality; and it is data independent. This paper shows that Isolation Distributional Kernel (IDK), which is based on a data dependent point kernel, addresses both key issues. We demonstrate IDK's efficacy and efficiency as a new tool for kernel based anomaly detection for both point and group anomalies. Without explicit learning, using IDK alone outperforms existing kernel based point anomaly detector OCSVM and other kernel mean embedding methods that rely on Gaussian kernel. For group anomaly detection, we introduce an IDK based detector called IDK$^2$. It reformulates the problem of group anomaly detection in input space into the problem of point anomaly detection in Hilbert space, without the need for learning. IDK$^2$ runs orders of magnitude faster than group anomaly detector OCSMM. We reveal for the first time that an effective kernel based anomaly detector based on kernel mean embedding must employ a characteristic kernel which is data dependent.
Submitted 24 September, 2020;
originally announced September 2020.
-
Multivariate Time-series Anomaly Detection via Graph Attention Network
Authors:
Hang Zhao,
Yujing Wang,
Juanyong Duan,
Congrui Huang,
Defu Cao,
Yunhai Tong,
Bixiong Xu,
Jing Bai,
Jie Tong,
Qi Zhang
Abstract:
Anomaly detection on multivariate time-series is of great importance in both data mining research and industrial applications. Recent approaches have achieved significant progress in this topic, but there are remaining limitations. One major limitation is that they do not capture the relationships between different time-series explicitly, resulting in inevitable false alarms. In this paper, we propose a novel self-supervised framework for multivariate time-series anomaly detection to address this issue. Our framework considers each univariate time-series as an individual feature and includes two graph attention layers in parallel to learn the complex dependencies of multivariate time-series in both temporal and feature dimensions. In addition, our approach jointly optimizes a forecasting-based model and a reconstruction-based model, obtaining better time-series representations through a combination of single-timestamp prediction and reconstruction of the entire time-series. We demonstrate the efficacy of our model through extensive experiments. The proposed method outperforms other state-of-the-art models on three real-world datasets. Further analysis shows that our method has good interpretability and is useful for anomaly diagnosis.
Submitted 4 September, 2020;
originally announced September 2020.
-
Shifu2: A Network Representation Learning Based Model for Advisor-advisee Relationship Mining
Authors:
Jiaying Liu,
Feng Xia,
Lei Wang,
Bo Xu,
Xiangjie Kong,
Hanghang Tong,
Irwin King
Abstract:
The advisor-advisee relationship represents direct knowledge heritage, and such relationship may not be readily available from academic libraries and search engines. This work aims to discover advisor-advisee relationships hidden behind scientific collaboration networks. For this purpose, we propose a novel model based on Network Representation Learning (NRL), namely Shifu2, which takes the collaboration network as input and the identified advisor-advisee relationship as output. In contrast to existing NRL models, Shifu2 considers not only the network structure but also the semantic information of nodes and edges. Shifu2 encodes nodes and edges into low-dimensional vectors respectively, both of which are then utilized to identify advisor-advisee relationships. Experimental results illustrate improved stability and effectiveness of the proposed model over state-of-the-art methods. In addition, we generate a large-scale academic genealogy dataset by taking advantage of Shifu2.
Submitted 17 August, 2020;
originally announced August 2020.
-
Label-Consistency based Graph Neural Networks for Semi-supervised Node Classification
Authors:
Bingbing Xu,
Junjie Huang,
Liang Hou,
Huawei Shen,
Jinhua Gao,
Xueqi Cheng
Abstract:
Graph neural networks (GNNs) achieve remarkable success in graph-based semi-supervised node classification, leveraging the information from neighboring nodes to improve the representation learning of the target node. The success of GNNs at node classification depends on the assumption that connected nodes tend to have the same label. However, such an assumption does not always hold, limiting the performance of GNNs at node classification. In this paper, we propose label-consistency based graph neural network (LC-GNN), leveraging node pairs unconnected but with the same labels to enlarge the receptive field of nodes in GNNs. Experiments on benchmark datasets demonstrate the proposed LC-GNN outperforms traditional GNNs in graph-based semi-supervised node classification. We further show the superiority of LC-GNN in sparse scenarios with only a handful of labeled nodes.
Submitted 27 July, 2020;
originally announced July 2020.
-
Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed Payoffs
Authors:
Bo Xue,
Guanghui Wang,
Yimu Wang,
Lijun Zhang
Abstract:
In this paper, we study the problem of stochastic linear bandits with finite action sets. Most existing work assumes the payoffs are bounded or sub-Gaussian, which may be violated in some scenarios such as financial markets. To settle this issue, we analyze the linear bandits with heavy-tailed payoffs, where the payoffs admit finite $1+ε$ moments for some $ε\in(0,1]$. Through median of means and dynamic truncation, we propose two novel algorithms which enjoy a sublinear regret bound of $\widetilde{O}(d^{\frac{1}{2}}T^{\frac{1}{1+ε}})$, where $d$ is the dimension of contextual information and $T$ is the time horizon. Meanwhile, we provide an $Ω(d^{\frac{ε}{1+ε}}T^{\frac{1}{1+ε}})$ lower bound, which implies our upper bound matches the lower bound up to polylogarithmic factors in the order of $d$ and $T$ when $ε=1$. Finally, we conduct numerical experiments to demonstrate the effectiveness of our algorithms, and the empirical results strongly support our theoretical guarantees.
Submitted 28 April, 2020;
originally announced April 2020.
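The abstract above names median of means as one of its two robustification tools. The sketch below shows only the basic scalar median-of-means estimator, to illustrate why it tolerates heavy-tailed samples. It is not the authors' bandit algorithm; the function name and the sample data are assumptions for this example.

```python
import statistics


def median_of_means(samples, num_blocks):
    """Split samples into equal-size blocks, average each, return the median.

    Leftover samples that do not fill a complete block are ignored.
    """
    n = len(samples)
    if not 1 <= num_blocks <= n:
        raise ValueError("num_blocks must be between 1 and len(samples)")
    block_size = n // num_blocks
    block_means = []
    for i in range(num_blocks):
        block = samples[i * block_size:(i + 1) * block_size]
        block_means.append(sum(block) / len(block))
    return statistics.median(block_means)


# Hypothetical payoffs: eight ordinary observations and one extreme outlier.
payoffs = [1.0] * 8 + [1000.0]
robust_estimate = median_of_means(payoffs, num_blocks=3)
plain_mean = sum(payoffs) / len(payoffs)
```

A single extreme payoff can corrupt at most one block mean, so the median across blocks stays close to the typical value, whereas the plain average is pulled far toward the outlier.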
-
MCFlow: Monte Carlo Flow Models for Data Imputation
Authors:
Trevor W. Richardson,
Wencheng Wu,
Lei Lin,
Beilei Xu,
Edgar A. Bernal
Abstract:
We consider the topic of data imputation, a foundational task in machine learning that addresses issues with missing data. To that end, we propose MCFlow, a deep framework for imputation that leverages normalizing flow generative models and Monte Carlo sampling. We address the causality dilemma that arises when training models with incomplete data by introducing an iterative learning scheme which alternately updates the density estimate and the values of the missing entries in the training data. We provide extensive empirical validation of the effectiveness of the proposed method on standard multivariate and image datasets, and benchmark its performance against state-of-the-art alternatives. We demonstrate that MCFlow is superior to competing methods in terms of the quality of the imputed data, as well as with regards to its ability to preserve the semantic structure of the data.
Submitted 27 March, 2020;
originally announced March 2020.
-
Improving generalisation of AutoML systems with dynamic fitness evaluations
Authors:
Benjamin Patrick Evans,
Bing Xue,
Mengjie Zhang
Abstract:
A common problem machine learning developers are faced with is overfitting, that is, fitting a pipeline so closely to the training data that the performance degrades for unseen data. Automated machine learning aims to free (or at least ease) the developer from the burden of pipeline creation, but this overfitting problem can persist. In fact, this can become more of a problem as we look to iteratively optimise the performance of an internal cross-validation (most often k-fold). While this internal cross-validation is intended to reduce overfitting, we show we can still risk overfitting to the particular folds used. In this work, we aim to remedy this problem by introducing dynamic fitness evaluations which approximate repeated k-fold cross-validation, at little extra cost over single k-fold, and far lower cost than typical repeated k-fold. The results show that when time equated, the proposed fitness function results in significant improvement over the current state-of-the-art baseline method which uses an internal single k-fold. Furthermore, the proposed extension is very simple to implement on top of existing evolutionary computation methods, and can provide essentially a free boost in generalisation/testing performance.
Submitted 23 January, 2020;
originally announced January 2020.
-
Disentanglement Challenge: From Regularization to Reconstruction
Authors:
Jie Qiao,
Zijian Li,
Boyan Xu,
Ruichu Cai,
Kun Zhang
Abstract:
The challenge of learning disentangled representations has recently attracted much attention and boils down to a competition using a new real-world disentanglement dataset (Gondal et al., 2019). Various methods based on the variational auto-encoder have been proposed to solve this problem, by enforcing independence between the representations and modifying the regularization term in the variational lower bound. However, recent work by Locatello et al. (2018) has demonstrated that the proposed methods are heavily influenced by randomness and the choice of hyper-parameters. In this work, instead of designing a new regularization term, we adopt FactorVAE but improve the reconstruction performance and increase the capacity of the network and the number of training steps. The strategy turns out to be very effective and achieves 1st place in the challenge.
△ Less
Submitted 30 November, 2019;
originally announced December 2019.
-
KerGM: Kernelized Graph Matching
Authors:
Zhen Zhang,
Yijian Xiang,
Lingfei Wu,
Bing Xue,
Arye Nehorai
Abstract:
Graph matching plays a central role in such fields as computer vision, pattern recognition, and bioinformatics. Graph matching problems can be cast as two types of quadratic assignment problems (QAPs): Koopmans-Beckmann's QAP or Lawler's QAP. In our paper, we provide a unifying view for these two problems by introducing new rules for array operations in Hilbert spaces. Consequently, Lawler's QAP can be considered as the Koopmans-Beckmann's alignment between two arrays in reproducing kernel Hilbert spaces (RKHS), making it possible to efficiently solve the problem without computing a huge affinity matrix. Furthermore, we develop the entropy-regularized Frank-Wolfe (EnFW) algorithm for optimizing QAPs, which has the same convergence rate as the original FW algorithm while dramatically reducing the computational burden for each outer iteration. We conduct extensive experiments to evaluate our approach, and show that our algorithm significantly outperforms the state-of-the-art in both matching accuracy and scalability.
Submitted 25 November, 2019;
originally announced November 2019.
-
Bounding Regression Errors in Data-driven Power Grid Steady-state Models
Authors:
Yuxiao Liu,
Bolun Xu,
Audun Botterud,
Ning Zhang,
Chongqing Kang
Abstract:
Data-driven models analyze power grids under incomplete physical information, and their accuracy has been mostly validated empirically using certain training and testing datasets. This paper explores error bounds for data-driven models under all possible training and testing scenarios, and proposes an evaluation implementation based on Rademacher complexity theory. We answer key questions for data-driven models: how much training data is required to guarantee a certain error bound, and how partial physical knowledge can be utilized to reduce the required amount of data. Our results are crucial for the evaluation and application of data-driven models in power grid analysis. We demonstrate the proposed method by finding generalization error bounds for two applications, i.e. branch flow linearization and external network equivalent under different degrees of physical knowledge. Results identify how the bounds decrease with additional power grid physical knowledge or more training data.
Submitted 26 May, 2020; v1 submitted 29 October, 2019;
originally announced October 2019.
-
FAKTA: An Automatic End-to-End Fact Checking System
Authors:
Moin Nadeem,
Wei Fang,
Brian Xu,
Mitra Mohtarami,
James Glass
Abstract:
We present FAKTA, a unified framework that integrates various components of a fact-checking process: document retrieval from media sources with various types of reliability, stance detection of documents with respect to given claims, evidence extraction, and linguistic analysis. FAKTA predicts the factuality of given claims and provides evidence at the document and sentence level to explain its predictions.
Submitted 7 June, 2019;
originally announced June 2019.
-
Time-Series Anomaly Detection Service at Microsoft
Authors:
Hansheng Ren,
Bixiong Xu,
Yujing Wang,
Chao Yi,
Congrui Huang,
Xiaoyu Kou,
Tony Xing,
Mao Yang,
Jie Tong,
Qi Zhang
Abstract:
Large companies need to monitor various metrics (for example, Page Views and Revenue) of their applications and services in real time. At Microsoft, we develop a time-series anomaly detection service which helps customers to monitor the time-series continuously and alert for potential incidents on time. In this paper, we introduce the pipeline and algorithm of our anomaly detection service, which is designed to be accurate, efficient and general. The pipeline consists of three major modules, including data ingestion, experimentation platform and online compute. To tackle the problem of time-series anomaly detection, we propose a novel algorithm based on Spectral Residual (SR) and Convolutional Neural Network (CNN). Our work is the first attempt to borrow the SR model from visual saliency detection domain to time-series anomaly detection. Moreover, we innovatively combine SR and CNN together to improve the performance of SR model. Our approach achieves superior experimental results compared with state-of-the-art baselines on both public datasets and Microsoft production data.
Submitted 10 June, 2019;
originally announced June 2019.
-
Label Mapping Neural Networks with Response Consolidation for Class Incremental Learning
Authors:
Xu Zhang,
Yang Yao,
Baile Xu,
Lekun Mao,
Furao Shen,
Jian Zhao,
Qingwei Lin
Abstract:
Class incremental learning refers to a special multi-class classification task in which the number of classes is not fixed but increases with the continual arrival of new data. Existing research has mainly focused on solving the catastrophic forgetting problem in class incremental learning. To this end, however, these models still require the old classes to be cached in an auxiliary data structure or in models, which is inefficient in space or time. In this paper, we discuss for the first time the difficulty of class incremental learning without the support of old classes, which we call the softmax suppression problem. To address these challenges, we develop a new model named Label Mapping with Response Consolidation (LMRC), which no longer needs to access the old classes. We propose the Label Mapping algorithm combined with a multi-head neural network to mitigate the softmax suppression problem, and propose the Response Consolidation method to overcome the catastrophic forgetting problem. Experimental results on benchmark datasets show that our proposed method achieves much better performance than related methods in different scenarios.
Submitted 19 May, 2019;
originally announced May 2019.
-
Reference-Based Sequence Classification
Authors:
Zengyou He,
Guangyao Xu,
Chaohua Sheng,
Bo Xu,
Quan Zou
Abstract:
Sequence classification is an important data mining task in many real world applications. Over the past few decades, many sequence classification methods have been proposed from different aspects. In particular, the pattern-based method is one of the most important and widely studied sequence classification methods in the literature. In this paper, we present a reference-based sequence classification framework, which can unify existing pattern-based sequence classification methods under the same umbrella. More importantly, this framework can be used as a general platform for developing new sequence classification algorithms. By utilizing this framework as a tool, we propose new sequence classification algorithms that are quite different from existing solutions. Experimental results show that new methods developed under the proposed framework are capable of achieving comparable classification accuracy to those state-of-the-art sequence classification algorithms.
Submitted 13 December, 2020; v1 submitted 17 May, 2019;
originally announced May 2019.
-
Model reconstruction from temporal data for coupled oscillator networks
Authors:
Mark J Panaggio,
Maria-Veronica Ciocanel,
Lauren Lazarus,
Chad M Topaz,
Bin Xu
Abstract:
In a complex system, the interactions between individual agents often lead to emergent collective behavior like spontaneous synchronization, swarming, and pattern formation. The topology of the network of interactions can have a dramatic influence over those dynamics. In many studies, researchers start with a specific model for both the intrinsic dynamics of each agent and the interaction network, and attempt to learn about the dynamics that can be observed in the model. Here we consider the inverse problem: given the dynamics of a system, can one learn about the underlying network? We investigate arbitrary networks of coupled phase-oscillators whose dynamics are characterized by synchronization. We demonstrate that, given sufficient observational data on the transient evolution of each oscillator, one can use machine learning methods to reconstruct the interaction network and simultaneously identify the parameters of a model for the intrinsic dynamics of the oscillators and their coupling.
Submitted 3 May, 2019;
originally announced May 2019.
-
Operation-aware Neural Networks for User Response Prediction
Authors:
Yi Yang,
Baile Xu,
Furao Shen,
Jian Zhao
Abstract:
User response prediction makes a crucial contribution to the rapid development of online advertising and recommendation systems. The importance of learning feature interactions has been emphasized by many works, and many deep models have been proposed to automatically learn high-order feature interactions. Since most features in advertising and recommendation systems are high-dimensional sparse features, deep models usually learn a low-dimensional distributed representation for each feature in the bottom layer. Besides traditional fully-connected architectures, some new operations, such as convolutional operations and product operations, have been proposed to learn feature interactions better. In these models, the representation is shared among different operations. However, the best representation for different operations may be different. In this paper, we propose a new neural model named Operation-aware Neural Networks (ONN), which learns different representations for different operations. Our experimental results on two large-scale real-world ad click/conversion datasets demonstrate that ONN consistently outperforms the state-of-the-art models in both offline-training and online-training environments.
Submitted 2 April, 2019;
originally announced April 2019.
-
Graph Wavelet Neural Network
Authors:
Bingbing Xu,
Huawei Shen,
Qi Cao,
Yunqi Qiu,
Xueqi Cheng
Abstract:
We present graph wavelet neural network (GWNN), a novel graph convolutional neural network (CNN), leveraging graph wavelet transform to address the shortcomings of previous spectral graph CNN methods that depend on graph Fourier transform. Different from graph Fourier transform, graph wavelet transform can be obtained via a fast algorithm without requiring matrix eigendecomposition with high computational cost. Moreover, graph wavelets are sparse and localized in vertex domain, offering high efficiency and good interpretability for graph convolution. The proposed GWNN significantly outperforms previous spectral graph CNNs in the task of graph-based semi-supervised classification on three benchmark datasets: Cora, Citeseer and Pubmed.
Submitted 12 April, 2019;
originally announced April 2019.
-
Adversarial Domain Adaptation for Stance Detection
Authors:
Brian Xu,
Mitra Mohtarami,
James Glass
Abstract:
This paper studies the problem of stance detection, which aims to predict the perspective (or stance) of a given document with respect to a given claim. Stance detection is a major component of automated fact checking. As annotating stances in different domains is a tedious and costly task, automatic methods based on machine learning are viable alternatives. In this paper, we focus on adversarial domain adaptation for stance detection, where we assume there exists sufficient labeled data in the source domain and limited labeled data in the target domain. Extensive experiments on publicly available datasets show the effectiveness of our domain adaptation model in transferring knowledge for accurate stance detection across domains.
Submitted 6 February, 2019;
originally announced February 2019.
-
Three Mechanisms of Weight Decay Regularization
Authors:
Guodong Zhang,
Chaoqi Wang,
Bowen Xu,
Roger Grosse
Abstract:
Weight decay is one of the standard tricks in the neural network toolbox, but the reasons for its regularization effect are poorly understood, and recent results have cast doubt on the traditional interpretation in terms of $L_2$ regularization. Literal weight decay has been shown to outperform $L_2$ regularization for optimizers for which they differ. We empirically investigate weight decay for three optimization algorithms (SGD, Adam, and K-FAC) and a variety of network architectures. We identify three distinct mechanisms by which weight decay exerts a regularization effect, depending on the particular optimization algorithm and architecture: (1) increasing the effective learning rate, (2) approximately regularizing the input-output Jacobian norm, and (3) reducing the effective damping coefficient for second-order optimization. Our results provide insight into how to improve the regularization of neural networks.
Submitted 29 October, 2018;
originally announced October 2018.
-
Robustness of Maximum Correntropy Estimation Against Large Outliers
Authors:
Badong Chen,
Lei Xing,
Haiquan Zhao,
Bin Xu,
Jose C. Principe
Abstract:
The maximum correntropy criterion (MCC) has recently been successfully applied in robust regression, classification and adaptive filtering, where the correntropy is maximized instead of minimizing the well-known mean square error (MSE) to improve the robustness with respect to outliers (or impulsive noises). Considerable efforts have been devoted to developing various robust adaptive algorithms under MCC, but so far little insight has been gained as to how the optimal solution will be affected by outliers. In this work, we study this problem in the context of parameter estimation for a simple linear errors-in-variables (EIV) model where all variables are scalar. Under certain conditions, we derive an upper bound on the absolute value of the estimation error and show that the optimal solution under MCC can be very close to the true value of the unknown parameter even with outliers (whose values can be arbitrarily large) in both input and output variables. Illustrative examples are presented to verify and clarify the theory.
Submitted 23 November, 2017; v1 submitted 23 March, 2017;
originally announced March 2017.
-
Maximum Correntropy Unscented Filter
Authors:
Xi Liu,
Badong Chen,
Bin Xu,
Zongze Wu,
Paul Honeine
Abstract:
The unscented transformation (UT) is an efficient method to solve the state estimation problem for a non-linear dynamic system, utilizing a derivative-free higher-order approximation by approximating a Gaussian distribution rather than approximating a non-linear function. Applying the UT to a Kalman filter type estimator leads to the well-known unscented Kalman filter (UKF). Although the UKF works very well in Gaussian noises, its performance may deteriorate significantly when the noises are non-Gaussian, especially when the system is disturbed by some heavy-tailed impulsive noises. To improve the robustness of the UKF against impulsive noises, a new filter for nonlinear systems is proposed in this work, namely the maximum correntropy unscented filter (MCUF). In MCUF, the UT is applied to obtain the prior estimates of the state and covariance matrix, and a robust statistical linearization regression based on the maximum correntropy criterion (MCC) is then used to obtain the posterior estimates of the state and covariance. The satisfying performance of the new algorithm is confirmed by two illustrative examples.
Submitted 26 August, 2016;
originally announced August 2016.
-
Kernel Risk-Sensitive Loss: Definition, Properties and Application to Robust Adaptive Filtering
Authors:
Badong Chen,
Lei Xing,
Bin Xu,
Haiquan Zhao,
Nanning Zheng,
Jose C. Principe
Abstract:
Nonlinear similarity measures defined in kernel space, such as correntropy, can extract higher-order statistics of data and offer potentially significant performance improvement over their linear counterparts especially in non-Gaussian signal processing and machine learning. In this work, we propose a new similarity measure in kernel space, called the kernel risk-sensitive loss (KRSL), and provide some important properties. We apply the KRSL to adaptive filtering and investigate the robustness, and then develop the MKRSL algorithm and analyze the mean square convergence performance. Compared with correntropy, the KRSL can offer a more efficient performance surface, thereby enabling a gradient based method to achieve faster convergence speed and higher accuracy while still maintaining the robustness to outliers. Theoretical analysis results and superior performance of the new algorithm are confirmed by simulation.
Submitted 1 August, 2016;
originally announced August 2016.
-
Empirical Evaluation of Rectified Activations in Convolutional Network
Authors:
Bing Xu,
Naiyan Wang,
Tianqi Chen,
Mu Li
Abstract:
In this paper we investigate the performance of different types of rectified activation functions in convolutional neural networks: the standard rectified linear unit (ReLU), the leaky rectified linear unit (Leaky ReLU), the parametric rectified linear unit (PReLU) and a new randomized leaky rectified linear unit (RReLU). We evaluate these activation functions on a standard image classification task. Our experiments suggest that incorporating a non-zero slope for the negative part in rectified activation units could consistently improve the results. Thus our findings contradict the common belief that sparsity is the key to good performance in ReLU. Moreover, on small-scale datasets, using a deterministic negative slope or learning it are both prone to overfitting. They are not as effective as their randomized counterpart. By using RReLU, we achieved 75.68\% accuracy on the CIFAR-100 test set without multiple tests or ensembles.
Submitted 27 November, 2015; v1 submitted 4 May, 2015;
originally announced May 2015.
-
Generative Adversarial Networks
Authors:
Ian J. Goodfellow,
Jean Pouget-Abadie,
Mehdi Mirza,
Bing Xu,
David Warde-Farley,
Sherjil Ozair,
Aaron Courville,
Yoshua Bengio
Abstract:
We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.
Submitted 10 June, 2014;
originally announced June 2014.
-
A Comment on "Cycles and Instability in a Rock-Paper-Scissors Population Game: A Continuous Time Experiment"
Authors:
Zhijian Wang,
Siqian Zhu,
Bin Xu
Abstract:
The authors (Cason, Friedman and Hopkins, Review of Economic Studies, 2014) claimed that the treatments (using simultaneous matching in discrete time) replicate previous results that exhibit weak or no cycles. After correcting two mathematical mistakes in their cycles tripwire algorithm, we investigate the cycles by scanning the tripwire over the full strategy space of the games, and we find significant cycles missed by the authors. We therefore suggest that all of the treatments (using simultaneous matching in discrete time) exhibit significant cycles.
Submitted 19 November, 2013; v1 submitted 11 November, 2013;
originally announced November 2013.
-
Challenges in Representation Learning: A report on three machine learning contests
Authors:
Ian J. Goodfellow,
Dumitru Erhan,
Pierre Luc Carrier,
Aaron Courville,
Mehdi Mirza,
Ben Hamner,
Will Cukierski,
Yichuan Tang,
David Thaler,
Dong-Hyun Lee,
Yingbo Zhou,
Chetan Ramaiah,
Fangxiang Feng,
Ruifan Li,
Xiaojie Wang,
Dimitris Athanasakis,
John Shawe-Taylor,
Maxim Milakov,
John Park,
Radu Ionescu,
Marius Popescu,
Cristian Grozea,
James Bergstra,
Jingjing Xie,
Lukasz Romaszko, et al. (3 additional authors not shown)
Abstract:
The ICML 2013 Workshop on Challenges in Representation Learning focused on three challenges: the black box learning challenge, the facial expression recognition challenge, and the multimodal learning challenge. We describe the datasets created for these challenges and summarize the results of the competitions. We provide suggestions for organizers of future challenges and some comments on what kind of knowledge can be gained from machine learning competitions.
Submitted 1 July, 2013;
originally announced July 2013.
-
Horizontal and Vertical Ensemble with Deep Representation for Classification
Authors:
Jingjing Xie,
Bing Xu,
Zhang Chuang
Abstract:
Representation learning, especially that based on deep learning, has been widely applied in classification. However, how to use a limited amount of labeled data to achieve good classification performance with a deep neural network, and how the learned features can further improve classification, remain open questions. In this paper, we propose the Horizontal Voting, Vertical Voting, and Horizontal Stacked Ensemble methods to improve the classification performance of deep neural networks. In the ICML 2013 Black Box Challenge, using these methods independently, Bing Xu achieved 3rd on the public leaderboard and 7th on the private leaderboard; Jingjing Xie achieved 4th on the public leaderboard and 5th on the private leaderboard.
Submitted 12 June, 2013;
originally announced June 2013.
-
Do cycles dissipate when subjects must choose simultaneously?
Authors:
Bin Xu,
Zhijian Wang
Abstract:
This question was raised by Cason, Friedman and Hopkins (CFH, 2012) after they first found and quantitatively indexed the cycles in a continuous time experiment. To answer this question, we use the data from a standard RPS experiment. Our experiments follow the traditional setting: in each of the repeated rounds, the subjects are paired by random matching, use pure strategies and must choose simultaneously, and after each round, each subject obtains only private information. This economics environment is a decartelized and low-information one.
Using the cycle rotation index (CRI, developed by CFH) method, we find that the cycles not only exist but also persist in our experiment. Meanwhile, the cycles' directions are consistent with 'standard' learning models. That is the answer to the CFH question: cycles do not dissipate in the simultaneous-choice game.
In addition, we discuss three questions: (1) why significant cycles are hard to obtain in traditional setting experiments; (2) why CRI can be an iconic indexing method for 'standard' evolution dynamics; and (3) where more cycles could be expected.
Submitted 12 August, 2012;
originally announced August 2012.
-
Test MaxEnt in Social Strategy Transitions with Experimental Two-Person Constant Sum 2$\times$2 Games
Authors:
Bin Xu,
Zhijian Wang
Abstract:
By using laboratory experimental data, we test the uncertainty of social strategy transitions in various competing environments of fixed paired two-person constant sum $2 \times 2$ games. It firstly shows that, the distributions of social strategy transitions are not erratic but obey the principle of the maximum entropy (MaxEnt). This finding indicates that human subject social systems and natural…
▽ More
By using laboratory experimental data, we test the uncertainty of social strategy transitions in various competing environments of fixed paired two-person constant sum $2 \times 2$ games. It firstly shows that, the distributions of social strategy transitions are not erratic but obey the principle of the maximum entropy (MaxEnt). This finding indicates that human subject social systems and natural systems could have wider common backgrounds.
Submitted 7 September, 2012; v1 submitted 3 July, 2012;
originally announced July 2012.
-
Maxent in Experimental 2$\times$2 Population Games
Authors:
Bin Xu,
Zhijian Wang
Abstract:
In mixed-strategy 2$\times$2 population games, the realization of maximum entropy (Maxent) is the theoretical expectation. We evaluate this theoretical prediction using experimental economics game data. The data include 12 treatments and 108 experimental sessions in which randomly matched pairs of human subjects make simultaneous strategy moves, repeated for 200 rounds. The main results are: (1) the experimental entropy values fit the Maxent prediction well; and (2) in a small proportion of samples, the distributions deviate from Maxent expectations; interestingly, the deviating patterns are significantly more concentrated. These experimental findings could enhance the understanding of social game behavior in terms of a natural-science rule, Maxent.
Submitted 15 June, 2012;
originally announced June 2012.
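As a rough illustration of the kind of quantity an entropy comparison involves, the sketch below computes the Shannon entropy of an empirical distribution of observed states. The function name `shannon_entropy` and the toy data are hypothetical, and the code does not reproduce the paper's Maxent prediction, which the abstract does not spell out.

```python
from collections import Counter
from math import log


def shannon_entropy(observations):
    """Shannon entropy (in nats) of the empirical distribution of `observations`.

    Hypothetical helper for illustration only; not the authors' procedure.
    """
    n = len(observations)
    if n == 0:
        raise ValueError("observations must be non-empty")
    counts = Counter(observations)
    # H = -sum_i p_i * ln(p_i), with p_i the observed relative frequency.
    return -sum((c / n) * log(c / n) for c in counts.values())


# Toy data: observed states of a two-state system over four rounds.
balanced = ["A", "A", "B", "B"]
concentrated = ["A", "A", "A", "A"]
h_balanced = shannon_entropy(balanced)
h_concentrated = shannon_entropy(concentrated)
```

A more concentrated sample yields a lower entropy than a balanced one, which is the sense in which observed distributions can fall short of a maximum-entropy benchmark.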
-
Evolutionary Rotation in Switching Incentive Zero-Sum Games
Authors:
Zhijian Wang,
Bin Xu
Abstract:
In a laboratory experiment, round-by-round individual interactions should lead to social evolutionary rotation in the population strategy state space. Successively switching the incentive parameter should lead to successive changes in the rotation, in both its direction and its strength. In data from a switching-payoff-matrix experiment on extended 2x2 games (Binmore, Swierzbinski and Proulx, 2001 [1]), we find that the changes in the social evolutionary rotation can be distinguished quantitatively. The evolutionary rotation can be captured by evolutionary dynamics. Using the eigenvalue of the Jacobian of a constrained replicator dynamics model, we give an interpretation of the observed rotation strength. In addition, an equality-of-populations rank test shows that the relative response coefficient of a group can persist across the games with switched parameters. The data have previously been used successfully to support Von Neumann's minimax theory. Using the old data, with the observed evolutionary rotation, this report provides a new insight into evolutionary game theory and experimental social dynamics.
Submitted 24 July, 2012; v1 submitted 12 March, 2012;
originally announced March 2012.
-
Measurement and Application of Entropy Production Rate in Human Subject Social Interaction Systems
Authors:
Bin Xu,
Zhijian Wang
Abstract:
This paper illustrates the measurement and applications of an observable, the entropy production rate (EPR), in human subject social interaction systems. To this end, we show (1) how to test the minimax randomization model with 2$\times$2 game data from experimental economics and with the Wimbledon Tennis data; (2) how to identify the Edgeworth price cycle in experimental market data; and (3) the relationship between EPR and motion in the data. As a result, in human subject social interaction systems, EPR can be measured practically and can be employed to test models and to search for facts efficiently.
Submitted 29 July, 2011;
originally announced July 2011.
-
Detecting the optimal number of communities in complex networks
Authors:
Zhifang Li,
Yanqing Hu,
Beishan Xu,
Zengru Di,
Ying Fan
Abstract:
Obtaining the optimal number of communities is an important problem in detecting community structure. In this paper, we extend the measurement of community detection algorithms to find the optimal number of communities. Based on the normalized mutual information index, which has been used as a measure of the similarity of communities, a statistic $\Omega(c)$ is proposed to detect the optimal number of communities. In general, when $\Omega(c)$ reaches a local maximum, especially the first one, the corresponding number of communities $c$ is likely to be optimal in community detection. Moreover, the statistic $\Omega(c)$ can also measure the significance of community structures in complex networks, a topic that has received increasing attention recently. Numerical and empirical results show that the index $\Omega(c)$ is effective in both artificial and real-world networks.
Submitted 30 March, 2011;
originally announced March 2011.
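For readers unfamiliar with the normalized mutual information index mentioned in this abstract, the following is a minimal sketch of one common definition, $2I(A;B)/(H(A)+H(B))$, for two partitions of the same node set. The function name and toy labels are hypothetical, and the statistic $\Omega(c)$ itself is not reproduced because the abstract does not define it.

```python
from collections import Counter
from math import log


def normalized_mutual_information(labels_a, labels_b):
    """NMI = 2 * I(A;B) / (H(A) + H(B)) for two community labelings.

    labels_a[i] and labels_b[i] are the community labels of node i.
    Illustrative helper only; not the paper's Omega(c) statistic.
    """
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("labelings must be non-empty and of equal length")
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    # Entropies of the two partitions.
    h_a = -sum((c / n) * log(c / n) for c in count_a.values())
    h_b = -sum((c / n) * log(c / n) for c in count_b.values())
    # Mutual information from the joint label counts.
    mi = sum(
        (c / n) * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )
    if h_a + h_b == 0:
        # Both partitions put every node in a single community.
        return 1.0
    return 2 * mi / (h_a + h_b)


# Same partition under different label names, and an unrelated partition.
same = normalized_mutual_information([0, 0, 1, 1], [1, 1, 0, 0])
unrelated = normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

The index is invariant to how communities are named: it equals 1 for identical partitions and 0 when the two labelings carry no information about each other.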