-
Identifiability Analysis of Linear ODE Systems with Hidden Confounders
Authors:
Yuanyuan Wang,
Biwei Huang,
Wei Huang,
Xi Geng,
Mingming Gong
Abstract:
The identifiability analysis of linear Ordinary Differential Equation (ODE) systems is a necessary prerequisite for making reliable causal inferences about these systems. While identifiability has been well studied in scenarios where the system is fully observable, the conditions for identifiability remain unexplored when latent variables interact with the system. This paper aims to address this gap by presenting a systematic analysis of identifiability in linear ODE systems incorporating hidden confounders. Specifically, we investigate two cases of such systems. In the first case, latent confounders exhibit no causal relationships, yet their evolution adheres to specific functional forms, such as polynomial functions of time $t$. Subsequently, we extend this analysis to encompass scenarios where hidden confounders exhibit causal dependencies, with the causal structure of latent variables described by a Directed Acyclic Graph (DAG). The second case represents a more intricate variation of the first case, prompting a more comprehensive identifiability analysis. Accordingly, we conduct detailed identifiability analyses of the second system under various observation conditions, including both continuous and discrete observations from single or multiple trajectories. To validate our theoretical results, we perform a series of simulations, which support and substantiate our findings.
Submitted 30 October, 2024; v1 submitted 29 October, 2024;
originally announced October 2024.
-
Sample size calculation based on the difference in restricted mean time lost for clinical trials with competing risks
Authors:
Xiang Geng,
Zhaojin Li,
Chengfeng Zhang,
Yanjie Wang,
Haoning Shen,
Zhiheng Huang,
Yawen Hou,
Zheng Chen
Abstract:
Computation of sample size is important when designing clinical trials. The presence of competing risks makes the design of clinical trials with time-to-event endpoints cumbersome. A model based on the subdistribution hazard ratio (SHR) is commonly used for trials under competing risks. However, this approach has some limitations related to model assumptions and clinical interpretation. Considering such limitations, the difference in restricted mean time lost (RMTLd) is recommended as an alternative indicator. In this paper, we propose a sample size calculation method based on the RMTLd for the Weibull distribution (RMTLdWeibull) for clinical trials, which considers experimental conditions such as equal allocation, uniform accrual, uniform loss to follow-up, and administrative censoring. Simulation results show that sample size calculation based on the RMTLdWeibull can generally achieve a predefined power level and maintain relative robustness. Moreover, the performance of the sample size calculation based on the RMTLdWeibull is similar or superior to that based on the SHR. Even if the event time does not follow the Weibull distribution, the sample size calculation based on the RMTLdWeibull still performs well. The results also verify the performance of the sample size calculation method based on the RMTLdWeibull. From the perspective of the results of this study, clinical interpretation, application conditions and statistical performance, we recommend that when designing clinical trials in the presence of competing risks, the RMTLd indicator be applied for sample size calculation and subsequent effect size measurement.
Submitted 20 November, 2023;
originally announced November 2023.
-
Generator Identification for Linear SDEs with Additive and Multiplicative Noise
Authors:
Yuanyuan Wang,
Xi Geng,
Wei Huang,
Biwei Huang,
Mingming Gong
Abstract:
In this paper, we present conditions for identifying the generator of a linear stochastic differential equation (SDE) from the distribution of its solution process with a given fixed initial state. These identifiability conditions are crucial in causal inference using linear SDEs as they enable the identification of the post-intervention distributions from its observational distribution. Specifically, we derive a sufficient and necessary condition for identifying the generator of linear SDEs with additive noise, as well as a sufficient condition for identifying the generator of linear SDEs with multiplicative noise. We show that the conditions derived for both types of SDEs are generic. Moreover, we offer geometric interpretations of the derived identifiability conditions to enhance their understanding. To validate our theoretical results, we perform a series of simulations, which support and substantiate the established findings.
Submitted 21 January, 2024; v1 submitted 30 October, 2023;
originally announced October 2023.
-
Identifiability and Asymptotics in Learning Homogeneous Linear ODE Systems from Discrete Observations
Authors:
Yuanyuan Wang,
Wei Huang,
Mingming Gong,
Xi Geng,
Tongliang Liu,
Kun Zhang,
Dacheng Tao
Abstract:
Ordinary Differential Equations (ODEs) have recently gained a lot of attention in machine learning. However, the theoretical aspects, e.g., identifiability and asymptotic properties of statistical estimation, are still obscure. This paper derives a sufficient condition for the identifiability of homogeneous linear ODE systems from a sequence of equally-spaced error-free observations sampled from a single trajectory. When observations are disturbed by measurement noise, we prove that under mild conditions, the parameter estimator based on the Nonlinear Least Squares (NLS) method is consistent and asymptotically normal with an $n^{-1/2}$ convergence rate. Based on the asymptotic normality property, we construct confidence sets for the unknown system parameters and propose a new method to infer the causal structure of the ODE system, i.e., inferring whether there is a causal link between system variables. Furthermore, we extend the results to degraded observations, including aggregated and time-scaled ones. To the best of our knowledge, our work is the first systematic study of the identifiability and asymptotic properties in learning linear ODE systems. We also construct simulations with various system dimensions to illustrate the established theoretical results.
Submitted 2 June, 2024; v1 submitted 12 October, 2022;
originally announced October 2022.
-
Compact Learning for Multi-Label Classification
Authors:
Jiaqi Lv,
Tianran Wu,
Chenglun Peng,
Yunpeng Liu,
Ning Xu,
Xin Geng
Abstract:
Multi-label classification (MLC) studies the problem where each instance is associated with multiple relevant labels, which leads to the exponential growth of the output space. MLC encourages a popular framework named label compression (LC) for capturing label dependency with dimension reduction. Nevertheless, most existing LC methods either fail to consider the influence of the feature space or are misguided by problematic original features, which may result in performance degradation. In this paper, we present a compact learning (CL) framework to embed the features and labels simultaneously and with mutual guidance. The proposal is a versatile concept, hence the embedding method is arbitrary and independent of the subsequent learning process. Following its spirit, a simple yet effective implementation called compact multi-label learning (CMLL) is proposed to learn a compact low-dimensional representation for both spaces. CMLL maximizes the dependence between the embedded spaces of the labels and features, and minimizes the loss of label space recovery concurrently. Theoretically, we provide a general analysis for different embedding methods. Practically, we conduct extensive experiments to validate the effectiveness of the proposed method.
Submitted 17 September, 2020;
originally announced September 2020.
-
Provably Consistent Partial-Label Learning
Authors:
Lei Feng,
Jiaqi Lv,
Bo Han,
Miao Xu,
Gang Niu,
Xin Geng,
Bo An,
Masashi Sugiyama
Abstract:
Partial-label learning (PLL) is a multi-class classification problem, where each training example is associated with a set of candidate labels. Even though many practical PLL methods have been proposed in the last two decades, a theoretical understanding of the consistency of those methods is lacking: none of the PLL methods hitherto possesses a generation process of candidate label sets, so it is still unclear why such a method works on a specific dataset and when it may fail given a different dataset. In this paper, we propose the first generation model of candidate label sets, and develop two novel PLL methods that are guaranteed to be provably consistent, i.e., one is risk-consistent and the other is classifier-consistent. Our methods are advantageous, since they are compatible with any deep network or stochastic optimizer. Furthermore, thanks to the generation model, we would be able to answer the two questions above by testing if the generation model matches given candidate label sets. Experiments on benchmark and real-world datasets validate the effectiveness of the proposed generation model and two PLL methods.
Submitted 23 October, 2020; v1 submitted 17 July, 2020;
originally announced July 2020.
-
Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling
Authors:
Russell Mendonca,
Xinyang Geng,
Chelsea Finn,
Sergey Levine
Abstract:
Reinforcement learning algorithms can acquire policies for complex tasks autonomously. However, the number of samples required to learn a diverse set of skills can be prohibitively large. While meta-reinforcement learning methods have enabled agents to leverage prior experience to adapt quickly to new tasks, their performance depends crucially on how close the new task is to the previously experienced tasks. Current approaches are either not able to extrapolate well, or can do so at the expense of requiring extremely large amounts of data for on-policy meta-training. In this work, we present model identification and experience relabeling (MIER), a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time. Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data, more easily than policies and value functions. These dynamics models can then be used to continue training policies and value functions for out-of-distribution tasks without using meta-reinforcement learning at all, by generating synthetic experience for the new task.
Submitted 15 June, 2020; v1 submitted 12 June, 2020;
originally announced June 2020.
-
MICK: A Meta-Learning Framework for Few-shot Relation Classification with Small Training Data
Authors:
Xiaoqing Geng,
Xiwen Chen,
Kenny Q. Zhu,
Libin Shen,
Yinggong Zhao
Abstract:
Few-shot relation classification seeks to classify incoming query instances after seeing only a few support instances. This ability is gained by training with a large amount of in-domain annotated data. In this paper, we tackle an even harder problem by further limiting the amount of data available at training time. We propose a few-shot learning framework for relation classification, which is particularly powerful when the training data is very small. In this framework, models not only strive to classify query instances, but also seek underlying knowledge about the support instances to obtain better instance representations. The framework also includes a method for aggregating cross-domain knowledge into models by open-source task enrichment. Additionally, we construct a brand new dataset: the TinyRel-CM dataset, a few-shot relation classification dataset in the health domain with purposely small training data and challenging relation classes. Experimental results demonstrate that our framework brings performance gains for most underlying classification models, outperforms the state-of-the-art results given small training data, and achieves competitive results with sufficiently large training data.
Submitted 14 December, 2020; v1 submitted 26 April, 2020;
originally announced April 2020.
-
Role-Wise Data Augmentation for Knowledge Distillation
Authors:
Jie Fu,
Xue Geng,
Zhijian Duan,
Bohan Zhuang,
Xingdi Yuan,
Adam Trischler,
Jie Lin,
Chris Pal,
Hao Dong
Abstract:
Knowledge Distillation (KD) is a common method for transferring the "knowledge" learned by one machine learning model (the teacher) into another model (the student), where typically, the teacher has a greater capacity (e.g., more parameters or higher bit-widths). To our knowledge, existing methods overlook the fact that although the student absorbs extra knowledge from the teacher, both models share the same input data, and this data is the only medium by which the teacher's knowledge can be demonstrated. Due to the difference in model capacities, the student may not benefit fully from the same data points on which the teacher is trained. On the other hand, a human teacher may demonstrate a piece of knowledge with individualized examples adapted to a particular student, for instance, in terms of her cultural background and interests. Inspired by this behavior, we design data augmentation agents with distinct roles to facilitate knowledge distillation. Our data augmentation agents generate distinct training data for the teacher and student, respectively. We find empirically that specially tailored data points enable the teacher's knowledge to be demonstrated more effectively to the student. We compare our approach with existing KD methods on training popular neural architectures and demonstrate that role-wise data augmentation improves the effectiveness of KD over strong prior approaches. The code for reproducing our results can be found at https://github.com/bigaidream-projects/role-kd
Submitted 19 April, 2020;
originally announced April 2020.
-
Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement
Authors:
Benjamin Eysenbach,
Xinyang Geng,
Sergey Levine,
Ruslan Salakhutdinov
Abstract:
Multi-task reinforcement learning (RL) aims to simultaneously learn policies for solving many tasks. Several prior works have found that relabeling past experience with different reward functions can improve sample efficiency. Relabeling methods typically ask: if, in hindsight, we assume that our experience was optimal for some task, for what task was it optimal? In this paper, we show that hindsight relabeling is inverse RL, an observation that suggests that we can use inverse RL in tandem with RL algorithms to efficiently solve many tasks. We use this idea to generalize goal-relabeling techniques from prior work to arbitrary classes of tasks. Our experiments confirm that relabeling data using inverse RL accelerates learning in general multi-task settings, including goal-reaching, domains with discrete sets of rewards, and those with linear reward functions.
Submitted 25 February, 2020;
originally announced February 2020.
-
Progressive Identification of True Labels for Partial-Label Learning
Authors:
Jiaqi Lv,
Miao Xu,
Lei Feng,
Gang Niu,
Xin Geng,
Masashi Sugiyama
Abstract:
Partial-label learning (PLL) is a typical weakly supervised learning problem, where each training instance is equipped with a set of candidate labels among which only one is the true label. Most existing methods elaborately designed learning objectives as constrained optimizations that must be solved in specific manners, making their computational complexity a bottleneck for scaling up to big data. The goal of this paper is to propose a novel framework of PLL with flexibility on the model and optimization algorithm. More specifically, we propose a novel estimator of the classification risk, theoretically analyze the classifier-consistency, and establish an estimation error bound. Then we propose a progressive identification algorithm for approximately minimizing the proposed risk estimator, where the update of the model and identification of true labels are conducted in a seamless manner. The resulting algorithm is model-independent and loss-independent, and compatible with stochastic optimization. Thorough experiments demonstrate it sets the new state of the art.
Submitted 5 September, 2020; v1 submitted 19 February, 2020;
originally announced February 2020.
-
Dynamic Complex Network Analysis of PM2.5 Concentrations in the UK using Hierarchical Directed Graphs
Authors:
Parya Broomandi,
Xueyu Geng,
Weisi Guo,
Jong Kim,
Alessio Pagani,
David Topping
Abstract:
Worldwide exposure to fine atmospheric particles can exacerbate the risk of a wide range of heart and respiratory diseases, due to their ability to penetrate deep into the lungs and bloodstream. Epidemiological studies in Europe and elsewhere have established the evidence base pointing to the important role of PM2.5 in causing over 4 million deaths per year. Traditional approaches to modeling atmospheric transportation of particles suffer from high dimensionality from both transport and chemical reaction processes, making multi-scale causal inference challenging. We apply alternative model reduction methods: a data-driven directed graph representation to infer spatial embeddedness and causal directionality. Using PM2.5 concentrations in 14 UK cities over a 12-month period, we construct an undirected correlation network and a directed Granger causality network. We show that, for both reduced-order cases, the UK is divided into two connected city communities, a northern and a southern one, with greater spatial embedding in spring and summer. We go on to infer stability to disturbances via the network trophic coherence parameter, whereby we found that winter had the greatest vulnerability. As a result of our novel graph-based reduced modeling, we are able to represent high-dimensional knowledge within a causal inference and stability framework.
Submitted 26 November, 2019;
originally announced November 2019.
-
Quadruply Stochastic Gradients for Large Scale Nonlinear Semi-Supervised AUC Optimization
Authors:
Wanli Shi,
Bin Gu,
Xiang Li,
Xiang Geng,
Heng Huang
Abstract:
Semi-supervised learning is pervasive in real-world applications, where only a few labeled data are available and large amounts of instances remain unlabeled. Since AUC is an important model evaluation metric in classification, directly optimizing AUC in the semi-supervised learning scenario has drawn much attention in the machine learning community. Recently, it has been shown that one could find an unbiased solution for the semi-supervised AUC maximization problem without knowing the class prior distribution. However, this method is hardly scalable for nonlinear classification problems with kernels. To address this problem, in this paper, we propose a novel scalable quadruply stochastic gradient algorithm (QSG-S2AUC) for nonlinear semi-supervised AUC optimization. In each iteration of the stochastic optimization process, our method randomly samples a positive instance, a negative instance, an unlabeled instance and their random features to compute the gradient, and then updates the model by using this quadruply stochastic gradient to approach the optimal solution. More importantly, we prove that QSG-S2AUC can converge to the optimal solution at a rate of O(1/t), where t is the iteration number. Extensive experimental results on a variety of benchmark datasets show that QSG-S2AUC is far more efficient than the existing state-of-the-art algorithms for semi-supervised AUC maximization while retaining similar generalization performance.
Submitted 29 July, 2019;
originally announced July 2019.
-
Scalable Semi-Supervised SVM via Triply Stochastic Gradients
Authors:
Xiang Geng,
Bin Gu,
Xiang Li,
Wanli Shi,
Guansheng Zheng,
Heng Huang
Abstract:
Semi-supervised learning (SSL) plays an increasingly important role in the big data era because a large number of unlabeled samples can be used effectively to improve the performance of the classifier. Semi-supervised support vector machine (S$^3$VM) is one of the most appealing methods for SSL, but scaling up S$^3$VM for kernel learning is still an open problem. Recently, a doubly stochastic gradient (DSG) algorithm has been proposed to achieve efficient and scalable training for kernel methods. However, the algorithm and theoretical analysis of DSG are developed based on the convexity assumption, which makes them unsuitable for non-convex problems such as S$^3$VM. To address this problem, in this paper, we propose a triply stochastic gradient algorithm for S$^3$VM, called TSGS$^3$VM. Specifically, to handle the two types of data instances involved in S$^3$VM, TSGS$^3$VM samples a labeled instance and an unlabeled instance, along with random features, in each iteration to compute a triply stochastic gradient. We use the approximated gradient to update the solution. More importantly, we establish a new theoretical analysis for TSGS$^3$VM which guarantees that TSGS$^3$VM can converge to a stationary point. Extensive experimental results on a variety of datasets demonstrate that TSGS$^3$VM is much more efficient and scalable than existing S$^3$VM algorithms.
Submitted 26 July, 2019;
originally announced July 2019.
-
Dynamical Distance Learning for Semi-Supervised and Unsupervised Skill Discovery
Authors:
Kristian Hartikainen,
Xinyang Geng,
Tuomas Haarnoja,
Sergey Levine
Abstract:
Reinforcement learning requires manual specification of a reward function to learn a task. While in principle this reward function only needs to specify the task goal, in practice reinforcement learning can be very time-consuming or even infeasible unless the reward function is shaped so as to provide a smooth gradient towards a successful outcome. This shaping is difficult to specify by hand, particularly when the task is learned from raw observations, such as images. In this paper, we study how we can automatically learn dynamical distances: a measure of the expected number of time steps to reach a given goal state from any other state. These dynamical distances can be used to provide well-shaped reward functions for reaching new goals, making it possible to learn complex tasks efficiently. We show that dynamical distances can be used in a semi-supervised regime, where unsupervised interaction with the environment is used to learn the dynamical distances, while a small amount of preference supervision is used to determine the task goal, without any manually engineered reward function or goal examples. We evaluate our method both on a real-world robot and in simulation. We show that our method can learn to turn a valve with a real-world 9-DoF hand, using raw image observations and just ten preference labels, without any other supervision. Videos of the learned skills can be found on the project website: https://sites.google.com/view/dynamical-distance-learning.
Submitted 14 February, 2020; v1 submitted 18 July, 2019;
originally announced July 2019.
-
Multi-Modal Graph Interaction for Multi-Graph Convolution Network in Urban Spatiotemporal Forecasting
Authors:
Xu Geng,
Xiyu Wu,
Lingyu Zhang,
Qiang Yang,
Yan Liu,
Jieping Ye
Abstract:
Graph convolution network based approaches have been recently used to model region-wise relationships in region-level prediction problems in urban computing. Each relationship represents a kind of spatial dependency, like region-wise distance or functional similarity. To incorporate multiple relationships into spatial feature extraction, we define the problem as a multi-modal machine learning problem on multi-graph convolution networks. Leveraging the advantage of multi-modal machine learning, we propose to develop modality interaction mechanisms for this problem, in order to reduce generalization error by reinforcing the learning of multimodal coordinated representations. In this work, we propose two interaction techniques for handling features in lower layers and higher layers respectively. In lower layers, we propose grouped GCN to combine the graph connectivity from different modalities for more complete spatial feature extraction. In higher layers, we adapt multi-linear relationship networks to GCN by exploring the dimension transformation and freezing part of the covariance structure. The adapted approach, called multi-linear relationship GCN, learns more generalized features to overcome the train-test divergence induced by time shifting. We evaluated our model on a ride-hailing demand forecasting problem using two real-world datasets. The proposed technique outperforms state-of-the-art baselines in terms of prediction accuracy, training efficiency, interpretability and model robustness.
Submitted 27 May, 2019;
originally announced May 2019.
-
Dataflow-based Joint Quantization of Weights and Activations for Deep Neural Networks
Authors:
Xue Geng,
Jie Fu,
Bin Zhao,
Jie Lin,
Mohamed M. Sabry Aly,
Christopher Pal,
Vijay Chandrasekhar
Abstract:
This paper addresses a challenging problem: how to reduce energy consumption without incurring a performance drop when deploying deep neural networks (DNNs) at the inference stage. In order to alleviate the computation and storage burdens, we propose a novel dataflow-based joint quantization approach with the hypothesis that fewer quantization operations would incur less information loss and thus improve the final performance. It first introduces a quantization scheme with efficient bit-shifting and rounding operations to represent network parameters and activations in low precision. Then it restructures the network architectures to form unified modules for optimization on the quantized model. Extensive experiments on ImageNet and KITTI validate the effectiveness of our model, demonstrating that state-of-the-art results for various tasks can be achieved by this quantized model. In addition, we designed and synthesized an RTL model to measure the hardware costs among various quantization methods. For each quantization operation, it reduces area cost by about 15 times and energy consumption by about 9 times, compared to a strong baseline.
Submitted 4 January, 2019;
originally announced January 2019.
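The abstract above mentions a quantization scheme built from bit-shifting and rounding. The listing does not give the paper's actual scheme, so the following is only a generic sketch of fixed-point quantization with power-of-two scales. The function names (`quantize`, `shift_round`, `dequantize`) and the 8-bit default are assumptions made for illustration.

```python
def quantize(x, frac_bits, total_bits=8):
    """Map a real value to a signed fixed-point integer.

    The scale is 2**frac_bits (an assumed power-of-two scale), and the
    result is clamped to the signed range of `total_bits` bits.
    """
    q = round(x * (1 << frac_bits))
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, q))


def shift_round(acc, shift):
    """Divide an integer by 2**shift, rounding half up, using add and shift only."""
    if shift == 0:
        return acc
    return (acc + (1 << (shift - 1))) >> shift


def dequantize(q, frac_bits):
    """Recover the real value represented by a fixed-point integer."""
    return q / (1 << frac_bits)


# Hypothetical weight and activation, quantized with different fractional widths.
w_q = quantize(0.3, frac_bits=6)   # weight with 6 fractional bits
a_q = quantize(1.5, frac_bits=4)   # activation with 4 fractional bits

# The integer product carries 6 + 4 = 10 fractional bits.
acc = w_q * a_q

# Requantize the product back to 4 fractional bits with a single shift.
out_q = shift_round(acc, shift=6)
out = dequantize(out_q, frac_bits=4)
```

Because every scale is a power of two, the requantization step needs no multiplier or divider, which is the usual motivation for shift-based schemes in hardware.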
-
The basic equation for target detection in remote sensing
Authors:
Xiurui Geng,
Luyan Ji,
Yongchao Zhao
Abstract:
Our research has revealed a hidden relationship among several basic components, which leads to the best target detection result. Further, we have proved that the matched filter (MF) is always superior to the constrained energy minimization (CEM) operator, both of which were originally of parallel importance in the field of target detection for remotely sensed imagery.
Submitted 13 October, 2017;
originally announced October 2017.
-
Ridesourcing Car Detection by Transfer Learning
Authors:
Leye Wang,
Xu Geng,
Jintao Ke,
Chen Peng,
Xiaojuan Ma,
Daqing Zhang,
Qiang Yang
Abstract:
Ridesourcing platforms like Uber and Didi are becoming increasingly popular around the world. However, unauthorized ridesourcing activities that take advantage of the sharing economy can greatly impair the healthy development of this emerging industry. As the first step to regulate on-demand ride services and eliminate the black market, we design a method to detect ridesourcing cars from a pool of cars based on their trajectories. Since licensed ridesourcing car traces are not openly available and may be completely missing in some cities due to legal issues, we turn to transferring knowledge from public transport open data, i.e., taxis and buses, to ridesourcing detection among ordinary vehicles. We propose a two-stage transfer learning framework. In Stage 1, we take taxi and bus data as input to learn a random forest (RF) classifier using trajectory features shared by taxis/buses and ridesourcing/other cars. Then, we use the RF to label all the candidate cars. In Stage 2, leveraging the subset of high-confidence labels from the previous stage as input, we further learn a convolutional neural network (CNN) classifier for ridesourcing detection, and iteratively refine RF and CNN, as well as the feature set, via a co-training process. Finally, we use the resulting ensemble of RF and CNN to identify the ridesourcing cars in the candidate pool. Experiments on real car, taxi and bus traces show that our transfer learning framework, without needing a pre-labeled ridesourcing dataset, can achieve accuracy similar to that of supervised learning methods.
Submitted 23 May, 2017;
originally announced May 2017.
-
Probabilistic graphical model based approach for water mapping using GaoFen-2 (GF-2) high resolution imagery and Landsat 8 time series
Authors:
Luyan Ji,
Jie Wang,
Xiurui Geng,
Peng Gong
Abstract:
The objective of this paper is to evaluate the potential of Gaofen-2 (GF-2) high-resolution multispectral sensor (MS) and panchromatic (PAN) imagery for water mapping. Difficulties in water mapping with high-resolution data include: 1) misclassification between water and shadows or other low-reflectance ground objects, which is mostly caused by the spectral similarity within the given band range; 2) small water bodies with size smaller than the spatial resolution of the MS image. To resolve the confusion between water and low-reflectance objects, the Landsat 8 time series with two shortwave infrared (SWIR) bands is added because water has extremely strong absorption in SWIR. In order to integrate the three multi-sensor, multi-resolution data sets, the probabilistic graphical model (PGM) is utilized here with the conditional probability distribution defined mainly based on the size of each object. For comparison, results from the SVM classifier on the PCA-fused and MS data, the thresholding method on the PAN image, and the water index method on the Landsat data are computed. The confusion matrices are calculated for all the methods. The results demonstrate that the PGM method can achieve the best performance with the highest overall accuracy. Moreover, small rivers can also be extracted by adding weight to the PAN result in PGM. Finally, the post-classification procedure is applied to the PGM result to further exclude misclassification in shadow and water-land boundary regions. Accordingly, the producer's, user's and overall accuracy are all increased, indicating the effectiveness of our method.
Submitted 21 December, 2016;
originally announced December 2016.
-
MF is always superior to CEM
Authors:
Xiurui Geng,
Luyan Ji,
Weitun Yang,
Fuxiang Wang,
Yongchao Zhao
Abstract:
The constrained energy minimization (CEM) and matched filter (MF) are the two most frequently used target detection algorithms in the remote sensing community. In this paper, we first introduce an augmented CEM (ACEM) by adding an all-one band. According to a recently published conclusion that CEM can always achieve a better performance by adding any linearly independent bands, ACEM is better than CEM. Further, we prove that ACEM is mathematically equivalent to MF. As a result, we can conclude that the classical matched filter (MF) is always superior to the CEM operator.
Submitted 1 December, 2016;
originally announced December 2016.
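The relationship between CEM, the all-one-band augmentation (ACEM), and MF stated in the abstract above can be checked numerically. The sketch below is not the authors' code. It assumes the textbook forms of both detectors: CEM with the sample correlation matrix, and MF with the sample mean and covariance. The data are synthetic and the function names are illustrative. Under these assumptions the ACEM and MF scores differ only by a positive affine rescaling, so their correlation should be 1 up to rounding.

```python
import numpy as np


def cem(X, d):
    """CEM scores for data X (bands x pixels) and target signature d (bands,)."""
    R = X @ X.T / X.shape[1]            # sample correlation matrix
    Rinv_d = np.linalg.solve(R, d)
    w = Rinv_d / (d @ Rinv_d)           # enforces w @ d == 1
    return w @ X


def matched_filter(X, d):
    """MF scores using the sample mean and covariance of X."""
    m = X.mean(axis=1)
    Xc = X - m[:, None]
    K = Xc @ Xc.T / X.shape[1]          # sample covariance matrix
    u = d - m
    Kinv_u = np.linalg.solve(K, u)
    w = Kinv_u / (u @ Kinv_u)
    return w @ Xc


def augmented_cem(X, d):
    """CEM applied after appending an all-one band to the data and the target."""
    X_aug = np.vstack([X, np.ones((1, X.shape[1]))])
    d_aug = np.append(d, 1.0)
    return cem(X_aug, d_aug)


# Synthetic data with a nonzero mean (purely illustrative, not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2000)) + rng.normal(size=(5, 1))
d = rng.normal(size=5)

mf_scores = matched_filter(X, d)
acem_scores = augmented_cem(X, d)
corr = np.corrcoef(mf_scores, acem_scores)[0, 1]
```

A correlation of 1 means the two detectors rank pixels identically, so any threshold-based comparison such as an ROC curve would coincide in this toy setting.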
-
Clustering by connection center evolution
Authors:
Xiurui Geng,
Hairong Tang
Abstract:
The determination of cluster centers generally depends on the scale that we use to analyze the data to be clustered. An inappropriate scale usually leads to unreasonable cluster centers and thus unreasonable results. In this study, we first consider the similarity of elements in the data as the connectivity of nodes in an undirected graph, then present the concept of a connection center and regard it as the cluster center of the data. Based on this definition, the determination of cluster centers and the assignment of class are very simple, natural and effective. One more crucial finding is that the cluster centers of different scales can be obtained easily by the different powers of a similarity matrix and the change of power from small to large leads to the dynamic evolution of cluster centers from local (microscopic) to global (macroscopic). Further, in this process of evolution, the number of categories changes discontinuously, which means that the presented method can automatically skip the unreasonable number of clusters, suggest appropriate observation scales and provide corresponding cluster results.
Submitted 19 October, 2016;
originally announced October 2016.
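The idea in the abstract above, that powers of a similarity matrix expose cluster centers at different scales, can be illustrated with a toy example. The listing does not give the paper's exact definition of a connection center, so this sketch rests on three assumptions: a Gaussian similarity, a rule that node i is a center at power k when (S^k)[i][i] is at least as large as every other entry in row i, and assignment of each node to the center it is most strongly connected to. All function names and the sample points are hypothetical.

```python
import math


def similarity_matrix(points, sigma):
    """Gaussian similarity between 1-D points (an assumed choice of kernel)."""
    return [[math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in points]
            for a in points]


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(S, k):
    """Compute S**k for k >= 1 by repeated multiplication."""
    P = S
    for _ in range(k - 1):
        P = mat_mul(P, S)
    return P


def connection_centers(P):
    """Assumed rule: node i is a center if P[i][i] dominates row i."""
    n = len(P)
    return [i for i in range(n) if all(P[i][i] >= P[i][j] for j in range(n))]


def assign(P, centers):
    """Assign every node to the center with the largest connectivity P[i][c]."""
    return [max(centers, key=lambda c: P[i][c]) for i in range(len(P))]


points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]   # two well-separated toy groups
S = similarity_matrix(points, sigma=1.0)

# Centers and labels at several scales (powers of S).
results = {}
for k in (1, 2, 4, 8, 16):
    P = mat_power(S, k)
    centers = connection_centers(P)
    results[k] = (centers, assign(P, centers))
```

At k = 1 every node is its own center under this rule, because each point has similarity 1 with itself and less than 1 with any other point; how the set of centers shrinks at larger powers depends on the data and on the assumed rule.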