
Showing 1–42 of 42 results for author: Zhu, K

Searching in archive stat.
  1. arXiv:2506.03120  [pdf]

    stat.AP cs.LG

    Validating remotely sensed biomass estimates with forest inventory data in the western US

    Authors: Xiuyu Cao, Joseph O. Sexton, Panshi Wang, Dimitrios Gounaridis, Neil H. Carter, Kai Zhu

    Abstract: Monitoring aboveground biomass (AGB) and its density (AGBD) at high resolution is essential for carbon accounting and ecosystem management. While NASA's spaceborne Global Ecosystem Dynamics Investigation (GEDI) LiDAR mission provides globally distributed reference measurements for AGBD estimation, the majority of commercial remote sensing products based on GEDI remain without rigorous or independe…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: 32 pages, 5 figures

  2. arXiv:2505.08092  [pdf, ps, other]

    stat.ME stat.ML

    Doubly Robust Fusion of Many Treatments for Policy Learning

    Authors: Ke Zhu, Jianing Chu, Ilya Lipkovich, Wenyu Ye, Shu Yang

    Abstract: Individualized treatment rules/recommendations (ITRs) aim to improve patient outcomes by tailoring treatments to the characteristics of each individual. However, when there are many treatment groups, existing methods face significant challenges due to data sparsity within treatment groups and highly unbalanced covariate distributions across groups. To address these challenges, we propose a novel c…

    Submitted 23 May, 2025; v1 submitted 12 May, 2025; originally announced May 2025.

    Comments: Accepted by ICML 2025

  3. arXiv:2505.00217  [pdf, other]

    stat.ME

    Robust Estimation and Inference in Hybrid Controlled Trials for Binary Outcomes: A Case Study on Non-Small Cell Lung Cancer

    Authors: Jiajun Liu, Ke Zhu, Shu Yang, Xiaofei Wang

    Abstract: Hybrid controlled trials (HCTs), which augment randomized controlled trials (RCTs) with external controls (ECs), are increasingly receiving attention as a way to address limited power, slow accrual, and ethical concerns in clinical research. However, borrowing from ECs raises critical statistical challenges in estimation and inference, especially for binary outcomes where hidden bias is harder to…

    Submitted 30 April, 2025; originally announced May 2025.

  4. Rejoinder to Reader Reaction "On exact randomization-based covariate-adjusted confidence intervals" by Jacob Fiksel

    Authors: Ke Zhu, Hanzhong Liu

    Abstract: We applaud Fiksel (2024) for their valuable contributions to randomization-based inference, particularly their work on inverting the Fisher randomization test (FRT) to construct confidence intervals using the covariate-adjusted test statistic. FRT is advocated by many scholars because it produces finite-sample exact p-values for any test statistic and can be easily adopted for any experimental des…

    Submitted 25 February, 2025; originally announced February 2025.

    Comments: Published in Biometrics

  5. arXiv:2502.13461  [pdf, other]

    q-fin.PM econ.EM stat.ME

    Tensor dynamic conditional correlation model: A new way to pursuit "Holy Grail of investing"

    Authors: Cheng Yu, Zhoufan Zhu, Ke Zhu

    Abstract: Style investing creates asset classes (or the so-called "styles") with low correlations, aligning well with the principle of "Holy Grail of investing" in terms of portfolio selection. The returns of styles naturally form a tensor-valued time series, which requires new tools for studying the dynamics of the conditional correlation matrix to facilitate the aforementioned principle. Towards this goal…

    Submitted 19 February, 2025; originally announced February 2025.

  6. arXiv:2501.18798  [pdf, other]

    stat.ME math.ST stat.ML

    Targeted Data Fusion for Causal Survival Analysis Under Distribution Shift

    Authors: Yi Liu, Alexander W. Levis, Ke Zhu, Shu Yang, Peter B. Gilbert, Larry Han

    Abstract: Causal inference across multiple data sources offers a promising avenue to enhance the generalizability and replicability of scientific findings. However, data integration methods for time-to-event outcomes, common in biomedical research, are underdeveloped. Existing approaches focus on binary or continuous outcomes but fail to address the unique challenges of survival analysis, such as censoring…

    Submitted 14 May, 2025; v1 submitted 30 January, 2025; originally announced January 2025.

  7. arXiv:2501.08945  [pdf, other]

    stat.ME

    COADVISE: Covariate Adjustment with Variable Selection in Randomized Controlled Trials

    Authors: Yi Liu, Ke Zhu, Larry Han, Shu Yang

    Abstract: Adjusting for covariates in randomized controlled trials can enhance the credibility and efficiency of treatment effect estimation. However, handling numerous covariates and their complex (non-linear) transformations poses a challenge. Motivated by the case study of the Best Apnea Interventions for Research (BestAIR) trial data from the National Sleep Research Resource (NSRR), where the number of…

    Submitted 26 February, 2025; v1 submitted 15 January, 2025; originally announced January 2025.

  8. arXiv:2411.08352  [pdf, other]

    stat.ME

    Imputation-based randomization tests for randomized experiments with interference

    Authors: Tingxuan Han, Ke Zhu, Hanzhong Liu, Ke Deng

    Abstract: The presence of interference renders classic Fisher randomization tests infeasible due to nuisance unknowns. To address this issue, we propose imputing the nuisance unknowns and computing Fisher randomization p-values multiple times, then averaging them. We term this approach the imputation-based randomization test and provide theoretical results on its asymptotic validity. Our method leverages th…

    Submitted 13 November, 2024; originally announced November 2024.

    Comments: 39 pages, 12 figures

  9. arXiv:2410.11713  [pdf, other]

    stat.ME

    Enhancing Statistical Validity and Power in Hybrid Controlled Trials: A Randomization Inference Approach with Conformal Selective Borrowing

    Authors: Ke Zhu, Shu Yang, Xiaofei Wang

    Abstract: External controls from historical trials or observational data can augment randomized controlled trials when large-scale randomization is impractical or unethical, such as in drug evaluation for rare diseases. However, non-randomized external controls can introduce biases, and existing Bayesian and frequentist methods may inflate the type I error rate, particularly in small-sample trials where ext…

    Submitted 7 May, 2025; v1 submitted 15 October, 2024; originally announced October 2024.

    Comments: Accepted by ICML 2025

  10. arXiv:2408.08483  [pdf, other]

    q-fin.PM stat.ML

    Enhancement of price trend trading strategies via image-induced importance weights

    Authors: Zhoufan Zhu, Ke Zhu

    Abstract: We open up the "black-box" to identify the predictive general price patterns in price chart images via the deep learning image analysis techniques. Our identified price patterns lead to the construction of image-induced importance (triple-I) weights, which are applied to weighted moving average the existing price trend trading signals according to their level of importance in predicting price move…

    Submitted 15 August, 2024; originally announced August 2024.

  11. arXiv:2408.01626  [pdf, other]

    stat.ME stat.AP

    Weighted Brier Score -- an Overall Summary Measure for Risk Prediction Models with Clinical Utility Consideration

    Authors: Kehao Zhu, Yingye Zheng, Kwun Chuen Gary Chan

    Abstract: As advancements in novel biomarker-based algorithms and models accelerate disease risk prediction and stratification in medicine, it is crucial to evaluate these models within the context of their intended clinical application. Prediction models output the absolute risk of disease; subsequently, patient counseling and shared decision-making are based on the estimated individual risk and cost-benef…

    Submitted 2 August, 2024; originally announced August 2024.

  12. arXiv:2404.18377  [pdf, other]

    stat.ME

    Inference for the panel ARMA-GARCH model when both $N$ and $T$ are large

    Authors: Bing Su, Ke Zhu

    Abstract: We propose a panel ARMA-GARCH model to capture the dynamics of large panel data with $N$ individuals over $T$ time periods. For this model, we provide a two-step estimation procedure to estimate the ARMA parameters and GARCH parameters stepwisely. Under some regular conditions, we show that all of the proposed estimators are asymptotically normal with the convergence rate $(NT)^{-1/2}$, and they h…

    Submitted 28 April, 2024; originally announced April 2024.

  13. arXiv:2404.11092  [pdf, ps, other]

    econ.EM stat.ME

    Estimation for conditional moment models based on martingale difference divergence

    Authors: Kunyang Song, Feiyu Jiang, Ke Zhu

    Abstract: We provide a new estimation method for conditional moment models via the martingale difference divergence (MDD). Our MDD-based estimation method is formed in the framework of a continuum of unconditional moment restrictions. Unlike the existing estimation methods in this framework, the MDD-based estimation method adopts a non-integrable weighting function, which could grab more information from unc…

    Submitted 17 April, 2024; originally announced April 2024.

  14. arXiv:2401.16667  [pdf, other]

    math.ST stat.AP stat.ME

    Sharp variance estimator and causal bootstrap in stratified randomized experiments

    Authors: Haoyang Yu, Ke Zhu, Hanzhong Liu

    Abstract: Randomized experiments are the gold standard for estimating treatment effects, and randomization serves as a reasoned basis for inference. In widely used stratified randomized experiments, randomization-based finite-population asymptotic theory enables valid inference for the average treatment effect, relying on normal approximation and a Neyman-type conservative variance estimator. However, when…

    Submitted 16 May, 2025; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: Accepted by Statistics in Medicine

  15. arXiv:2307.09725  [pdf]

    stat.AP

    Global Inequality in Cooling from Urban Green Spaces and its Climate Change Adaptation Potential

    Authors: Yuxiang Li, Jens-Christian Svenning, Weiqi Zhou, Kai Zhu, Jesse F. Abrams, Timothy M. Lenton, Shuqing N. Teng, Robert R. Dunn, Chi Xu

    Abstract: Heat extremes are projected to severely impact humanity and with increasing geographic disparities. Global South countries are more exposed to heat extremes and have reduced adaptation capacity. One documented source of such adaptation inequality is a lack of resources to cool down indoor temperatures. Less is known about the capacity to ameliorate outdoor heat stress. Here, we assess global inequ…

    Submitted 18 July, 2023; originally announced July 2023.

    Comments: 56 pages, 28 figures

  16. arXiv:2306.05169  [pdf, other]

    stat.ME econ.EM

    Matrix GARCH Model: Inference and Application

    Authors: Cheng Yu, Dong Li, Feiyu Jiang, Ke Zhu

    Abstract: Matrix-variate time series data are largely available in applications. However, no attempt has been made to study their conditional heteroskedasticity that is often observed in economic and financial data. To address this gap, we propose a novel matrix generalized autoregressive conditional heteroskedasticity (GARCH) model to capture the dynamics of conditional row and column covariance matrices o…

    Submitted 8 June, 2023; originally announced June 2023.

  17. arXiv:2302.06799  [pdf, ps, other]

    stat.ME econ.EM

    Quantiled conditional variance, skewness, and kurtosis by Cornish-Fisher expansion

    Authors: Ningning Zhang, Ke Zhu

    Abstract: The conditional variance, skewness, and kurtosis play a central role in time series analysis. These three conditional moments (CMs) are often studied by some parametric models but with two big issues: the risk of model mis-specification and the instability of model estimation. To avoid the above two issues, this paper proposes a novel method to estimate these three CMs by the so-called quantiled C…

    Submitted 6 June, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

  18. arXiv:2301.11697  [pdf, other]

    stat.ML cs.LG

    Big portfolio selection by graph-based conditional moments method

    Authors: Zhoufan Zhu, Ningning Zhang, Ke Zhu

    Abstract: How to do big portfolio selection is very important but challenging for both researchers and practitioners. In this paper, we propose a new graph-based conditional moments (GRACE) method to do portfolio selection based on thousands of stocks or more. The GRACE method first learns the conditional quantiles and mean of stock returns via a factor-augmented temporal graph convolutional network, which…

    Submitted 27 January, 2023; originally announced January 2023.

    Comments: 35 pages

    MSC Class: 62M10; 68T07; 91G10

  19. arXiv:2301.06658  [pdf, other]

    econ.EM stat.ME

    Statistical inference for the logarithmic spatial heteroskedasticity model with exogenous variables

    Authors: Bing Su, Fukang Zhu, Ke Zhu

    Abstract: The spatial dependence in mean has been well studied by plenty of models in a large strand of literature, however, the investigation of spatial dependence in variance is lagging significantly behind. The existing models for the spatial dependence in variance are scarce, with neither probabilistic structure nor statistical inference procedure being explored. To circumvent this deficiency, this pape…

    Submitted 16 January, 2023; originally announced January 2023.

  20. Design-based theory for Lasso adjustment in randomized block experiments and rerandomized experiments

    Authors: Ke Zhu, Hanzhong Liu, Yuehan Yang

    Abstract: Blocking, a special case of rerandomization, is routinely implemented in the design stage of randomized experiments to balance the baseline covariates. This study proposes a regression adjustment method based on the least absolute shrinkage and selection operator (Lasso) to efficiently estimate the average treatment effect in randomized block experiments with high-dimensional covariates. We derive…

    Submitted 28 June, 2024; v1 submitted 23 September, 2021; originally announced September 2021.

  21. arXiv:2106.03178  [pdf, other]

    cs.AI stat.OT

    Path-specific Effects Based on Information Accounts of Causality

    Authors: Heyang Gong, Ke Zhu

    Abstract: Path-specific effects in mediation analysis provide a useful tool for fairness analysis, which is mostly based on nested counterfactuals. However, the dictum "no causation without manipulation" implies that path-specific effects might be induced by certain interventions. This paper proposes a new path intervention inspired by information accounts of causality, and develops the corresponding inte…

    Submitted 6 June, 2021; originally announced June 2021.

  22. Quantifying out-of-station waiting time in oversaturated urban metro systems

    Authors: Kangli Zhu, Zhanhong Cheng, Jianjun Wu, Fuya Yuan, Lijun Sun

    Abstract: Metro systems in megacities such as Beijing, Shenzhen and Guangzhou are under great passenger demand pressure. During peak hours, it is common to see oversaturated conditions (i.e., passenger demand exceeds network capacity), which bring significant operational risks and safety issues. A popular control intervention is to restrict the entering rate during peak hours by setting up out-of-station qu…

    Submitted 1 June, 2021; originally announced June 2021.

    Journal ref: Communications in Transportation Research (2022)

  23. arXiv:2103.13051  [pdf, other]

    stat.ME

    Pair-switching rerandomization

    Authors: Ke Zhu, Hanzhong Liu

    Abstract: Rerandomization discards assignments with covariates unbalanced in the treatment and control groups to improve estimation and inference efficiency. However, the acceptance-rejection sampling method used in rerandomization is computationally inefficient. As a result, it is time-consuming for rerandomization to draw numerous independent assignments, which are necessary for performing Fisher randomiz…

    Submitted 25 June, 2022; v1 submitted 24 March, 2021; originally announced March 2021.

  24. arXiv:2009.09462  [pdf, other]

    stat.ME

    Confidence intervals for parameters in high-dimensional sparse vector autoregression

    Authors: Ke Zhu, Hanzhong Liu

    Abstract: Vector autoregression (VAR) models are widely used to analyze the interrelationship between multiple variables over time. Estimation and inference for the transition matrices of VAR models are crucial for practitioners to make decisions in fields such as economics and finance. However, when the number of variables is larger than the sample size, it remains a challenge to perform statistical infere…

    Submitted 20 September, 2020; originally announced September 2020.

  25. arXiv:2008.00747  [pdf, ps, other]

    econ.EM stat.ME

    Testing error distribution by kernelized Stein discrepancy in multivariate time series models

    Authors: Donghang Luo, Ke Zhu, Huan Gong, Dong Li

    Abstract: Knowing the error distribution is important in many multivariate time series applications. To alleviate the risk of error distribution mis-specification, testing methodologies are needed to detect whether the chosen error distribution is correct. However, the majority of the existing tests only deal with the multivariate normal distribution for some special multivariate time series models, and the…

    Submitted 3 August, 2020; originally announced August 2020.

  26. arXiv:2004.14164  [pdf, other]

    cs.CL cs.LG stat.ML

    MICK: A Meta-Learning Framework for Few-shot Relation Classification with Small Training Data

    Authors: Xiaoqing Geng, Xiwen Chen, Kenny Q. Zhu, Libin Shen, Yinggong Zhao

    Abstract: Few-shot relation classification seeks to classify incoming query instances after meeting only few support instances. This ability is gained by training with large amount of in-domain annotated data. In this paper, we tackle an even harder problem by further limiting the amount of data available at training time. We propose a few-shot learning framework for relation classification, which is partic…

    Submitted 14 December, 2020; v1 submitted 26 April, 2020; originally announced April 2020.

    Journal ref: CIKM 2020: The 29th ACM International Conference on Information and Knowledge Management

  27. arXiv:2004.09161  [pdf, ps, other]

    econ.EM stat.ME

    Multi-frequency-band tests for white noise under heteroskedasticity

    Authors: Mengya Liu, Fukan Zhu, Ke Zhu

    Abstract: This paper proposes a new family of multi-frequency-band (MFB) tests for the white noise hypothesis by using the maximum overlap discrete wavelet packet transform (MODWPT). The MODWPT allows the variance of a process to be decomposed into the variance of its components on different equal-length frequency sub-bands, and the MFB tests then measure the distance between the MODWPT-based variance ratio…

    Submitted 20 April, 2020; originally announced April 2020.

  28. arXiv:2004.01358  [pdf, other]

    cs.LG stat.ML

    Unpack Local Model Interpretation for GBDT

    Authors: Wenjing Fang, Jun Zhou, Xiaolong Li, Kenny Q. Zhu

    Abstract: A gradient boosting decision tree (GBDT), which aggregates a collection of single weak learners (i.e. decision trees), is widely used for data mining tasks. Because GBDT inherits the good performance from its ensemble essence, much attention has been drawn to the optimization of this model. With its popularization, an increasing need for model interpretation arises. Besides the commonly used featu…

    Submitted 2 April, 2020; originally announced April 2020.

    Comments: 12 pages, 5 figures

  29. arXiv:2002.11982  [pdf, other]

    cs.LG stat.ML

    Adapted tree boosting for Transfer Learning

    Authors: Wenjing Fang, Chaochao Chen, Bowen Song, Li Wang, Jun Zhou, Kenny Q. Zhu

    Abstract: Secure online transaction is an essential task for e-commerce platforms. Alipay, one of the world's leading cashless payment platform, provides the payment service to both merchants and individual customers. The fraud detection models are built to protect the customers, but stronger demands are raised by the new scenes, which are lacking in training data and labels. The proposed model makes a diff…

    Submitted 2 April, 2020; v1 submitted 27 February, 2020; originally announced February 2020.

    ACM Class: I.2.6

  30. arXiv:1911.09343  [pdf, other]

    econ.EM stat.ME

    Hybrid quantile estimation for asymmetric power GARCH models

    Authors: Guochang Wang, Ke Zhu, Guodong Li, Wai Keung Li

    Abstract: Asymmetric power GARCH models have been widely used to study the higher order moments of financial returns, while their quantile estimation has been rarely investigated. This paper introduces a simple monotonic transformation on its conditional quantile function to make the quantile regression tractable. The asymptotic normality of the resulting quantile estimators is established under either stat…

    Submitted 21 November, 2019; originally announced November 2019.

  31. arXiv:1907.04147  [pdf, ps, other]

    stat.ME econ.EM

    Adaptive inference for a semiparametric generalized autoregressive conditional heteroskedasticity model

    Authors: Feiyu Jiang, Dong Li, Ke Zhu

    Abstract: This paper considers a semiparametric generalized autoregressive conditional heteroskedasticity (S-GARCH) model. For this model, we first estimate the time-varying long run component for unconditional variance by the kernel estimator, and then estimate the non-time-varying parameters in GARCH-type short run component by the quasi maximum likelihood estimator (QMLE). We show that the QMLE is asympt…

    Submitted 2 October, 2020; v1 submitted 9 July, 2019; originally announced July 2019.

  32. arXiv:1906.06357  [pdf, other]

    cs.NI cs.LG stat.ML

    Data-Driven Machine Learning Techniques for Self-healing in Cellular Wireless Networks: Challenges and Solutions

    Authors: Tao Zhang, Kun Zhu, Ekram Hossain

    Abstract: For enabling automatic deployment and management of cellular networks, the concept of self-organizing network (SON) was introduced. SON capabilities can enhance network performance, improve service quality, and reduce operational and capital expenditure (OPEX/CAPEX). As an important component in SON, self-healing is defined as a network paradigm where the faults of target networks are mitigated or…

    Submitted 14 June, 2019; originally announced June 2019.

  33. arXiv:1905.01798  [pdf, ps, other]

    econ.EM stat.ME

    Non-standard inference for augmented double autoregressive models with null volatility coefficients

    Authors: Feiyu Jiang, Dong Li, Ke Zhu

    Abstract: This paper considers an augmented double autoregressive (DAR) model, which allows null volatility coefficients to circumvent the over-parameterization problem in the DAR model. Since the volatility coefficients might be on the boundary, the statistical inference methods based on the Gaussian quasi-maximum likelihood estimation (GQMLE) become non-standard, and their asymptotics require the data to…

    Submitted 5 May, 2019; originally announced May 2019.

  34. arXiv:1903.12077  [pdf, ps, other]

    math.ST econ.EM stat.ME

    Time series models for realized covariance matrices based on the matrix-F distribution

    Authors: Jiayuan Zhou, Feiyu Jiang, Ke Zhu, Wai Keung Li

    Abstract: We propose a new Conditional BEKK matrix-F (CBF) model for the time-varying realized covariance (RCOV) matrices. This CBF model is capable of capturing heavy-tailed RCOV, which is an important stylized fact but could not be handled adequately by the Wishart-based models. To further mimic the long memory feature of the RCOV, a special CBF model with the conditional heterogeneous autoregressive (HAR…

    Submitted 9 July, 2020; v1 submitted 26 March, 2019; originally announced March 2019.

  35. arXiv:1903.09639  [pdf, other]

    cs.CY cs.LG stat.ML

    Understanding Childhood Vulnerability in The City of Surrey

    Authors: Cody Griffith, Varoon Mathur, Catherine Lin, Kevin Zhu

    Abstract: Understanding the community conditions that best support universal access and improved childhood outcomes allows ultimately to improve decision-making in the areas of planning and investment across the early stages of childhood development. Here we describe two different data-driven approaches to visualizing the lived experiences of children throughout the City of Surrey, combining data derived fr…

    Submitted 25 March, 2019; originally announced March 2019.

  36. arXiv:1804.09866  [pdf, ps, other]

    stat.ME econ.EM

    New HSIC-based tests for independence between two stationary multivariate time series

    Authors: Guochang Wang, Wai Keung Li, Ke Zhu

    Abstract: This paper proposes some novel one-sided omnibus tests for independence between two multivariate stationary time series. These new tests apply the Hilbert-Schmidt independence criterion (HSIC) to test the independence between the innovations of both time series. Under regular conditions, the limiting null distributions of our HSIC-based tests are established. Next, our HSIC-based tests are shown t…

    Submitted 25 April, 2018; originally announced April 2018.

  37. arXiv:1804.02348  [pdf, ps, other]

    stat.ME econ.EM

    Statistical inference for autoregressive models under heteroscedasticity of unknown form

    Authors: Ke Zhu

    Abstract: This paper provides an entire inference procedure for the autoregressive model under (conditional) heteroscedasticity of unknown form with a finite variance. We first establish the asymptotic normality of the weighted least absolute deviations estimator (LADE) for the model. Second, we develop the random weighting (RW) method to estimate its asymptotic covariance matrix, leading to the implementat…

    Submitted 8 August, 2018; v1 submitted 6 April, 2018; originally announced April 2018.

  38. arXiv:1711.05225  [pdf, other]

    cs.CV cs.LG stat.ML

    CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning

    Authors: Pranav Rajpurkar, Jeremy Irvin, Kaylie Zhu, Brandon Yang, Hershel Mehta, Tony Duan, Daisy Ding, Aarti Bagul, Curtis Langlotz, Katie Shpanskaya, Matthew P. Lungren, Andrew Y. Ng

    Abstract: We develop an algorithm that can detect pneumonia from chest X-rays at a level exceeding practicing radiologists. Our algorithm, CheXNet, is a 121-layer convolutional neural network trained on ChestX-ray14, currently the largest publicly available chest X-ray dataset, containing over 100,000 frontal-view X-ray images with 14 diseases. Four practicing academic radiologists annotate a test set, on w…

    Submitted 25 December, 2017; v1 submitted 14 November, 2017; originally announced November 2017.

  39. arXiv:1601.00062  [pdf, other]

    stat.ML cs.LG math.OC

    Practical Algorithms for Learning Near-Isometric Linear Embeddings

    Authors: Jerry Luo, Kayla Shapiro, Hao-Jun Michael Shi, Qi Yang, Kan Zhu

    Abstract: We propose two practical non-convex approaches for learning near-isometric, linear embeddings of finite sets of data points. Given a set of training points $\mathcal{X}$, we consider the secant set $S(\mathcal{X})$ that consists of all pairwise difference vectors of $\mathcal{X}$, normalized to lie on the unit sphere. The problem can be formulated as finding a symmetric and positive semi-definite…

    Submitted 22 April, 2016; v1 submitted 1 January, 2016; originally announced January 2016.

    MSC Class: 90C90

  40. arXiv:1403.1600  [pdf, other]

    stat.ML cs.IT cs.LG

    Collaborative Filtering with Information-Rich and Information-Sparse Entities

    Authors: Kai Zhu, Rui Wu, Lei Ying, R. Srikant

    Abstract: In this paper, we consider a popular model for collaborative filtering in recommender systems where some users of a website rate some items, such as movies, and the goal is to recover the ratings of some or all of the unrated items of each user. In particular, we consider both the clustering model, where only users (or items) are clustered, and the co-clustering model, where both users and items a…

    Submitted 6 March, 2014; originally announced March 2014.

  41. arXiv:1310.0512  [pdf, other]

    stat.ML

    Jointly Clustering Rows and Columns of Binary Matrices: Algorithms and Trade-offs

    Authors: Jiaming Xu, Rui Wu, Kai Zhu, Bruce Hajek, R. Srikant, Lei Ying

    Abstract: In standard clustering problems, data points are represented by vectors, and by stacking them together, one forms a data matrix with row or column cluster structure. In this paper, we consider a class of binary matrices, arising in many applications, which exhibit both row and column cluster structure, and our goal is to exactly recover the underlying row and column clusters by observing only a sm…

    Submitted 4 February, 2014; v1 submitted 1 October, 2013; originally announced October 2013.

  42. arXiv:1211.2073  [pdf, ps, other]

    cs.LG cs.CE q-bio.QM stat.ML

    LAGE: A Java Framework to reconstruct Gene Regulatory Networks from Large-Scale Continues Expression Data

    Authors: Yang Lu, Mengying Wang, Kenny Q. Zhu, Bo Yuan

    Abstract: LAGE is a systematic framework developed in Java. The motivation of LAGE is to provide a scalable and parallel solution to reconstruct Gene Regulatory Networks (GRNs) from continuous gene expression data for very large amount of genes. The basic idea of our framework is motivated by the philosophy of divide-and-conquer. Specifically, LAGE recursively partitions genes into multiple overlapping commu…

    Submitted 9 November, 2012; originally announced November 2012.

    Comments: 2 pages