Showing 1–29 of 29 results for author: Hu, M

Searching in archive stat.
  1. arXiv:2505.09054  [pdf]

    cs.HC cs.CY stat.AP

    EcoSphere: A Decision-Support Tool for Automated Carbon Emission and Cost Optimization in Sustainable Urban Development

    Authors: Siavash Ghorbany, Ming Hu, Siyuan Yao, Matthew Sisk, Chaoli Wang

    Abstract: The construction industry is a major contributor to global greenhouse gas emissions, with embodied carbon being a key component. This study develops EcoSphere, an innovative software designed to evaluate and balance embodied and operational carbon emissions with construction and environmental costs in urban planning. Using high-resolution data from the National Structure Inventory, combined with c…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: Proc. of the 23rd CIB World Building Congress, 19th to 23rd May 2025, Purdue University, West Lafayette, USA

  2. arXiv:2411.05735  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Aioli: A Unified Optimization Framework for Language Model Data Mixing

    Authors: Mayee F. Chen, Michael Y. Hu, Nicholas Lourie, Kyunghyun Cho, Christopher Ré

    Abstract: Language model performance depends on identifying the optimal mixture of data groups to train on (e.g., law, code, math). Prior work has proposed a diverse set of methods to efficiently learn mixture proportions, ranging from fitting regression models over training runs to dynamically updating proportions throughout training. Surprisingly, we find that no existing method consistently outperforms a…

    Submitted 20 April, 2025; v1 submitted 8 November, 2024; originally announced November 2024.

    Comments: ICLR 2025 Camera Ready

  3. arXiv:2411.05237  [pdf]

    cs.LG q-bio.QM stat.AP stat.CO stat.ML

    Pruning the Path to Optimal Care: Identifying Systematically Suboptimal Medical Decision-Making with Inverse Reinforcement Learning

    Authors: Inko Bovenzi, Adi Carmel, Michael Hu, Rebecca M. Hurwitz, Fiona McBride, Leo Benac, José Roberto Tello Ayala, Finale Doshi-Velez

    Abstract: Aiming to uncover insights into medical decision-making embedded within observational data from clinical settings, we present a novel application of Inverse Reinforcement Learning (IRL) that identifies suboptimal clinician actions based on the actions of their peers. This approach centers on two stages of IRL with an intermediate step to prune trajectories displaying behavior that deviates significa…

    Submitted 7 November, 2024; originally announced November 2024.

    Comments: 13 pages, 4 figures

  4. arXiv:2409.04716  [pdf, other]

    stat.AP math.ST

    Privacy enhanced collaborative inference in the Cox proportional hazards model for distributed data

    Authors: Mengtong Hu, Xu Shi, Peter X. -K. Song

    Abstract: Data sharing barriers are paramount challenges arising from multicenter clinical studies where multiple data sources are stored in a distributed fashion at different local study sites. Particularly in the case of time-to-event analysis when global risk sets are needed for the Cox proportional hazards model, access to a centralized database is typically necessary. Merging such data sources into a c…

    Submitted 7 September, 2024; originally announced September 2024.

  5. Causal Inference with Latent Variables: Recent Advances and Future Prospectives

    Authors: Yaochen Zhu, Yinhan He, Jing Ma, Mengxuan Hu, Sheng Li, Jundong Li

    Abstract: Causality lays the foundation for the trajectory of our world. Causal inference (CI), which aims to infer intrinsic causal relations among variables of interest, has emerged as a crucial research topic. Nevertheless, the lack of observation of important variables (e.g., confounders, mediators, exogenous variables, etc.) severely compromises the reliability of CI methods. The issue may arise from t…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD'24 Survey Track

  6. arXiv:2307.01389  [pdf, other]

    cs.LG stat.ME

    Identification of Causal Relationship between Amyloid-beta Accumulation and Alzheimer's Disease Progression via Counterfactual Inference

    Authors: Haixing Dai, Mengxuan Hu, Qing Li, Lu Zhang, Lin Zhao, Dajiang Zhu, Ibai Diez, Jorge Sepulcre, Fan Zhang, Xingyu Gao, Manhua Liu, Quanzheng Li, Sheng Li, Tianming Liu, Xiang Li

    Abstract: Alzheimer's disease (AD) is a neurodegenerative disorder that begins with amyloidosis, followed by neuronal loss and deterioration in structure, function, and cognition. The accumulation of amyloid-beta in the brain, measured through 18F-florbetapir (AV45) positron emission tomography (PET) imaging, has been widely used for early diagnosis of AD. However, the relationship between amyloid-bet…

    Submitted 3 July, 2023; originally announced July 2023.

  7. arXiv:2303.16532  [pdf, other]

    cs.LG q-fin.ST stat.AP

    Futures Quantitative Investment with Heterogeneous Continual Graph Neural Network

    Authors: Min Hu, Zhizhong Tan, Bin Liu, Guosheng Yin

    Abstract: This study aims to address the challenges of futures price prediction in high-frequency trading (HFT) by proposing a continuous learning factor predictor based on graph neural networks. The model integrates multi-factor pricing theories with real-time market dynamics, effectively bypassing the limitations of existing methods that lack financial theory guidance and ignore various trend signals and…

    Submitted 19 December, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

  8. Statistics for Spatially Stratified Heterogeneous Data

    Authors: Jinfeng Wang, Robert Haining, Tonglin Zhang, Chengdong Xu, Maogui Hu

    Abstract: Spatial statistics is dominated by spatial autocorrelation (SAC)-based Kriging and BHM, and by spatial local heterogeneity-based hotspots and geographical regression methods, appraised as the first and second laws of Geography (Tobler 1970; Goodchild 2004), respectively. Spatial stratified heterogeneity (SSH), the phenomenon of a partition that within strata is more similar than between strata, exampl…

    Submitted 30 November, 2022; originally announced November 2022.

    Journal ref: Annals of the American Association of Geographers 2024

  9. arXiv:2211.11028  [pdf, other]

    stat.ML cs.HC cs.LG

    Algorithmic Decision-Making Safeguarded by Human Knowledge

    Authors: Ningyuan Chen, Ming Hu, Wenhao Li

    Abstract: Commercial AI solutions provide analysts and managers with data-driven business intelligence for a wide range of decisions, such as demand forecasting and pricing. However, human analysts may have their own insights and experiences about the decision-making that are at odds with the algorithmic recommendation. In view of such a conflict, we provide a general analytical framework to study the augmen…

    Submitted 20 November, 2022; originally announced November 2022.

  10. Accelerated Sparse Recovery via Gradient Descent with Nonlinear Conjugate Gradient Momentum

    Authors: Mengqi Hu, Yifei Lou, Bao Wang, Ming Yan, Xiu Yang, Qiang Ye

    Abstract: This paper applies an idea of adaptive momentum for the nonlinear conjugate gradient to accelerate optimization problems in sparse recovery. Specifically, we consider two types of minimization problems: a (single) differentiable function and the sum of a non-smooth function and a differentiable function. In the first case, we adopt a fixed step size to avoid the traditional line search and establi…

    Submitted 5 April, 2023; v1 submitted 25 August, 2022; originally announced August 2022.

  11. arXiv:2204.00857  [pdf, other]

    stat.ME

    Collaborative causal inference with a distributed data-sharing management

    Authors: Mengtong Hu, Xu Shi, Peter X. -K. Song

    Abstract: Data sharing barriers are paramount challenges arising from multicenter clinical trials where multiple data sources are stored in a distributed fashion at different local study sites. Merging such data sources into a common data storage for a centralized statistical analysis requires a data use agreement, which is often time-consuming. Data merging may become more burdensome when causal inference…

    Submitted 2 April, 2022; originally announced April 2022.

  12. arXiv:2108.13935  [pdf, other]

    stat.ME

    Theory for identification and Inference with Synthetic Controls: A Proximal Causal Inference Framework

    Authors: Xu Shi, Kendrick Li, Wang Miao, Mengtong Hu, Eric Tchetgen Tchetgen

    Abstract: Synthetic control (SC) methods are commonly used to estimate the treatment effect on a single treated unit in panel data settings. An SC is a weighted average of control units built to match the treated unit, with weights typically estimated by regressing (summaries of) pre-treatment outcomes and measured covariates of the treated unit to those of the control units. However, it has been establishe…

    Submitted 18 February, 2023; v1 submitted 31 August, 2021; originally announced August 2021.

    Comments: 37 pages, 3 figures. The Supplementary Materials are attached

  13. arXiv:2106.09564  [pdf, other]

    cs.CV cs.AI stat.ML

    Knowledge distillation from multi-modal to mono-modal segmentation networks

    Authors: Minhao Hu, Matthis Maillard, Ya Zhang, Tommaso Ciceri, Giammarco La Barbera, Isabelle Bloch, Pietro Gori

    Abstract: The joint use of multiple imaging modalities for medical image segmentation has been widely studied in recent years. The fusion of information from different modalities has been shown to improve segmentation accuracy, with respect to mono-modal segmentations, in several applications. However, acquiring multiple modalities is usually not possible in a clinical setting due to a limited number…

    Submitted 17 June, 2021; originally announced June 2021.

    Comments: MICCAI 2020

    Journal ref: MICCAI 2020

  14. arXiv:2101.04890  [pdf, ps, other]

    stat.CO math.OC

    A general framework of rotational sparse approximation in uncertainty quantification

    Authors: Mengqi Hu, Yifei Lou, Xiu Yang

    Abstract: This paper proposes a general framework to estimate coefficients of generalized polynomial chaos (gPC) used in uncertainty quantification via rotational sparse approximation. In particular, we aim to identify a rotation matrix such that the gPC expansion of a set of random variables after the rotation has a sparser representation. However, this rotational approach alters the underlying linear syst…

    Submitted 17 September, 2021; v1 submitted 13 January, 2021; originally announced January 2021.

  15. Seasonal association between viral causes of hospitalised acute lower respiratory infections and meteorological factors in China: a retrospective study

    Authors: Bing Xu, Jinfeng Wang, Zhongjie Li, Chengdong Xu, Yilan Liao, Maogui Hu, Jing Yang, Shengjie Lai, Liping Wang, Weizhong Yang

    Abstract: Acute lower respiratory infections caused by respiratory viruses are common and persistent infectious diseases worldwide and in China, which have pronounced seasonal patterns. Meteorological factors have important roles in the seasonality of some major viruses. Our aim was to identify the dominant meteorological factors and to model their effects on common respiratory viruses in different regions…

    Submitted 15 April, 2021; v1 submitted 30 November, 2020; originally announced December 2020.

    Comments: 6 figures and tables

    Journal ref: The Lancet Planetary Health, 2021

  16. arXiv:2008.12003  [pdf, other]

    cs.LG cs.AI stat.ML

    Predicting conversions in display advertising based on URL embeddings

    Authors: Yang Qiu, Nikolaos Tziortziotis, Martial Hue, Michalis Vazirgiannis

    Abstract: Online display advertising has grown rapidly in recent years thanks to the automation of the ad buying process. Real-time bidding (RTB) allows the automated trading of ad impressions between advertisers and publishers through real-time auctions. In order to increase the effectiveness of their campaigns, advertisers should deliver ads to the users who are highly likely to be converted (i.e., purch…

    Submitted 28 August, 2020; v1 submitted 27 August, 2020; originally announced August 2020.

    Comments: Accepted at AdKDD 2020 workshop at KDD'20 conference, San Diego, USA

  17. arXiv:2004.04835  [pdf, other]

    stat.AP

    COVID-19 in a social reinsurance framework: Forewarned is forearmed

    Authors: S. Sahin, M. C. Boado-Penas, C. Constantinescu, J. Eisenberg, K. Henshaw, M. Hu, J. Wang, W. Zhu

    Abstract: The crisis caused by COVID-19 revealed the global unpreparedness to handle the impact of a pandemic. In this paper, we present a statistical analysis of the data related to the COVID-19 outbreak in China, specifically the infection speed, death and fatality rates in Hubei province. By fitting distributions of these quantities we design a parametric reinsurance contract whose trigger and cap are ba…

    Submitted 9 April, 2020; originally announced April 2020.

  18. arXiv:1911.05020  [pdf]

    cs.LG cs.NE stat.ML

    Generative adversarial networks (GAN) based efficient sampling of chemical space for inverse design of inorganic materials

    Authors: Yabo Dan, Yong Zhao, Xiang Li, Shaobo Li, Ming Hu, Jianjun Hu

    Abstract: A major challenge in materials design is how to efficiently search the vast chemical design space to find the materials with desired properties. One effective strategy is to develop sampling algorithms that can exploit both explicit chemical knowledge and implicit composition rules embodied in the large materials database. Here, we propose a generative machine learning model (MatGAN) based on a ge…

    Submitted 12 November, 2019; originally announced November 2019.

    Comments: 15 pages

    Journal ref: npj Comput Mater 6, 84 (2020)

  19. arXiv:1905.08495  [pdf, other]

    cs.LG cs.AI stat.ML

    Exploring Bias in GAN-based Data Augmentation for Small Samples

    Authors: Mengxiao Hu, Jinlong Li

    Abstract: For machine learning tasks, a lack of sufficient samples means the trained model has low confidence in approximating the ground-truth function. Only recently, after generative adversarial networks (GANs) were proposed, has small-sample data augmentation (DA) with realistic fake data become promising, and many works have validated the viability of GAN-based DA. Although most of the works pointed out highe…

    Submitted 21 May, 2019; originally announced May 2019.

    Comments: rejected by SIGKDD 2019

  20. arXiv:1903.02785  [pdf, other]

    cs.LG stat.ML

    Doubly Aligned Incomplete Multi-view Clustering

    Authors: Menglei Hu, Songcan Chen

    Abstract: Nowadays, multi-view clustering has attracted more and more attention. To date, almost all previous studies assume that views are complete. However, in reality, each view may often contain some missing instances. Such incompleteness makes it impossible to directly use traditional multi-view clustering methods. In this paper, we propose a Doubly Aligned Incomplete Multi-view…

    Submitted 7 March, 2019; originally announced March 2019.

    Comments: 8 pages, IJCAI 2018

  21. arXiv:1903.00637  [pdf, other]

    cs.LG stat.ML

    One-Pass Incomplete Multi-view Clustering

    Authors: Menglei Hu, Songcan Chen

    Abstract: Real data often have multiple modalities or come from multiple heterogeneous sources, thus forming so-called multi-view data, which receive increasing attention in machine learning. Multi-view clustering (MVC) has become an important paradigm for such data. In real-world applications, some views often suffer from missing instances. Clustering on such multi-view datasets is called incomplete multi-view cluster…

    Submitted 2 March, 2019; originally announced March 2019.

    Comments: 9 pages, published in AAAI 2019

  22. Off-Policy Evaluation of Probabilistic Identity Data in Lookalike Modeling

    Authors: Randell Cotta, Mingyang Hu, Dan Jiang, Peizhou Liao

    Abstract: We evaluate the impact of probabilistically-constructed digital identity data collected from Sep. to Dec. 2017 (approx.), in the context of Lookalike-targeted campaigns. The backbone of this study is a large set of probabilistically-constructed "identities", represented as small bags of cookies and mobile ad identifiers with associated metadata, that are likely all owned by the same underlying use…

    Submitted 3 January, 2019; originally announced January 2019.

    Comments: Accepted by WSDM 2019

  23. arXiv:1811.04662  [pdf]

    cs.LG eess.SP q-bio.NC stat.ML

    Detection of REM Sleep Behaviour Disorder by Automated Polysomnography Analysis

    Authors: Navin Cooray, Fernando Andreotti, Christine Lo, Mkael Symmonds, Michele T. M. Hu, Maarten De Vos

    Abstract: Evidence suggests Rapid-Eye-Movement (REM) Sleep Behaviour Disorder (RBD) is an early predictor of Parkinson's disease. This study proposes a fully-automated framework for RBD detection consisting of automated sleep staging followed by RBD identification. Analysis was assessed using a limited polysomnography montage from 53 participants with RBD and 53 age-matched healthy controls. Sleep stage cla…

    Submitted 12 November, 2018; originally announced November 2018.

    Comments: 20 pages, 3 figures

  24. arXiv:1811.01382  [pdf, other]

    cs.LG cs.CL stat.ML

    Neural CRF transducers for sequence labeling

    Authors: Kai Hu, Zhijian Ou, Min Hu, Junlan Feng

    Abstract: Conditional random fields (CRFs) have been shown to be one of the most successful approaches to sequence labeling. Various linear-chain neural CRFs (NCRFs) have been developed to implement the non-linear node potentials in CRFs, while still keeping the linear-chain hidden structure. In this paper, we propose NCRF transducers, which consist of two RNNs, one extracting features from observations and the ot…

    Submitted 4 November, 2018; originally announced November 2018.

  25. arXiv:1810.02225  [pdf, other]

    cs.NE cs.ET cs.LG stat.ML

    Memristor-based Deep Convolution Neural Network: A Case Study

    Authors: Fan Zhang, Miao Hu

    Abstract: In this paper, we first introduce a method to efficiently implement large-scale high-dimensional convolution with realistic memristor-based circuit components. An experimentally verified simulator is adapted for accurate prediction of analog crossbar behavior. An improved conversion algorithm is developed to convert convolution kernels to memristor-based circuits, which minimizes the error with cons…

    Submitted 14 September, 2018; originally announced October 2018.

  26. arXiv:1809.00852  [pdf, other]

    cs.LG cs.AI stat.ML

    Multi-target Unsupervised Domain Adaptation without Exactly Shared Categories

    Authors: Huanhuan Yu, Menglei Hu, Songcan Chen

    Abstract: Unsupervised domain adaptation (UDA) aims to learn the unlabeled target domain by transferring the knowledge of the labeled source domain. To date, most of the existing works focus on the scenario of one source domain and one target domain (1S1T), and only a few works concern the scenario of multiple source domains and one target domain (mS1T). However, to the best of our knowledge, almost no work c…

    Submitted 17 September, 2018; v1 submitted 4 September, 2018; originally announced September 2018.

  27. arXiv:1709.02675  [pdf, ps, other]

    stat.ME

    Modeling Coefficient Alpha for Measurement of Individualized Test Score Internal Consistency

    Authors: Molei Liu, Ming Hu, Xiaohua Zhou

    Abstract: A method for measuring individualized reliability of several tests on subjects with heterogeneity is proposed. A regression model is developed based on three sets of generalized estimating equations (GEE). The first set of GEE models the expectation of the responses, the second set of GEE models the response's variance, and the third set is proposed to estimate the individualized coefficient alph…

    Submitted 8 September, 2017; originally announced September 2017.

  28. arXiv:1704.02007  [pdf]

    stat.ML q-bio.QM

    DIMM-SC: A Dirichlet mixture model for clustering droplet-based single cell transcriptomic data

    Authors: Zhe Sun, Ting Wang, Ke Deng, Xiao-Feng Wang, Robert Lafyatis, Ying Ding, Ming Hu, Wei Chen

    Abstract: Motivation: Single cell transcriptome sequencing (scRNA-Seq) has become a revolutionary tool to study cellular and molecular processes at single cell resolution. Among existing technologies, the recently developed droplet-based platform enables efficient parallel processing of thousands of single cells with direct counting of transcript copies using Unique Molecular Identifier (UMI). Despite the t…

    Submitted 6 April, 2017; originally announced April 2017.

  29. Detection of treatment effects by covariate-adjusted expected shortfall

    Authors: Xuming He, Ya-Hui Hsu, Mingxiu Hu

    Abstract: The statistical tests that are commonly used for detecting mean or median treatment effects suffer from low power when the two distribution functions differ only in the upper (or lower) tail, as in the assessment of the Total Sharp Score (TSS) under different treatments for rheumatoid arthritis. In this article, we propose a more powerful test that detects treatment effects through the expected sh…

    Submitted 7 January, 2011; originally announced January 2011.

    Comments: Published at http://dx.doi.org/10.1214/10-AOAS347 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS347

    Journal ref: Annals of Applied Statistics 2010, Vol. 4, No. 4, 2114-2125