Showing 1–13 of 13 results for author: Dai, J

Searching in archive stat.
  1. arXiv:2504.07347  [pdf, other]

    stat.ML cs.LG math.PR

    Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents

    Authors: Yueying Li, Jim Dai, Tianyi Peng

    Abstract: As demand for Large Language Models (LLMs) and AI agents rapidly grows, optimizing systems for efficient LLM inference becomes critical. While significant efforts have focused on system-level engineering, little has been explored from a mathematical modeling and queuing perspective. In this paper, we aim to develop the queuing fundamentals for large language model (LLM) inference, bridging the gap bet…

    Submitted 24 April, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

  2. arXiv:2410.20293  [pdf]

    cs.LG stat.ML

    A Systematic Review of Machine Learning Approaches for Detecting Deceptive Activities on Social Media: Methods, Challenges, and Biases

    Authors: Yunchong Liu, Xiaorui Shen, Yeyubei Zhang, Zhongyan Wang, Yexin Tian, Jianglai Dai, Yuchen Cao

    Abstract: Social media platforms like Twitter, Facebook, and Instagram have facilitated the spread of misinformation, necessitating automated detection systems. This systematic review evaluates 36 studies that apply machine learning (ML) and deep learning (DL) models to detect fake news, spam, and fake accounts on social media. Using the Prediction model Risk Of Bias ASsessment Tool (PROBAST), the review id…

    Submitted 9 March, 2025; v1 submitted 26 October, 2024; originally announced October 2024.

  3. arXiv:2306.15032  [pdf, other]

    stat.ME

    DMseg: a Python algorithm for de novo detection of differentially or variably methylated regions

    Authors: Xiaoyu Wang, Ming Yu, William Grady, Ziding Feng, Wei Sun, James Y Dai

    Abstract: Detecting and assessing statistical significance of differentially methylated regions (DMRs) is a fundamental task in methylome association studies. While the average differential methylation in different phenotype groups has been the inferential focus, methylation changes in chromosomal regions may also present as differential variability, i.e., variably methylated regions (VMRs). Testing statist…

    Submitted 26 June, 2023; originally announced June 2023.

  4. arXiv:2306.14826  [pdf, other]

    stat.ME

    Incorporating increased variability in testing for cancer DNA methylation

    Authors: James Y. Dai, Heng Chen, Xiaoyu Wang, Wei Sun, Ying Huang, William M. Grady, Ziding Feng

    Abstract: Cancer development is associated with aberrant DNA methylation, including increased stochastic variability. Statistical tests for discovering cancer methylation biomarkers have focused on changes in mean methylation. To improve the power of detection, we propose to incorporate increased variability in testing for cancer differential methylation by two joint constrained tests: one for differential…

    Submitted 26 June, 2023; originally announced June 2023.

  5. arXiv:2305.17187  [pdf, other]

    stat.ME cs.DS

    Clip-OGD: An Experimental Design for Adaptive Neyman Allocation in Sequential Experiments

    Authors: Jessica Dai, Paula Gradu, Christopher Harshaw

    Abstract: From clinical development of cancer therapies to investigations into partisan bias, adaptive sequential designs have become an increasingly popular method for causal inference, as they offer the possibility of improved precision over their non-adaptive counterparts. However, even in simple settings (e.g. two treatments) the extent to which adaptive designs can improve precision is not sufficiently we…

    Submitted 13 October, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  6. arXiv:2302.01367  [pdf, other]

    stat.ML cs.LG stat.ME

    Augmented Learning of Heterogeneous Treatment Effects via Gradient Boosting Trees

    Authors: Heng Chen, Michael L. LeBlanc, James Y. Dai

    Abstract: Heterogeneous treatment effects (HTE) based on patients' genetic or clinical factors are of significant interest to precision medicine. Simultaneously modeling HTE and corresponding main effects for randomized clinical trials with high-dimensional predictive markers is challenging. Motivated by the modified covariates approach, we propose a two-stage statistical learning procedure for estimating H…

    Submitted 2 February, 2023; originally announced February 2023.

  7. arXiv:2211.03611  [pdf, other]

    stat.ME math.PR stat.AP

    Heavy-Tailed Loss Frequencies from Mixtures of Negative Binomial and Poisson Counts

    Authors: Jiansheng Dai, Ziheng Huang, Michael R. Powers, Jiaxin Xu

    Abstract: Heavy-tailed random variables have been used in insurance research to model both loss frequencies and loss severities, with substantially more emphasis on the latter. In the present work, we take a step toward addressing this imbalance by exploring the class of heavy-tailed frequency models formed by continuous mixtures of Negative Binomial and Poisson random variables. We begin by defining the co…

    Submitted 10 November, 2022; v1 submitted 7 November, 2022; originally announced November 2022.

    MSC Class: 60E05; 60E10

  8. arXiv:2011.03654  [pdf, other]

    cs.CY cs.LG stat.ML

    Fair Machine Learning Under Partial Compliance

    Authors: Jessica Dai, Sina Fazelpour, Zachary C. Lipton

    Abstract: Typically, fair machine learning research focuses on a single decisionmaker and assumes that the underlying population is stationary. However, many of the critical domains motivating this work are characterized by competitive marketplaces with many decisionmakers. Realistically, we might expect only a subset of them to adopt any non-compulsory fairness-conscious policy, a situation that political…

    Submitted 26 September, 2022; v1 submitted 6 November, 2020; originally announced November 2020.

    Comments: Presented at AIES 2021; previously at the NeurIPS 2020 Workshop on Consequential Decision Making in Dynamic Environments and the NeurIPS 2020 Workshop on ML for Economic Policy. Minor correction uploaded Sept. 2022

  9. arXiv:2008.06200  [pdf, ps, other]

    math.PR stat.AP

    Characterizing the Zeta Distribution via Continuous Mixtures

    Authors: Jiansheng Dai, Ziheng Huang, Michael R. Powers, Jiaxin Xu

    Abstract: We offer two novel characterizations of the Zeta distribution: first, as tractable continuous mixtures of Negative Binomial distributions (with fixed shape parameter, r > 0), and second, as a tractable continuous mixture of Poisson distributions. In both the Negative Binomial case for r >= 1 and the Poisson case, the resulting Zeta distributions are identifiable because each mixture can be associa…

    Submitted 4 June, 2021; v1 submitted 14 August, 2020; originally announced August 2020.

    MSC Class: 60E05 (Primary); 62E10 (Secondary)

  10. arXiv:2007.12070  [pdf, other]

    cs.CR cs.LG stat.ML

    Mitigating backdoor attacks in LSTM-based Text Classification Systems by Backdoor Keyword Identification

    Authors: Chuanshuai Chen, Jiazhu Dai

    Abstract: It has been shown that deep neural networks face a new threat called backdoor attacks, where the adversary can inject backdoors into the neural network model by poisoning the training dataset. When the input contains a special pattern called the backdoor trigger, the backdoored model will carry out a malicious task, such as misclassification, specified by the adversary. In text class…

    Submitted 14 March, 2021; v1 submitted 11 July, 2020; originally announced July 2020.

  11. arXiv:1911.01172  [pdf]

    cs.LG stat.ML

    Fast-UAP: An Algorithm for Speeding up Universal Adversarial Perturbation Generation with Orientation of Perturbation Vectors

    Authors: Jiazhu Dai, Le Shu

    Abstract: Convolutional neural networks (CNNs) have become one of the most popular machine learning tools and are being applied in various tasks. However, CNN models are vulnerable to universal perturbations, which are usually human-imperceptible but can cause natural images to be misclassified with high probability. One of the state-of-the-art algorithms to generate universal perturbations is known as UAP.…

    Submitted 6 January, 2020; v1 submitted 4 November, 2019; originally announced November 2019.

    Comments: 9 pages, 7 figures, 1 table, 1 algorithm

    MSC Class: I.2.0; ACM Class: I.2.0

  12. arXiv:1903.11385  [pdf, ps, other]

    eess.SP cs.LG stat.ML

    Signal Demodulation with Machine Learning Methods for Physical Layer Visible Light Communications: Prototype Platform, Open Dataset and Algorithms

    Authors: Shuai Ma, Jiahui Dai, Songtao Lu, Hang Li, Han Zhang, Chun Du, Shiyin Li

    Abstract: In this paper, we investigate the design and implementation of machine learning (ML) based demodulation methods in the physical layer of visible light communication (VLC) systems. We build a flexible hardware prototype of an end-to-end VLC system, from which the received signals are collected as the real data. The dataset, which contains eight types of modulated signals, is available online. Then,…

    Submitted 13 March, 2019; originally announced March 2019.

  13. arXiv:1010.4686  [pdf, ps, other]

    stat.ME q-bio.GN

    Structures and Assumptions: Strategies to Harness Gene $\times$ Gene and Gene $\times$ Environment Interactions in GWAS

    Authors: Charles Kooperberg, Michael LeBlanc, James Y. Dai, Indika Rajapakse

    Abstract: Genome-wide association studies, in which as many as a million single nucleotide polymorphisms (SNP) are measured on several thousand samples, are quickly becoming a common type of study for identifying genetic factors associated with many phenotypes. There is a strong assumption that interactions between SNPs or genes and interactions between genes and environmental factors substantially contribu…

    Submitted 22 October, 2010; originally announced October 2010.

    Comments: Published at http://dx.doi.org/10.1214/09-STS287 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-STS-STS287

    Journal ref: Statistical Science 2009, Vol. 24, No. 4, 472-488