Showing 1–50 of 61 results for author: Dong, J

  1. arXiv:2504.14938  [pdf]

    stat.AP cs.LG

    Integrating Response Time and Attention Duration in Bayesian Preference Learning for Multiple Criteria Decision Aiding

    Authors: Jiaxuan Jiang, Jiapeng Liu, Miłosz Kadziński, Xiuwu Liao, Jingyu Dong

    Abstract: We introduce a multiple criteria Bayesian preference learning framework incorporating behavioral cues for decision aiding. The framework integrates pairwise comparisons, response time, and attention duration to deepen insights into decision-making processes. The approach employs an additive value function model and utilizes a Bayesian framework to derive the posterior distribution of potential ran…

    Submitted 21 April, 2025; originally announced April 2025.

  2. arXiv:2504.13467  [pdf, ps, other]

    stat.ME

    Efficient Estimation under Multiple Missing Patterns via Balancing Weights

    Authors: Jianing Dong, Raymond K. W. Wong, Kwun Chuen Gary Chan

    Abstract: As one of the most commonly seen data challenges, missing data, in particular, multiple, non-monotone missing patterns, complicates estimation and inference due to the fact that missingness mechanisms are often not missing at random, and conventional methods cannot be applied. Pattern graphs have recently been proposed as a tool to systematically relate various observed patterns in the sample. We…

    Submitted 18 April, 2025; originally announced April 2025.

    Comments: arXiv admin note: substantial text overlap with arXiv:2402.08873

  3. arXiv:2501.00467  [pdf, other]

    cs.LG stat.CO

    Score-Based Metropolis-Hastings Algorithms

    Authors: Ahmed Aloui, Ali Hasan, Juncheng Dong, Zihao Wu, Vahid Tarokh

    Abstract: In this paper, we introduce a new approach for integrating score-based models with the Metropolis-Hastings algorithm. While traditional score-based diffusion models excel in accurately learning the score function from data points, they lack an energy function, making the Metropolis-Hastings adjustment step inaccessible. Consequently, the unadjusted Langevin algorithm is often used for sampling usi…

    Submitted 31 March, 2025; v1 submitted 31 December, 2024; originally announced January 2025.

  4. arXiv:2412.10658  [pdf, ps, other]

    stat.ME cs.AI cs.LG

    Combining Priors with Experience: Confidence Calibration Based on Binomial Process Modeling

    Authors: Jinzong Dong, Zhaohui Jiang, Dong Pan, Haoyang Yu

    Abstract: Confidence calibration of classification models is a technique to estimate the true posterior probability of the predicted class, which is critical for ensuring reliable decision-making in practical applications. Existing confidence calibration methods mostly use statistical techniques to estimate the calibration curve from data or fit a user-defined calibration function, but often overlook fully…

    Submitted 18 February, 2025; v1 submitted 13 December, 2024; originally announced December 2024.

    Comments: Accepted by AAAI-25

  5. arXiv:2404.13056  [pdf, other]

    cs.LG cs.CE stat.CO stat.ME stat.ML

    Variational Bayesian Optimal Experimental Design with Normalizing Flows

    Authors: Jiayuan Dong, Christian Jacobsen, Mehdi Khalloufi, Maryam Akram, Wanjiao Liu, Karthik Duraisamy, Xun Huan

    Abstract: Bayesian optimal experimental design (OED) seeks experiments that maximize the expected information gain (EIG) in model parameters. Directly estimating the EIG using nested Monte Carlo is computationally expensive and requires an explicit likelihood. Variational OED (vOED), in contrast, estimates a lower bound of the EIG without likelihood evaluations by approximating the posterior distributions w…

    Submitted 27 April, 2025; v1 submitted 8 April, 2024; originally announced April 2024.

    MSC Class: 62K05; 94A17; 62C10; 62F15

    Journal ref: Computer Methods in Applied Mechanics and Engineering 433 (2025) 117457

  6. arXiv:2402.08873  [pdf, ps, other]

    stat.ME

    Balancing Weights for Non-monotone Missing Data

    Authors: Jianing Dong, Raymond K. W. Wong, Kwun Chuen Gary Chan

    Abstract: Balancing weights have been widely applied to single or monotone missingness due to empirical advantages over likelihood-based methods and inverse probability weighting approaches. This paper considers non-monotone missing data under the complete-case missing variable condition (CCMV), a case of missing not at random (MNAR). Using relationships between each missing pattern and the complete-case su…

    Submitted 12 December, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

  7. arXiv:2312.16607  [pdf, other]

    eess.IV cs.CV stat.ML

    A Polarization and Radiomics Feature Fusion Network for the Classification of Hepatocellular Carcinoma and Intrahepatic Cholangiocarcinoma

    Authors: Jia Dong, Yao Yao, Liyan Lin, Yang Dong, Jiachen Wan, Ran Peng, Chao Li, Hui Ma

    Abstract: Classifying hepatocellular carcinoma (HCC) and intrahepatic cholangiocarcinoma (ICC) is a critical step in treatment selection and prognosis evaluation for patients with liver diseases. Traditional histopathological diagnosis poses challenges in this context. In this study, we introduce a novel polarization and radiomics feature fusion network, which combines polarization features obtained from Mu…

    Submitted 27 December, 2023; originally announced December 2023.

  8. arXiv:2311.03630  [pdf, other]

    cs.LG stat.ME stat.ML

    Counterfactual Data Augmentation with Contrastive Learning

    Authors: Ahmed Aloui, Juncheng Dong, Cat P. Le, Vahid Tarokh

    Abstract: Statistical disparity between distinct treatment groups is one of the most significant challenges for estimating Conditional Average Treatment Effects (CATE). To address this, we introduce a model-agnostic data augmentation method that imputes the counterfactual outcomes for a selected subset of individuals. Specifically, we utilize contrastive learning to learn a representation space and a simila…

    Submitted 6 November, 2023; originally announced November 2023.

  9. Clusterability test for categorical data

    Authors: Lianyu Hu, Junjie Dong, Mudi Jiang, Yan Liu, Zengyou He

    Abstract: The objective of clusterability evaluation is to check whether a clustering structure exists within the data set. As a crucial yet often-overlooked issue in cluster analysis, it is essential to conduct such a test before applying any clustering algorithm. If a data set is unclusterable, any subsequent clustering analysis would not yield valid results. Despite its importance, the majority of existi…

    Submitted 17 December, 2024; v1 submitted 14 July, 2023; originally announced July 2023.

    Comments: 28 pages, 12 appendix pages, 17 figures

  10. arXiv:2306.13673  [pdf, ps, other]

    cs.GT cs.LG stat.ML

    Taming the Exponential Action Set: Sublinear Regret and Fast Convergence to Nash Equilibrium in Online Congestion Games

    Authors: Jing Dong, Jingyu Wu, Siwei Wang, Baoxiang Wang, Wei Chen

    Abstract: The congestion game is a powerful model that encompasses a range of engineering systems such as traffic networks and resource allocation. It describes the behavior of a group of agents who share a common set of $F$ facilities and take actions as subsets with $k$ facilities. In this work, we study the online formulation of congestion games, where agents participate in the game repeatedly and observ…

    Submitted 18 June, 2023; originally announced June 2023.

  11. arXiv:2306.10430  [pdf, other]

    stat.ML cs.AI cs.LG stat.CO stat.ME

    Variational Sequential Optimal Experimental Design using Reinforcement Learning

    Authors: Wanggang Shen, Jiayuan Dong, Xun Huan

    Abstract: We present variational sequential optimal experimental design (vsOED), a novel method for optimally designing a finite sequence of experiments within a Bayesian framework with information-theoretic criteria. vsOED employs a one-point reward formulation with variational posterior approximations, providing a provable lower bound to the expected information gain. Numerical methods are developed follo…

    Submitted 23 December, 2024; v1 submitted 17 June, 2023; originally announced June 2023.

    MSC Class: 62K05; 62L05; 62C10; 62F15; 90C40

  12. arXiv:2306.07163  [pdf, other]

    cs.LG stat.ML

    A Batch-to-Online Transformation under Random-Order Model

    Authors: Jing Dong, Yuichi Yoshida

    Abstract: We introduce a transformation framework that can be utilized to develop online algorithms with low $ε$-approximate regret in the random-order model from offline approximation algorithms. We first give a general reduction theorem that transforms an offline approximation algorithm with low average sensitivity to an online algorithm with low $ε$-approximate regret. We then demonstrate that offline ap…

    Submitted 25 October, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

  13. arXiv:2305.11400  [pdf, other]

    cs.LG stat.ML

    Mode-Aware Continual Learning for Conditional Generative Adversarial Networks

    Authors: Cat P. Le, Juncheng Dong, Ahmed Aloui, Vahid Tarokh

    Abstract: The main challenge in continual learning for generative models is to effectively learn new target modes with limited samples while preserving previously learned ones. To this end, we introduce a new continual learning approach for conditional generative adversarial networks by leveraging a mode-affinity score specifically designed for generative modeling. First, the generator produces samples of e…

    Submitted 23 September, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

  14. arXiv:2305.07817  [pdf]

    stat.AP

    What Risk Factors to Cause Long COVID and Its Impact on Patient Survival Outcomes when Combined with the Effect from Organ Transplantation in the Acute COVID

    Authors: Jianghu Dong

    Abstract: Coronavirus disease 2019 in solid organ transplant (SOT) patients is associated with more severe outcomes than in non-immunosuppressed hosts. However, exactly which risk factors cause Long COVID in acute COVID cases remains unknown. More importantly, the impact of Long COVID on patient survival remains understudied, especially when examined alongside the effect of SOT. All patients have been identifi…

    Submitted 12 May, 2023; originally announced May 2023.

  15. arXiv:2303.02550  [pdf, other]

    physics.soc-ph stat.AP

    Opinion Dynamics on Complex Networks

    Authors: Jiarui Dong, Yi-Cheng Zhang, Yixiu Kong

    Abstract: Social media has emerged as a significant source of information for people. As agents interact with each other through social media platforms, they create numerous complex social networks. Within these networks, information spreads among agents and their opinions may be altered by their neighbors' influence. This paper explores opinion dynamics on social networks, which are influenced by complex ne…

    Submitted 4 March, 2023; originally announced March 2023.

  16. arXiv:2302.03821  [pdf, other]

    cs.LG math.OC stat.ME stat.ML

    PASTA: Pessimistic Assortment Optimization

    Authors: Juncheng Dong, Weibin Mo, Zhengling Qi, Cong Shi, Ethan X. Fang, Vahid Tarokh

    Abstract: We consider a class of assortment optimization problems in an offline data-driven setting. A firm does not know the underlying customer choice model but has access to an offline dataset consisting of the historically offered assortment set, customer choice, and revenue. The objective is to use the offline dataset to find an optimal assortment. Due to the combinatorial nature of assortment optimiza…

    Submitted 7 February, 2023; originally announced February 2023.

  17. arXiv:2302.02009  [pdf, other]

    cs.LG stat.ML

    Domain Adaptation via Rebalanced Sub-domain Alignment

    Authors: Yiling Liu, Juncheng Dong, Ziyang Jiang, Ahmed Aloui, Keyu Li, Hunter Klein, Vahid Tarokh, David Carlson

    Abstract: Unsupervised domain adaptation (UDA) is a technique used to transfer knowledge from a labeled source domain to a different but related unlabeled target domain. While many UDA methods have shown success in the past, they often assume that the source and target domains must have identical class label distributions, which can limit their effectiveness in real-world scenarios. To address this limitati…

    Submitted 3 February, 2023; originally announced February 2023.

    Comments: 20 pages, 6 figures, 4 tables

  18. arXiv:2210.14707  [pdf, other]

    cs.LG stat.ML

    Is Out-of-Distribution Detection Learnable?

    Authors: Zhen Fang, Yixuan Li, Jie Lu, Jiahua Dong, Bo Han, Feng Liu

    Abstract: Supervised learning aims to train a classifier under the assumption that training and test data are from the same distribution. To ease the above assumption, researchers have studied a more realistic setting: out-of-distribution (OOD) detection, where test data may come from classes that are unknown during training (i.e., OOD data). Due to the unavailability and diversity of OOD data, good general…

    Submitted 23 February, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

    Comments: NeurIPS 2022 Outstanding Paper

  19. arXiv:2210.05122  [pdf, other]

    cond-mat.stat-mech stat.AP

    Universal cover-time distribution of heterogeneous random walks

    Authors: Jia-Qi Dong, Wen-Hui Han, Yisen Wang, Xiao-Song Chen, Liang Huang

    Abstract: The cover-time problem, i.e., time to visit every site in a system, is one of the key issues of random walks with wide applications in natural, social, and engineered systems. Addressing the full distribution of cover times for random walk on complex structures has been a long-standing challenge and has attracted persistent efforts. Yet, the known results are essentially limited to homogeneous sys…

    Submitted 10 October, 2022; originally announced October 2022.

    Comments: 12 pages, 6 figures

  20. arXiv:2210.00380  [pdf, other]

    cs.LG stat.ME stat.ML

    Transfer Learning for Individual Treatment Effect Estimation

    Authors: Ahmed Aloui, Juncheng Dong, Cat P. Le, Vahid Tarokh

    Abstract: This work considers the problem of transferring causal knowledge between tasks for Individual Treatment Effect (ITE) estimation. To this end, we theoretically assess the feasibility of transferring ITE knowledge and present a practical framework for efficient transfer. A lower bound is introduced on the ITE error of the target task to demonstrate that ITE knowledge transfer is challenging due to t…

    Submitted 5 June, 2023; v1 submitted 1 October, 2022; originally announced October 2022.

  21. arXiv:2209.13841  [pdf, other]

    cs.LG stat.ML

    Online Policy Optimization for Robust MDP

    Authors: Jing Dong, Jingwei Li, Baoxiang Wang, Jingzhao Zhang

    Abstract: Reinforcement learning (RL) has exceeded human performance in many synthetic settings such as video games and Go. However, real-world deployment of end-to-end RL models is less common, as RL models can be very sensitive to slight perturbation of the environment. The robust Markov decision process (MDP) framework -- in which the transition probabilities belong to an uncertainty set around a nominal…

    Submitted 28 September, 2022; originally announced September 2022.

  22. arXiv:2209.02690  [pdf, ps, other]

    cs.CR cs.CY cs.DS cs.LG stat.ML

    Classification Protocols with Minimal Disclosure

    Authors: Jinshuo Dong, Jason Hartline, Aravindan Vijayaraghavan

    Abstract: We consider multi-party protocols for classification that are motivated by applications such as e-discovery in court proceedings. We identify a protocol that guarantees that the requesting party receives all responsive documents and the sending party discloses the minimal amount of non-responsive documents necessary to prove that all responsive documents have been received. This protocol can be em…

    Submitted 6 September, 2022; originally announced September 2022.

    Journal ref: In Proceedings of the 2022 Symposium on Computer Science and Law (CSLAW '22), November 1-2, 2022, Washington, DC, USA. ACM, New York, NY, USA, 10 pages

  23. arXiv:2207.09288  [pdf, other]

    stat.AP stat.CO stat.ME

    Expert Elicitation and Data Noise Learning for Material Flow Analysis using Bayesian Inference

    Authors: Jiayuan Dong, Jiankan Liao, Xun Huan, Daniel Cooper

    Abstract: Bayesian inference allows the transparent communication of uncertainty in material flow analyses (MFAs), and a systematic update of uncertainty as new data become available. However, the method is undermined by the difficulty of defining proper priors for the MFA parameters and quantifying the noise in the collected data. We start to address these issues by first deriving and implementing an expe…

    Submitted 12 July, 2022; originally announced July 2022.

    Comments: 23 pages of main paper and 10 pages of supporting information

    MSC Class: 62F15; 62P12; 62P30

    Journal ref: Journal of Industrial Ecology 27 (2023) 1105-1122

  24. arXiv:2206.03854  [pdf, other]

    cs.NE cs.LG stat.ML

    Asymptotic Stability in Reservoir Computing

    Authors: Jonathan Dong, Erik Börve, Mushegh Rafayelyan, Michael Unser

    Abstract: Reservoir Computing is a class of Recurrent Neural Networks with internal weights fixed at random. Stability relates to the sensitivity of the network state to perturbations. It is an important property in Reservoir Computing as it directly impacts performance. In practice, it is desirable to stay in a stable regime, where the effect of perturbations does not explode exponentially, but also close…

    Submitted 7 June, 2022; originally announced June 2022.

  25. arXiv:2205.08098  [pdf, other]

    cs.LG stat.ML

    Can We Do Better Than Random Start? The Power of Data Outsourcing

    Authors: Yi Chen, Jing Dong, Xin T. Tong

    Abstract: Many organizations have access to abundant data but lack the computational power to process the data. While they can outsource the computational task to other facilities, there are various constraints on the amount of data that can be shared. It is natural to ask what data outsourcing can accomplish under such constraints. We address this question from a machine learning perspective. When training…

    Submitted 17 May, 2022; originally announced May 2022.

    Comments: 22 pages, 5 figures

  26. arXiv:2203.03104  [pdf, ps, other]

    stat.CO math.PR

    Convergence Speed and Approximation Accuracy of Numerical MCMC

    Authors: Tiangang Cui, Jing Dong, Ajay Jasra, Xin T. Tong

    Abstract: When implementing Markov Chain Monte Carlo (MCMC) algorithms, perturbation caused by numerical errors is sometimes inevitable. This paper studies how perturbation of MCMC affects the convergence speed and Monte Carlo estimation accuracy. Our results show that when the original Markov chain converges to stationarity fast enough and the perturbed transition kernel is a good approximation to the orig…

    Submitted 6 March, 2022; originally announced March 2022.

    Comments: 26 pages, 5 figures

    Journal ref: Adv. Appl. Probab. 57 (2025) 101-133

  27. arXiv:2202.13863  [pdf, other]

    cs.LG stat.ML

    Provably Efficient Convergence of Primal-Dual Actor-Critic with Nonlinear Function Approximation

    Authors: Jing Dong, Li Shen, Yinggan Xu, Baoxiang Wang

    Abstract: We study the convergence of the actor-critic algorithm with nonlinear function approximation under a nonconvex-nonconcave primal-dual formulation. Stochastic gradient descent ascent is applied with an adaptive proximal term for robust learning rates. We show the first efficient convergence result with primal-dual actor-critic with a convergence rate of…

    Submitted 28 February, 2022; originally announced February 2022.

  28. arXiv:2106.11767  [pdf, other]

    cs.CR cs.LG math.ST stat.ML

    Privacy Amplification via Iteration for Shuffled and Online PNSGD

    Authors: Matteo Sordello, Zhiqi Bu, Jinshuo Dong

    Abstract: In this paper, we consider the framework of privacy amplification via iteration, which was originally proposed by Feldman et al. and subsequently simplified by Asoodeh et al. in their analysis via the contraction coefficient. This line of work focuses on the study of the privacy guarantees obtained by the projected noisy stochastic gradient descent (PNSGD) algorithm with hidden intermediate updates…

    Submitted 20 June, 2021; originally announced June 2021.

  29. arXiv:2106.10758  [pdf, other]

    physics.optics stat.AP

    Fundamental bounds on the precision of iSCAT, COBRI and dark-field microscopy for 3D localization and mass photometry

    Authors: Jonathan Dong, Dante Maestre, Clara Conrad-Billroth, Thomas Juffmann

    Abstract: Interferometric imaging is an emerging technique for particle tracking and mass photometry. Mass or position are estimated from weak signals, coherently scattered from nanoparticles or single molecules, and interfered with a co-propagating reference. In this work, we perform a statistical analysis and derive lower bounds on the measurement precision of the parameters of interest from shot-noise li…

    Submitted 20 June, 2021; originally announced June 2021.

  30. arXiv:2104.01987  [pdf, ps, other]

    cs.CR cs.LG math.ST stat.ML

    Rejoinder: Gaussian Differential Privacy

    Authors: Jinshuo Dong, Aaron Roth, Weijie J. Su

    Abstract: In this rejoinder, we aim to address two broad issues that cover most comments made in the discussion. First, we discuss some theoretical aspects of our work and comment on how this work might impact the theoretical foundation of privacy-preserving data analysis. Taking a practical viewpoint, we next discuss how f-differential privacy (f-DP) and Gaussian differential privacy (GDP) can make a diffe…

    Submitted 25 June, 2021; v1 submitted 5 April, 2021; originally announced April 2021.

    Comments: Updated the references. Rejoinder to discussions on Gaussian Differential Privacy, read to the Royal Statistical Society in December 2020

  31. arXiv:2103.12827  [pdf, other]

    cs.LG eess.IV stat.ML

    Fisher Task Distance and Its Application in Neural Architecture Search

    Authors: Cat P. Le, Mohammadreza Soltani, Juncheng Dong, Vahid Tarokh

    Abstract: We formulate an asymmetric (or non-commutative) distance between tasks based on Fisher Information Matrices, called Fisher task distance. This distance represents the complexity of transferring the knowledge from one task to another. We provide a proof of consistency for our distance through theorems and experiments on various classification tasks from MNIST, CIFAR-10, CIFAR-100, ImageNet, and Tas…

    Submitted 30 April, 2022; v1 submitted 23 March, 2021; originally announced March 2021.

    Comments: Published in IEEE Access, Volume 10, 2022

  32. arXiv:2103.08721  [pdf, other]

    stat.ML cs.CR cs.IT cs.LG math.ST

    A Central Limit Theorem for Differentially Private Query Answering

    Authors: Jinshuo Dong, Weijie J. Su, Linjun Zhang

    Abstract: Perhaps the single most important use case for differential privacy is to privately answer numerical queries, which is usually achieved by adding noise to the answer vector. The central question, therefore, is to understand which noise distribution optimizes the privacy-accuracy trade-off, especially when the dimension of the answer vector is high. Accordingly, extensive literature has been dedica…

    Submitted 15 March, 2021; originally announced March 2021.

  33. arXiv:2007.01990  [pdf, other]

    stat.ML cs.LG math.PR

    Accelerating Nonconvex Learning via Replica Exchange Langevin Diffusion

    Authors: Yi Chen, Jinglin Chen, Jing Dong, Jian Peng, Zhaoran Wang

    Abstract: Langevin diffusion is a powerful method for nonconvex optimization, which enables the escape from local minima by injecting noise into the gradient. In particular, the temperature parameter controlling the noise level gives rise to a tradeoff between "global exploration" and "local exploitation", which correspond to high and low temperatures. To attain the advantages of both regimes, we propos…

    Submitted 3 July, 2020; originally announced July 2020.

  34. arXiv:2006.15334  [pdf, other]

    cs.LG stat.ML

    Evolving Metric Learning for Incremental and Decremental Features

    Authors: Jiahua Dong, Yang Cong, Gan Sun, Tao Zhang, Xu Tang, Xiaowei Xu

    Abstract: Online metric learning has been widely exploited for large-scale data classification due to the low computational cost. However, amongst online practical scenarios where the features are evolving (e.g., some features vanish and some new features are augmented), most metric learning models cannot be successfully applied to these scenarios, although they can tackle the evolving instances effic…

    Submitted 29 June, 2021; v1 submitted 27 June, 2020; originally announced June 2020.

    Comments: Accepted to IEEE Transactions on Circuits and Systems for Video Technology (TCSVT 2021)

  35. arXiv:2006.07310  [pdf, other]

    stat.ML cs.LG eess.SP

    Reservoir Computing meets Recurrent Kernels and Structured Transforms

    Authors: Jonathan Dong, Ruben Ohana, Mushegh Rafayelyan, Florent Krzakala

    Abstract: Reservoir Computing is a class of simple yet efficient Recurrent Neural Networks where internal weights are fixed at random and only a linear output layer is trained. In the large size limit, such random neural networks have a deep connection with kernel methods. Our contributions are threefold: a) We rigorously establish the recurrent kernel limit of Reservoir Computing and prove its convergence.…

    Submitted 21 October, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

    Journal ref: Advances in Neural Information Processing Systems, v33, pages 16785--16796, 2020

  36. arXiv:2003.04493  [pdf, other]

    stat.ML cs.AI cs.CR cs.LG stat.ME

    Sharp Composition Bounds for Gaussian Differential Privacy via Edgeworth Expansion

    Authors: Qinqing Zheng, Jinshuo Dong, Qi Long, Weijie J. Su

    Abstract: Datasets containing sensitive information are often sequentially analyzed by many algorithms. This raises a fundamental question in differential privacy regarding how the overall privacy bound degrades under composition. To address this question, we introduce a family of analytical and sharp privacy bounds under composition using the Edgeworth expansion in the framework of the recently proposed f-…

    Submitted 25 March, 2020; v1 submitted 9 March, 2020; originally announced March 2020.

  37. arXiv:2001.10237  [pdf, ps, other]

    cs.LG eess.SP stat.ML

    Faster Activity and Data Detection in Massive Random Access: A Multi-armed Bandit Approach

    Authors: Jialin Dong, Jun Zhang, Yuanming Shi, Jessie Hui Wang

    Abstract: This paper investigates the grant-free random access with massive IoT devices. By embedding the data symbols in the signature sequences, joint device activity detection and data decoding can be achieved, which, however, significantly increases the computational complexity. Coordinate descent algorithms that enjoy a low per-iteration complexity have been employed to solve the detection problem, but…

    Submitted 28 January, 2020; originally announced January 2020.

    Comments: 30 pages, 5 figures

  38. arXiv:2001.08356  [pdf, other]

    math.OC math.PR stat.ML

    Replica Exchange for Non-Convex Optimization

    Authors: Jing Dong, Xin T. Tong

    Abstract: Gradient descent (GD) is known to converge quickly for convex objective functions, but it can be trapped at local minima. On the other hand, Langevin dynamics (LD) can explore the state space and find global minima, but in order to give accurate estimates, LD needs to run with a small discretization step size and weak stochastic force, which in general slow down its convergence. This paper shows t…

    Submitted 16 June, 2021; v1 submitted 22 January, 2020; originally announced January 2020.

    Comments: 70 pages, 15 figures

  39. arXiv:1912.11082  [pdf, other]

    cs.CV cs.LG stat.ML

    Scalable Fine-grained Generated Image Classification Based on Deep Metric Learning

    Authors: Xinsheng Xuan, Bo Peng, Wei Wang, Jing Dong

    Abstract: Recently, generated images have reached such high quality that even human eyes cannot tell them apart from real images. Although there are already some methods for detecting generated images in the current forensic community, most of these methods are used to detect a single type of generated image. New types of generated images are emerging one after another, and the existing detection methods can…

    Submitted 10 December, 2019; originally announced December 2019.

  40. arXiv:1911.11607  [pdf, other]

    cs.LG cs.CR stat.ML

    Deep Learning with Gaussian Differential Privacy

    Authors: Zhiqi Bu, Jinshuo Dong, Qi Long, Weijie J. Su

    Abstract: Deep learning models are often trained on datasets that contain sensitive information such as individuals' shopping transactions, personal contacts, and medical records. An increasingly important line of work therefore has sought to train neural networks subject to privacy constraints that are specified by differential privacy or its divergence-based relaxations. These privacy definitions, however…

    Submitted 22 July, 2020; v1 submitted 26 November, 2019; originally announced November 2019.

    Comments: To appear in Harvard Data Science Review

  41. arXiv:1911.05904  [pdf, other]

    cs.LG cs.SE stat.ML

    There is Limited Correlation between Coverage and Robustness for Deep Neural Networks

    Authors: Yizhen Dong, Peixin Zhang, Jingyi Wang, Shuang Liu, Jun Sun, Jianye Hao, Xinyu Wang, Li Wang, Jin Song Dong, Dai Ting

    Abstract: Deep neural networks (DNNs) are increasingly applied in safety-critical systems, e.g., for face recognition, autonomous car control and malware detection. It is also shown that DNNs are subject to attacks such as adversarial perturbation and thus must be properly tested. Many coverage criteria for DNNs have since been proposed, inspired by the success of code coverage criteria for software programs.…

    Submitted 13 November, 2019; originally announced November 2019.

  42. arXiv:1911.01483  [pdf, other]

    stat.ML cs.LG math.ST

    On Constructing Confidence Region for Model Parameters in Stochastic Gradient Descent via Batch Means

    Authors: Yi Zhu, Jing Dong

    Abstract: In this paper, we study a simple algorithm to construct asymptotically valid confidence regions for model parameters using the batch means method. The main idea is to cancel out the covariance matrix, which is hard or costly to estimate. In the process of developing the algorithm, we establish a process-level functional central limit theorem for Polyak-Ruppert averaging based stochastic gradient descent…

    Submitted 31 January, 2020; v1 submitted 4 November, 2019; originally announced November 2019.

  43. arXiv:1910.05471  [pdf, other]

    cs.LG stat.ML

    Uncertainty Quantification and Exploration for Reinforcement Learning

    Authors: YI Zhu, Jing Dong, Henry Lam

    Abstract: We investigate statistical uncertainty quantification for reinforcement learning (RL) and its implications in exploration policy. Despite ever-growing literature on RL applications, fundamental questions about inference and error quantification, such as large-sample behaviors, appear to remain quite open. In this paper, we fill in the literature gap by studying the central limit theorem behaviors…

    Submitted 4 December, 2022; v1 submitted 11 October, 2019; originally announced October 2019.

  44. arXiv:1910.01382  [pdf, other]

    cs.LG cs.LO stat.ML

    Silas: High Performance, Explainable and Verifiable Machine Learning

    Authors: Hadrien Bride, Zhe Hou, Jie Dong, Jin Song Dong, Ali Mirjalili

    Abstract: This paper introduces a new classification tool named Silas, which is built to provide a more transparent and dependable data analytics service. A focus of Silas is on providing a formal foundation of decision trees in order to support logical analysis and verification of learned prediction models. This paper describes the distinct features of Silas: The Model Audit module formally verifies the pr…

    Submitted 3 October, 2019; originally announced October 2019.

  45. arXiv:1909.10023  [pdf, other]

    cs.LG stat.ML

    Towards Interpreting Recurrent Neural Networks through Probabilistic Abstraction

    Authors: Guoliang Dong, Jingyi Wang, Jun Sun, Yang Zhang, Xinyu Wang, Ting Dai, Jin Song Dong, Xingen Wang

    Abstract: Neural networks are becoming a popular tool for solving many real-world problems such as object recognition and machine translation, thanks to its exceptional performance as an end-to-end solution. However, neural networks are complex black-box models, which hinders humans from interpreting and consequently trusting them in making critical decisions. Towards interpreting neural networks, several a…

    Submitted 27 September, 2020; v1 submitted 22 September, 2019; originally announced September 2019.

    Comments: Accepted by ASE 2020

  46. arXiv:1908.06177  [pdf, other]

    cs.LG cs.CL cs.LO stat.ML

    CLUTRR: A Diagnostic Benchmark for Inductive Reasoning from Text

    Authors: Koustuv Sinha, Shagun Sodhani, Jin Dong, Joelle Pineau, William L. Hamilton

    Abstract: The recent success of natural language understanding (NLU) systems has been troubled by results highlighting the failure of these models to generalize in a systematic and robust way. In this work, we introduce a diagnostic benchmark suite, named CLUTRR, to clarify some key issues related to the robustness and systematicity of NLU systems. Motivated by classic work on inductive logic programming, C…

    Submitted 3 September, 2019; v1 submitted 16 August, 2019; originally announced August 2019.

    Comments: Accepted at EMNLP 2019, 9 page content + Appendix

  47. arXiv:1905.02383  [pdf, other]

    cs.LG cs.CR cs.DS stat.ML

    Gaussian Differential Privacy

    Authors: Jinshuo Dong, Aaron Roth, Weijie J. Su

    Abstract: Differential privacy has seen remarkable success as a rigorous and practical formalization of data privacy in the past decade. This privacy definition and its divergence based relaxations, however, have several acknowledged weaknesses, either in handling composition of private algorithms or in analyzing important primitives like privacy amplification by subsampling. Inspired by the hypothesis test…

    Submitted 30 May, 2019; v1 submitted 7 May, 2019; originally announced May 2019.

    Comments: v2 revises introduction, adds discussion and fixes some inconsistencies. v3 fixes typos

  48. arXiv:1903.00197  [pdf]

    q-bio.QM cs.LG stat.ML

    Outcome-Driven Clustering of Acute Coronary Syndrome Patients using Multi-Task Neural Network with Attention

    Authors: Eryu Xia, Xin Du, Jing Mei, Wen Sun, Suijun Tong, Zhiqing Kang, Jian Sheng, Jian Li, Changsheng Ma, Jianzeng Dong, Shaochun Li

    Abstract: Cluster analysis aims at separating patients into phenotypically heterogenous groups and defining therapeutically homogeneous patient subclasses. It is an important approach in data-driven disease classification and subtyping. Acute coronary syndrome (ACS) is a syndrome due to sudden decrease of coronary artery blood flow, where disease classification would help to inform therapeutic strategies an…

    Submitted 27 March, 2019; v1 submitted 1 March, 2019; originally announced March 2019.

  49. arXiv:1902.11153  [pdf, other]

    cs.CV cs.LG stat.ML

    On the generalization of GAN image forensics

    Authors: Xinsheng Xuan, Bo Peng, Wei Wang, Jing Dong

    Abstract: Recently the GAN generated face images are more and more realistic with high-quality, even hard for human eyes to detect. On the other hand, the forensics community keeps on developing methods to detect these generated fake images and try to guarantee the credibility of visual contents. Although researchers have developed some methods to detect generated images, few of them explore the important p…

    Submitted 10 December, 2019; v1 submitted 27 February, 2019; originally announced February 2019.

  50. arXiv:1901.03209  [pdf, other]

    stat.ML cs.LG

    Variable Importance Clouds: A Way to Explore Variable Importance for the Set of Good Models

    Authors: Jiayun Dong, Cynthia Rudin

    Abstract: Variable importance is central to scientific studies, including the social sciences and causal inference, healthcare, and other domains. However, current notions of variable importance are often tied to a specific predictive model. This is problematic: what if there were multiple well-performing predictive models, and a specific variable is important to some of them and not to others? In that case…

    Submitted 9 February, 2020; v1 submitted 10 January, 2019; originally announced January 2019.