
Efficient Quantification of Multimodal Interaction at Sample Level

Abstract: Interactions between modalities -- redundancy, uniqueness, and synergy -- collectively determine the composition of multimodal information. Understanding these interactions is crucial for analyzing information dynamics in multimodal systems, yet their accurate sample-level quantification presents significant theoretical and computational challenges. To address this, we introduce the Lightweight Sample-wise Multimodal Interaction (LSMI) estimator, rigorously grounded in pointwise information theory. We first develop a redundancy estimation framework, employing an appropriate pointwise information measure to quantify this most decomposable and measurable interaction. Building upon this, we propose a general interaction estimation method that employs efficient entropy estimation, specifically tailored for sample-wise estimation in continuous distributions. Extensive experiments on synthetic and real-world datasets validate LSMI's precision and efficiency. Crucially, our sample-wise approach reveals fine-grained sample- and category-level dynamics within multimodal data, enabling practical applications such as redundancy-informed sample partitioning, targeted knowledge distillation, and interaction-aware model ensembling. The code is available at https://github.com/GeWu-Lab/LSMI_Estimator.

Submitted 7 June, 2025; originally announced June 2025.

Comments: Accepted to ICML 2025

arXiv:2502.00818 [pdf, other]

Error-quantified Conformal Inference for Time Series

Authors: Junxi Wu, Dongjian Hu, Yajie Bao, Shu-Tao Xia, Changliang Zou

Abstract: Uncertainty quantification in time series prediction is challenging due to the temporal dependence and distribution shift on sequential data. Conformal inference provides a pivotal and flexible instrument for assessing the uncertainty of machine learning models through prediction sets. Recently, a series of online conformal inference methods updated thresholds of prediction sets by performing online gradient descent on a sequence of quantile loss functions. A drawback of such methods is that they only use the information of revealed non-conformity scores via miscoverage indicators but ignore error quantification, namely the distance between the non-conformity score and the current threshold. To accurately leverage the dynamics of miscoverage error, we propose Error-quantified Conformal Inference (ECI) by smoothing the quantile loss function. ECI introduces a continuous and adaptive feedback scale with the miscoverage error, rather than simple binary feedback in existing methods. We establish a long-term coverage guarantee for ECI under arbitrary dependence and distribution shift. The extensive experimental results show that ECI can achieve valid miscoverage control and output tighter prediction sets than other baselines.

Submitted 2 February, 2025; originally announced February 2025.

Comments: ICLR 2025 camera version

arXiv:2411.17402 [pdf, other]

Receiver operating characteristic curve analysis with non-ignorable missing disease status

Authors: Dingding Hu, Tao Yu, Pengfei Li

Abstract: This article considers the receiver operating characteristic (ROC) curve analysis for medical data with non-ignorable missingness in the disease status. In the framework of the logistic regression models for both the disease status and the verification status, we first establish the identifiability of model parameters, and then propose a likelihood method to estimate the model parameters, the ROC curve, and the area under the ROC curve (AUC) for the biomarker. The asymptotic distributions of these estimators are established. Via extensive simulation studies, we compare our method with competing methods in the point estimation and assess the accuracy of confidence interval estimation under various scenarios. To illustrate the application of the proposed method in practical data, we apply our method to the National Alzheimer's Coordinating Center data set.

Submitted 26 November, 2024; originally announced November 2024.

Comments: 20 pages, 1 figure

arXiv:2409.06473 [pdf, other]

Some statistical aspects of the Covid-19 response

Authors: Simon N. Wood, Ernst C. Wit, Paul M. McKeigue, Danshu Hu, Beth Flood, Lauren Corcoran, Thea Abou Jawad

Abstract: This paper discusses some statistical aspects of the U.K. Covid-19 pandemic response, focussing particularly on cases where we believe that a statistically questionable approach or presentation has had a substantial impact on public perception, or government policy, or both. We discuss the presentation of statistics relating to Covid risk, and the risk of the response measures, arguing that biases tended to operate in opposite directions, overplaying Covid risk and underplaying the response risks. We also discuss some issues around presentation of life loss data, excess deaths and the use of case data. The consequences of neglect of most individual variability from epidemic models, alongside the consequences of some other statistically important omissions are also covered. Finally the evidence for full stay at home lockdowns having been necessary to reverse waves of infection is examined, with new analyses provided for a number of European countries.

Submitted 5 February, 2025; v1 submitted 10 September, 2024; originally announced September 2024.

Comments: Version finally accepted by Journal of the Royal Statistical Society (Series A) as a discussion paper

arXiv:2205.00505 [pdf, ps, other]

Statistical inference for the two-sample problem under likelihood ratio ordering, with application to the ROC curve estimation

Authors: Dingding Hu, Meng Yuan, Tao Yu, Pengfei Li

Abstract: The receiver operating characteristic (ROC) curve is a powerful statistical tool and has been widely applied in medical research. In ROC curve estimation, a commonly used assumption is that the larger the biomarker value, the greater the severity of the disease. In this paper, we mathematically interpret "greater severity of the disease" as "larger probability of being diseased". This in turn is equivalent to assuming the likelihood ratio ordering of the biomarker between the diseased and healthy individuals. With this assumption, we first propose a Bernstein polynomial method to model the distributions of both samples; we then estimate the distributions by the maximum empirical likelihood principle. The ROC curve estimate and the associated summary statistics are obtained subsequently. Theoretically, we establish the asymptotic consistency of our estimators. Via extensive numerical studies, we compare the performance of our method with competitive methods. The application of our method is illustrated by a real-data example.

Submitted 22 February, 2023; v1 submitted 1 May, 2022; originally announced May 2022.

Comments: 35 pages, 2 figures

arXiv:2106.05001 [pdf, other]

No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data

Authors: Mi Luo, Fei Chen, Dapeng Hu, Yifan Zhang, Jian Liang, Jiashi Feng

Abstract: A central challenge in training classification models in the real-world federated system is learning with non-IID data. To cope with this, most of the existing works involve enforcing regularization in local optimization or improving the model aggregation scheme at the server. Other works also share public datasets or synthesized samples to supplement the training of under-represented classes or introduce a certain level of personalization. Though effective, they lack a deep understanding of how the data heterogeneity affects each layer of a deep classification model. In this paper, we bridge this gap by performing an experimental analysis of the representations learned by different layers. Our observations are surprising: (1) there exists a greater bias in the classifier than other layers, and (2) the classification performance can be significantly improved by post-calibrating the classifier after federated training. Motivated by the above findings, we propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model. Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10. We hope that our simple yet effective method can shed some light on the future research of federated learning with non-IID data.

Submitted 28 October, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

Comments: 22 pages, NeurIPS 2021

arXiv:2006.08690 [pdf, other]

Generalized and Scalable Optimal Sparse Decision Trees

Authors: Jimmy Lin, Chudi Zhong, Diane Hu, Cynthia Rudin, Margo Seltzer

Abstract: Decision tree optimization is notoriously difficult from a computational perspective but essential for the field of interpretable machine learning. Despite efforts over the past 40 years, only recently have optimization breakthroughs been made that have allowed practical algorithms to find optimal decision trees. These new techniques have the potential to trigger a paradigm shift where it is possible to construct sparse decision trees to efficiently optimize a variety of objective functions without relying on greedy splitting and pruning heuristics that often lead to suboptimal solutions. The contribution in this work is to provide a general framework for decision tree optimization that addresses the two significant open problems in the area: treatment of imbalanced data and fully optimizing over continuous variables. We present techniques that produce optimal decision trees over a variety of objectives including F-score, AUC, and partial area under the ROC convex hull. We also introduce a scalable algorithm that produces provably optimal results in the presence of continuous variables and speeds up decision tree construction by several orders of magnitude relative to the state of the art.

Submitted 22 November, 2022; v1 submitted 15 June, 2020; originally announced June 2020.

Comments: This paper was published in ICML 2020

ACM Class: I.2.6

arXiv:2005.09624 [pdf, other]

Batch-Augmented Multi-Agent Reinforcement Learning for Efficient Traffic Signal Optimization

Authors: Yueh-Hua Wu, I-Hau Yeh, David Hu, Hong-Yuan Mark Liao

Abstract: The goal of this work is to provide a viable solution based on reinforcement learning for traffic signal control problems. Although the state-of-the-art reinforcement learning approaches have yielded great success in a variety of domains, directly applying them to alleviate traffic congestion can be challenging, considering the requirement of high sample efficiency and how training data is gathered. In this work, we address several challenges that we encountered when we attempted to mitigate serious traffic congestion occurring in a metropolitan area. Specifically, we are required to provide a solution that is able to (1) handle the traffic signal control when certain surveillance cameras that retrieve information for reinforcement learning are down, (2) learn from batch data without a traffic simulator, and (3) make control decisions without shared information across intersections. We present a two-stage framework to deal with the above-mentioned situations. The framework can be decomposed into an Evolution Strategies approach that gives a fixed-time traffic signal control schedule and a multi-agent off-policy reinforcement learning that is capable of learning from batch data with the aid of three proposed components: bounded action, batch augmentation, and surrogate reward clipping. Our experiments show that the proposed framework reduces traffic congestion by 36% in terms of waiting time compared with the currently used fixed-time traffic signal plan. Furthermore, the framework requires only 600 queries to a simulator to achieve the result.

Submitted 19 May, 2020; originally announced May 2020.

arXiv:2004.09031 [pdf, other]

Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality Regularization and Singular Value Sparsification

Authors: Huanrui Yang, Minxue Tang, Wei Wen, Feng Yan, Daniel Hu, Ang Li, Hai Li, Yiran Chen

Abstract: Modern deep neural networks (DNNs) often require high memory consumption and large computational loads. In order to deploy DNN algorithms efficiently on edge or mobile devices, a series of DNN compression algorithms have been explored, including factorization methods. Factorization methods approximate the weight matrix of a DNN layer with the multiplication of two or multiple low-rank matrices. However, it is hard to measure the ranks of DNN layers during the training process. Previous works mainly induce low rank through implicit approximations or via a costly singular value decomposition (SVD) process on every training step. The former approach usually induces a high accuracy loss while the latter has a low efficiency. In this work, we propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD on every step. SVD training first decomposes each layer into the form of its full-rank SVD, then performs training directly on the decomposed weights. We add orthogonality regularization to the singular vectors, which ensures the valid form of SVD and avoids gradient vanishing/exploding. Low rank is encouraged by applying sparsity-inducing regularizers on the singular values of each layer. Singular value pruning is applied at the end to explicitly reach a low-rank model. We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve higher reduction on computation load under the same accuracy, compared to not only previous factorization methods but also state-of-the-art filter pruning methods.

Submitted 19 April, 2020; originally announced April 2020.

Comments: In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). To be presented at the EDLCV 2020 workshop co-located with CVPR 2020

arXiv:1812.10012 [pdf, ps, other]

doi 10.1109/TCYB.2019.2953564

Joint Embedding Learning and Low-Rank Approximation: A Framework for Incomplete Multi-view Learning

Authors: Hong Tao, Chenping Hou, Dongyun Yi, Jubo Zhu, Dewen Hu

Abstract: In real-world applications, not all instances in multi-view data are fully represented. To deal with incomplete data, Incomplete Multi-view Learning (IML) has arisen. In this paper, we propose the Joint Embedding Learning and Low-Rank Approximation (JELLA) framework for IML. The JELLA framework approximates the incomplete data by a set of low-rank matrices and learns a full and common embedding by linear transformation. Several existing IML methods can be unified as special cases of the framework. More interestingly, some linear transformation based complete multi-view methods can be adapted to IML directly with the guidance of the framework. Thus, the JELLA framework improves the efficiency of processing incomplete multi-view data, and bridges the gap between complete multi-view learning and IML. Moreover, the JELLA framework can provide guidance for developing new algorithms. For illustration, within the framework, we propose the Incomplete Multi-view Learning with Block Diagonal Representation (IML-BDR) method. Assuming that the sampled examples have approximate linear subspace structure, IML-BDR uses the block diagonal structure prior to learn the full embedding, which would lead to more correct clustering. A convergent alternating iterative algorithm with the Successive Over-Relaxation optimization technique is devised for optimization. Experimental results on various datasets demonstrate the effectiveness of IML-BDR.

Submitted 16 December, 2019; v1 submitted 24 December, 2018; originally announced December 2018.

arXiv:1812.04407 [pdf, other]

Learning Item-Interaction Embeddings for User Recommendations

Authors: Xiaoting Zhao, Raphael Louca, Diane Hu, Liangjie Hong

Abstract: Industry-scale recommendation systems have become a cornerstone of the e-commerce shopping experience. For Etsy, an online marketplace with over 50 million handmade and vintage items, users come to rely on personalized recommendations to surface relevant items from its massive inventory. One hallmark of Etsy's shopping experience is the multitude of ways in which a user can interact with an item they are interested in: they can view it, favorite it, add it to a collection, add it to cart, purchase it, etc. We hypothesize that the different ways in which a user interacts with an item indicate different kinds of intent. Consequently, a user's recommendations should be based not only on the item from their past activity, but also the way in which they interacted with that item. In this paper, we propose a novel method for learning interaction-based item embeddings that encode the co-occurrence patterns of not only the item itself, but also the interaction type. The learned embeddings give us a convenient way of approximating the likelihood that one item-interaction pair would co-occur with another by way of a simple inner product. Because of its computational efficiency, our model lends itself naturally as a candidate set selection method, and we evaluate it as such in an industry-scale recommendation system that serves live traffic on Etsy.com. Our experiments reveal that taking interaction type into account shows promising results in improving the accuracy of modeling user shopping behavior.

Submitted 11 December, 2018; originally announced December 2018.

arXiv:1811.05544 [pdf, other]

An Introductory Survey on Attention Mechanisms in NLP Problems

Authors: Dichao Hu

Abstract: First derived from human intuition and later adapted to machine translation for automatic token alignment, the attention mechanism, a simple method that can be used for encoding sequence data based on the importance score each element is assigned, has been widely applied to and attained significant improvement in various tasks in natural language processing, including sentiment classification, text summarization, question answering, dependency parsing, etc. In this paper, we survey recent works and conduct an introductory summary of the attention mechanism in different NLP problems, aiming to provide our readers with basic knowledge on this widely used method, discuss its different variants for different tasks, explore its association with other techniques in machine learning, and examine methods for evaluating its performance.

Submitted 12 November, 2018; originally announced November 2018.

Comments: 9 pages

Showing 1–12 of 12 results for author: Hu, D