-
Technical Debt in In-Context Learning: Diminishing Efficiency in Long Context
Authors:
Taejong Joo,
Diego Klabjan
Abstract:
Transformers have demonstrated remarkable in-context learning (ICL) capabilities, adapting to new tasks by simply conditioning on demonstrations without parameter updates. Compelling empirical and theoretical evidence suggests that ICL, as a general-purpose learner, could outperform task-specific models. However, it remains unclear how close to optimal transformers' in-context learning is compared to principled learning algorithms. To bridge this gap, we introduce a new framework for quantifying the optimality of ICL as a learning algorithm in stylized settings. Our findings reveal a striking dichotomy: while ICL initially matches the efficiency of a Bayes optimal estimator, its efficiency significantly deteriorates in long contexts. Through an information-theoretic analysis, we show that the diminishing efficiency is inherent to ICL. These results clarify the trade-offs in adopting ICL as a universal problem solver, motivating a new generation of on-the-fly adaptive methods without the diminishing efficiency.
Submitted 6 February, 2025;
originally announced February 2025.
-
Improving self-training under distribution shifts via anchored confidence with theoretical guarantees
Authors:
Taejong Joo,
Diego Klabjan
Abstract:
Self-training often falls short under distribution shifts due to an increased discrepancy between prediction confidence and actual accuracy. This typically necessitates computationally demanding methods such as neighborhood or ensemble-based label corrections. Drawing inspiration from insights on early learning regularization, we develop a principled method to improve self-training under distribution shifts based on temporal consistency. Specifically, we build an uncertainty-aware temporal ensemble with a simple relative thresholding. Then, this ensemble smooths noisy pseudo labels to promote selective temporal consistency. We show that our temporal ensemble is asymptotically correct and our label smoothing technique can reduce the optimality gap of self-training. Our extensive experiments validate that our approach consistently improves self-training performance by 8% to 16% across diverse distribution shift scenarios without computational overhead. In addition, our method exhibits attractive properties, such as improved calibration performance and robustness to different hyperparameter choices.
Submitted 1 November, 2024;
originally announced November 2024.
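The "discrepancy between prediction confidence and actual accuracy" mentioned in the abstract above is commonly measured with the expected calibration error (ECE). The following is a minimal sketch of that standard metric, not the authors' method or their evaluation code; the function name, the equal-width binning, and the default of 10 bins are assumptions made for illustration.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard binned ECE: weighted mean of |confidence - accuracy| per bin.

    confidences: predicted probability of the chosen class, in [0, 1].
    correct:     1 if the prediction was right, 0 otherwise.
    (Illustrative helper; name and defaults are assumptions.)
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Right-inclusive bins, so a confidence of exactly 1.0 lands in the last bin.
    bin_ids = np.clip(np.digitize(confidences, edges[1:-1], right=True), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of samples in the bin
    return ece
```

A model that is 90% confident on four predictions but gets only three right has a gap of 0.15 in that bin; a larger ECE on shifted data is one way the confidence/accuracy mismatch shows up.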
-
An Analysis of Recent Advances in Deepfake Image Detection in an Evolving Threat Landscape
Authors:
Sifat Muhammad Abdullah,
Aravind Cheruvu,
Shravya Kanchi,
Taejoong Chung,
Peng Gao,
Murtuza Jadliwala,
Bimal Viswanath
Abstract:
Deepfake or synthetic images produced using deep generative models pose serious risks to online platforms. This has triggered several research efforts to accurately detect deepfake images, achieving excellent performance on publicly available deepfake datasets. In this work, we study 8 state-of-the-art detectors and argue that they are far from being ready for deployment due to two recent developments. First, the emergence of lightweight methods to customize large generative models can enable an attacker to create many customized generators (to create deepfakes), thereby substantially increasing the threat surface. We show that existing defenses fail to generalize well to such user-customized generative models that are publicly available today. We discuss new machine learning approaches based on content-agnostic features and ensemble modeling to improve generalization performance against user-customized models. Second, emerging vision foundation models -- machine learning models trained on broad data that can be easily adapted to several downstream tasks -- can be misused by attackers to craft adversarial deepfakes that can evade existing defenses. We propose a simple adversarial attack that leverages existing foundation models to craft adversarial samples without adding any adversarial noise, through careful semantic manipulation of the image content. We highlight the vulnerabilities of several defenses against our attack, and explore directions leveraging advanced foundation models and adversarial training to defend against this new threat.
Submitted 24 April, 2024;
originally announced April 2024.
-
IW-GAE: Importance Weighted Group Accuracy Estimation for Improved Calibration and Model Selection in Unsupervised Domain Adaptation
Authors:
Taejong Joo,
Diego Klabjan
Abstract:
Distribution shifts pose significant challenges for model calibration and model selection tasks in the unsupervised domain adaptation problem -- a scenario where the goal is to perform well in a distribution-shifted domain without labels. In this work, we tackle difficulties coming from distribution shifts by developing a novel importance weighted group accuracy estimator. Specifically, we present a new perspective of addressing the model calibration and model selection tasks by estimating the group accuracy. Then, we formulate an optimization problem for finding an importance weight that leads to an accurate group accuracy estimation with theoretical analyses. Our extensive experiments show that our approach improves state-of-the-art performance by 22% in the model calibration task and 14% in the model selection task.
Submitted 17 July, 2024; v1 submitted 16 October, 2023;
originally announced October 2023.
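To make the idea of importance-weighted accuracy estimation in the abstract above concrete, here is a minimal sketch of the generic self-normalized importance-weighting estimator on labeled source data. It is not IW-GAE: the paper's grouping scheme and its optimization problem for choosing the weights are not reproduced. The function name is hypothetical, and the weights (density ratios of target to source) are assumed to be given.

```python
import numpy as np

def importance_weighted_accuracy(correct, weights):
    """Estimate target-domain accuracy from labeled source samples.

    correct: 1 if the model was right on a source sample, 0 otherwise.
    weights: assumed importance weights w(x) ~ p_target(x) / p_source(x);
             how to obtain them is outside the scope of this sketch.
    """
    correct = np.asarray(correct, dtype=float)
    weights = np.asarray(weights, dtype=float)
    total = weights.sum()
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    # Self-normalized estimator: weighted mean of the correctness indicator.
    return float((weights * correct).sum() / total)
```

With uniform weights this reduces to plain source accuracy; up-weighting source samples that resemble the target domain moves the estimate toward the accuracy expected there.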
-
Privacy Guarantees of BLE Contact Tracing: A Case Study on COVIDWISE
Authors:
Salman Ahmed,
Ya Xiao,
Taejoong Chung,
Carol Fung,
Moti Yung,
Danfeng Yao
Abstract:
Google and Apple jointly introduced a digital contact tracing technology and an API called "exposure notification" to help health organizations and governments with contact tracing. The technology and its interplay with security and privacy constraints require investigation. In this study, we examine the security, privacy, and reliability of the technology under realistic use cases and typical scenarios, with a typical expected adversary in mind. We do so in the context of Virginia's COVIDWISE app. This experimental analysis validates the properties of the system under these conditions, a result that seems crucial for the peace of mind of authorities adopting exposure notification technology, and may also help with the system's transparency and overall user trust.
Submitted 16 December, 2021; v1 submitted 16 November, 2021;
originally announced November 2021.
-
Revisiting Explicit Regularization in Neural Networks for Well-Calibrated Predictive Uncertainty
Authors:
Taejong Joo,
Uijung Chung
Abstract:
From the statistical learning perspective, complexity control via explicit regularization is a necessity for improving the generalization of over-parameterized models. However, the impressive generalization performance of neural networks with only implicit regularization may be at odds with this conventional wisdom. In this work, we revisit the importance of explicit regularization for obtaining well-calibrated predictive uncertainty. Specifically, we introduce a probabilistic measure of calibration performance, which is lower bounded by the log-likelihood. We then explore explicit regularization techniques for improving the log-likelihood on unseen samples, which provides well-calibrated predictive uncertainty. Our findings present a new direction to improve the predictive probability quality of deterministic neural networks, which can be an efficient and scalable alternative to Bayesian neural networks and ensemble methods.
Submitted 6 February, 2021; v1 submitted 11 June, 2020;
originally announced June 2020.
-
Being Bayesian about Categorical Probability
Authors:
Taejong Joo,
Uijung Chung,
Min-Gwan Seo
Abstract:
Neural networks utilize the softmax as a building block in classification tasks, which suffers from overconfidence and lacks the ability to represent uncertainty. As a Bayesian alternative to the softmax, we consider a random variable of a categorical probability over class labels. In this framework, the prior distribution explicitly models the presumed noise inherent in the observed label, which provides consistent gains in generalization performance in multiple challenging tasks. The proposed method inherits advantages of Bayesian approaches that achieve better uncertainty estimation and model calibration. Our method can be implemented as a plug-and-play loss function with negligible computational overhead compared to the softmax with the cross-entropy loss function.
Submitted 29 June, 2020; v1 submitted 18 February, 2020;
originally announced February 2020.
-
Regularizing activations in neural networks via distribution matching with the Wasserstein metric
Authors:
Taejong Joo,
Donggu Kang,
Byunghoon Kim
Abstract:
Regularization and normalization have become indispensable components in training deep neural networks, resulting in faster training and improved generalization performance. We propose the projected error function regularization loss (PER) that encourages activations to follow the standard normal distribution. PER randomly projects activations onto a one-dimensional space and computes the regularization loss in the projected space. PER is similar to the Pseudo-Huber loss in the projected space, thus taking advantage of both $L^1$ and $L^2$ regularization losses. In addition, PER can capture the interaction between hidden units through projection vectors drawn from the unit sphere. By doing so, PER minimizes the upper bound of the Wasserstein distance of order one between an empirical distribution of activations and the standard normal distribution. To the best of the authors' knowledge, this is the first work to regularize activations via distribution matching in the probability distribution space. We evaluate the proposed method on the image classification task and the word-level language modeling task.
Submitted 26 April, 2020; v1 submitted 13 February, 2020;
originally announced February 2020.
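The abstract above describes projecting activations onto random one-dimensional directions and applying a loss that resembles the Pseudo-Huber loss. The sketch below illustrates only those two ingredients with the textbook Pseudo-Huber function; it is not the PER loss itself, whose exact form is not given in the abstract. All function names, the number of projections, and the `delta` parameter are assumptions made for illustration.

```python
import numpy as np

def pseudo_huber(x, delta=1.0):
    """Textbook Pseudo-Huber: ~x^2/2 near zero (L2-like), ~delta*|x| far away (L1-like)."""
    x = np.asarray(x, dtype=float)
    return delta**2 * (np.sqrt(1.0 + (x / delta) ** 2) - 1.0)

def random_unit_vectors(n_vectors, dim, rng):
    """Draw directions uniformly from the unit sphere by normalizing Gaussian samples."""
    v = rng.standard_normal((n_vectors, dim))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def projected_penalty(activations, n_projections=8, seed=0):
    """Average Pseudo-Huber penalty over random 1-D projections of the activations.

    activations: array of shape (batch, hidden_units).
    Because each projection mixes all hidden units, the penalty depends on
    their joint behavior rather than on each unit in isolation.
    """
    activations = np.asarray(activations, dtype=float)
    rng = np.random.default_rng(seed)
    directions = random_unit_vectors(n_projections, activations.shape[1], rng)
    projected = activations @ directions.T  # shape: (batch, n_projections)
    return float(pseudo_huber(projected).mean())
```

The quadratic behavior near zero and the linear behavior for large inputs are what "taking advantage of both $L^1$ and $L^2$" refers to in the generic Pseudo-Huber case.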
-
An Empirical Study on Content Bundling in BitTorrent Swarming System
Authors:
Jinyoung Han,
Taejoong Chung,
Seungbae Kim,
Hyun-chul Kim,
Ted "Taekyoung" Kwon,
Yanghee Choi
Abstract:
Despite the tremendous success of BitTorrent, its swarming system suffers from a fundamental limitation: lower or no availability of unpopular contents. Recently, Menasche et al. have shown that bundling is a promising solution to mitigate this availability problem; it improves the availability and reduces download times for unpopular contents by combining multiple files into a single swarm. There have also been studies on bundling strategies and performance issues in bundled swarms. In spite of the recent surge of interest in the benefits of and strategies for bundling, there is still little empirical grounding for understanding, describing, and modeling it. This is the first empirical study that measures and analyzes how prevalent content bundling is in BitTorrent and how peers access the bundled contents, in comparison to the other non-bundled (i.e., single-file) ones. To our surprise, we found that around 70% of BitTorrent swarms contain multiple files, which indicates that bundling has become widespread for content sharing. We also show that the amount of bytes shared in bundled swarms is estimated to be around 85% of all the BitTorrent contents logged in our datasets. Inspired by our findings, we raise and discuss three important research questions in the field of file sharing systems as well as future content-oriented networking: i) bundling strategies, ii) bundling-aware sharing systems in BitTorrent, and iii) implications for content-oriented networking.
Submitted 16 August, 2010;
originally announced August 2010.