PILAF: Optimal Human Preference Sampling for Reward Modeling

Authors: Yunzhen Feng, Ariel Kwiatkowski, Kunhao Zheng, Julia Kempe, Yaqi Duan

Abstract: As large language models increasingly drive real-world applications, aligning them with human values becomes paramount. Reinforcement Learning from Human Feedback (RLHF) has emerged as a key technique, translating preference data into reward models when oracle human values remain inaccessible. In practice, RLHF mostly relies on approximate reward models, which may not consistently guide the policy toward maximizing the underlying human values. We propose Policy-Interpolated Learning for Aligned Feedback (PILAF), a novel response sampling strategy for preference labeling that explicitly aligns preference learning with maximizing the underlying oracle reward. PILAF is theoretically grounded, demonstrating optimality from both an optimization and a statistical perspective. The method is straightforward to implement and demonstrates strong performance in iterative and online RLHF settings where feedback curation is critical.

Submitted 6 February, 2025; originally announced February 2025.

arXiv:2410.22069

Flavors of Margin: Implicit Bias of Steepest Descent in Homogeneous Neural Networks

Authors: Nikolaos Tsilivis, Gal Vardi, Julia Kempe

Abstract: We study the implicit bias of the general family of steepest descent algorithms with infinitesimal learning rate in deep homogeneous neural networks. We show that: (a) an algorithm-dependent geometric margin starts increasing once the networks reach perfect training accuracy, and (b) any limit point of the training trajectory corresponds to a KKT point of the corresponding margin-maximization problem. We experimentally zoom into the trajectories of neural networks optimized with various steepest descent algorithms, highlighting connections to the implicit bias of popular adaptive methods (Adam and Shampoo).

Submitted 2 April, 2025; v1 submitted 29 October, 2024; originally announced October 2024.

Comments: The earlier conference version (ICLR 2025) of this paper showed a bias towards KKT points of the max-margin problem only in the case of 'smooth' norms. The current version (submitted to JMLR) proves that this holds true for any norm. It also includes new experiments on the implicit bias of the Shampoo algorithm

arXiv:2410.16073

On the Geometry of Regularization in Adversarial Training: High-Dimensional Asymptotics and Generalization Bounds

Authors: Matteo Vilucchio, Nikolaos Tsilivis, Bruno Loureiro, Julia Kempe

Abstract: Regularization, whether explicit in terms of a penalty in the loss or implicit in the choice of algorithm, is a cornerstone of modern machine learning. Indeed, controlling the complexity of the model class is particularly important when data is scarce, noisy or contaminated, as it translates a statistical belief on the underlying structure of the data. This work investigates the question of how to choose the regularization norm $\lVert \cdot \rVert$ in the context of high-dimensional adversarial training for binary classification. To this end, we first derive an exact asymptotic description of the robust, regularized empirical risk minimizer for various types of adversarial attacks and regularization norms (including non-$\ell_p$ norms). We complement this analysis with a uniform convergence analysis, deriving bounds on the Rademacher Complexity for this class of problems. Leveraging our theoretical results, we quantitatively characterize the relationship between perturbation size and the optimal choice of $\lVert \cdot \rVert$, confirming the intuition that, in the data-scarce regime, the type of regularization becomes increasingly important for adversarial training as perturbations grow in size.

Submitted 21 October, 2024; originally announced October 2024.

arXiv:2410.07041

Emergent properties with repeated examples

Authors: François Charton, Julia Kempe

Abstract: We study the performance of transformers as a function of the number of repetitions of training examples with algorithmically generated datasets. On three problems of mathematics (the greatest common divisor, modular multiplication, and matrix eigenvalues), we show that for a fixed number of training steps, models trained on smaller sets of repeated examples outperform models trained on larger sets of single-use examples. We also demonstrate that two-set training - repeated use of a small random subset of examples, along with normal sampling on the rest of the training set - provides for faster learning and better performance. This highlights that the benefits of repetition can outweigh those of data diversity. These datasets and problems provide a controlled setting to shed light on the still poorly understood interplay between generalization and memorization in deep learning.

Submitted 9 October, 2024; originally announced October 2024.

arXiv:2410.04840

Strong Model Collapse

Authors: Elvis Dohmatob, Yunzhen Feng, Arjun Subramonian, Julia Kempe

Abstract: Within the scaling laws paradigm, which underpins the training of large neural networks like ChatGPT and Llama, we consider a supervised regression setting and establish the existence of a strong form of the model collapse phenomenon, a critical performance degradation due to synthetic data in the training corpus. Our results show that even the smallest fraction of synthetic data (e.g., as little as 1% of the total training dataset) can still lead to model collapse: larger and larger training sets do not enhance performance. We further investigate whether increasing model size, an approach aligned with current trends in training large language models, exacerbates or mitigates model collapse. In a simplified regime where neural networks are approximated via random projections of tunable size, we both theoretically and empirically show that larger models can amplify model collapse. Interestingly, our theory also indicates that, beyond the interpolation threshold (which can be extremely high for very large datasets), larger models may mitigate the collapse, although they do not entirely prevent it. Our theoretical findings are empirically verified through experiments on language models and feed-forward neural networks for images.

Submitted 8 October, 2024; v1 submitted 7 October, 2024; originally announced October 2024.

arXiv:2408.01420

Mission Impossible: A Statistical Perspective on Jailbreaking LLMs

Authors: Jingtong Su, Julia Kempe, Karen Ullrich

Abstract: Large language models (LLMs) are trained on a deluge of text data with limited quality control. As a result, LLMs can exhibit unintended or even harmful behaviours, such as leaking information, fake news or hate speech. Countermeasures, commonly referred to as preference alignment, include fine-tuning the pretrained LLMs with carefully crafted text examples of desired behaviour. Even then, empirical evidence shows preference-aligned LLMs can be enticed to harmful behaviour. This so-called jailbreaking of LLMs is typically achieved by adversarially modifying the input prompt to the LLM. Our paper provides theoretical insights into the phenomenon of preference alignment and jailbreaking from a statistical perspective. Under our framework, we first show that pretrained LLMs will mimic harmful behaviour if present in the training corpus. Under that same framework, we then introduce a statistical notion of alignment, and lower-bound the jailbreaking probability, showing that it is unpreventable under reasonable assumptions. Based on our insights, we propose an alteration to the currently prevalent alignment strategy RLHF. Specifically, we introduce a simple modification to the RLHF objective, which we call E-RLHF, that aims to increase the likelihood of safe responses. E-RLHF brings no additional training cost, and is compatible with other methods. Empirically, we demonstrate that E-RLHF outperforms RLHF on all alignment problems put forward by the AdvBench and HarmBench projects without sacrificing model performance as measured by the MT-Bench project.

Submitted 2 August, 2024; originally announced August 2024.

arXiv:2406.07515

Beyond Model Collapse: Scaling Up with Synthesized Data Requires Verification

Authors: Yunzhen Feng, Elvis Dohmatob, Pu Yang, Francois Charton, Julia Kempe

Abstract: Large Language Models (LLMs) are increasingly trained on data generated by other LLMs, either because generated text and images become part of the pre-training corpus, or because synthesized data is used as a replacement for expensive human annotation. This raises concerns about model collapse, a drop in model performance when their training sets include generated data. Considering that it is easier for both humans and machines to distinguish between good and bad examples than to generate high-quality samples, we investigate the use of verification on synthesized data to prevent model collapse. We provide a theoretical characterization using Gaussian mixtures, linear classifiers, and linear verifiers to derive conditions with measurable proxies to assess whether the verifier can effectively select synthesized data that leads to optimal performance. We experiment with two practical tasks -- computing matrix eigenvalues with transformers and news summarization with LLMs -- which both exhibit model collapse when trained on generated data, and show that verifiers, even imperfect ones, can indeed be harnessed to prevent model collapse and that our proposed proxy measure strongly correlates with performance.

Submitted 24 October, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.04981

The Price of Implicit Bias in Adversarially Robust Generalization

Authors: Nikolaos Tsilivis, Natalie Frank, Nathan Srebro, Julia Kempe

Abstract: We study the implicit bias of optimization in robust empirical risk minimization (robust ERM) and its connection with robust generalization. In classification settings under adversarial perturbations with linear models, we study what type of regularization should ideally be applied for a given perturbation set to improve (robust) generalization. We then show that the implicit bias of optimization in robust ERM can significantly affect the robustness of the model and identify two ways this can happen: either through the optimization algorithm or the architecture. We verify our predictions in simulations with synthetic data and experimentally study the importance of implicit bias in robust ERM with deep neural networks.

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.02128

Iteration Head: A Mechanistic Study of Chain-of-Thought

Authors: Vivien Cabannes, Charles Arnal, Wassim Bouaziz, Alice Yang, Francois Charton, Julia Kempe

Abstract: Chain-of-Thought (CoT) reasoning is known to improve Large Language Models both empirically and in terms of theoretical approximation power. However, our understanding of the inner workings of CoT capabilities, and of the conditions under which they appear, remains limited. This paper helps fill this gap by demonstrating how CoT reasoning emerges in transformers in a controlled and interpretable setting. In particular, we observe the appearance of a specialized attention mechanism dedicated to iterative reasoning, which we call "iteration heads". We track both the emergence and the precise working of these iteration heads down to the attention level, and measure the transferability of the CoT skills to which they give rise between tasks.

Submitted 28 October, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

arXiv:2404.19640

Attacking Bayes: On the Adversarial Robustness of Bayesian Neural Networks

Authors: Yunzhen Feng, Tim G. J. Rudner, Nikolaos Tsilivis, Julia Kempe

Abstract: Adversarial examples have been shown to cause neural networks to fail on a wide range of vision and language tasks, but recent work has claimed that Bayesian neural networks (BNNs) are inherently robust to adversarial perturbations. In this work, we examine this claim. To study the adversarial robustness of BNNs, we investigate whether it is possible to successfully break state-of-the-art BNN inference methods and prediction pipelines using even relatively unsophisticated attacks for three tasks: (1) label prediction under the posterior predictive mean, (2) adversarial example detection with Bayesian predictive uncertainty, and (3) semantic shift detection. We find that BNNs trained with state-of-the-art approximate inference methods, and even BNNs trained with Hamiltonian Monte Carlo, are highly susceptible to adversarial attacks. We also identify various conceptual and experimental errors in previous works that claimed inherent adversarial robustness of BNNs and conclusively demonstrate that BNNs and uncertainty-aware Bayesian prediction pipelines are not inherently robust against adversarial attacks.

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.05579

DRoP: Distributionally Robust Data Pruning

Authors: Artem Vysogorets, Kartik Ahuja, Julia Kempe

Abstract: In the era of exceptionally data-hungry models, careful selection of the training data is essential to mitigate the extensive costs of deep learning. Data pruning offers a solution by removing redundant or uninformative samples from the dataset, which yields faster convergence and improved neural scaling laws. However, little is known about its impact on classification bias of the trained models. We conduct the first systematic study of this effect and reveal that existing data pruning algorithms can produce highly biased classifiers. We present theoretical analysis of the classification risk in a mixture of Gaussians to argue that choosing appropriate class pruning ratios, coupled with random pruning within classes, has the potential to improve worst-class performance. We thus propose DRoP, a distributionally robust approach to pruning and empirically demonstrate its performance on standard computer vision benchmarks. In sharp contrast to existing algorithms, our proposed method continues improving distributional robustness at a tolerable drop of average performance as we prune more from the datasets.

Submitted 9 February, 2025; v1 submitted 8 April, 2024; originally announced April 2024.

arXiv:2403.09869

Mind the GAP: Improving Robustness to Subpopulation Shifts with Group-Aware Priors

Authors: Tim G. J. Rudner, Ya Shi Zhang, Andrew Gordon Wilson, Julia Kempe

Abstract: Machine learning models often perform poorly under subpopulation shifts in the data distribution. Developing methods that allow machine learning models to better generalize to such shifts is crucial for safe deployment in real-world settings. In this paper, we develop a family of group-aware prior (GAP) distributions over neural network parameters that explicitly favor models that generalize well under subpopulation shifts. We design a simple group-aware prior that only requires access to a small set of data with group information and demonstrate that training with this prior yields state-of-the-art performance -- even when only retraining the final layer of a previously trained non-robust model. Group-aware priors are conceptually simple, complementary to existing approaches, such as attribute pseudo labeling and data reweighting, and open up promising new avenues for harnessing Bayesian inference to enable robustness to subpopulation shifts.

Submitted 14 March, 2024; originally announced March 2024.

Comments: Published in Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024)

arXiv:2402.07712

Model Collapse Demystified: The Case of Regression

Authors: Elvis Dohmatob, Yunzhen Feng, Julia Kempe

Abstract: In the era of proliferation of large language and image generation models, the phenomenon of "model collapse" refers to the situation whereby as a model is trained recursively on data generated from previous generations of itself over time, its performance degrades until the model eventually becomes completely useless, i.e., the model collapses. In this work, we study this phenomenon in the setting of high-dimensional regression and obtain analytic formulae which quantitatively outline this phenomenon in a broad range of regimes. In the special case of polynomial decaying spectral and source conditions, we obtain modified scaling laws which exhibit new crossover phenomena from fast to slow rates. We also propose a simple strategy based on adaptive regularization to mitigate model collapse. Our theoretical results are validated with experiments.

Submitted 30 April, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

arXiv:2402.07043

A Tale of Tails: Model Collapse as a Change of Scaling Laws

Authors: Elvis Dohmatob, Yunzhen Feng, Pu Yang, Francois Charton, Julia Kempe

Abstract: As AI model size grows, neural scaling laws have become a crucial tool to predict the improvements of large models when increasing capacity and the size of original (human or natural) training data. Yet, the widespread use of popular models means that the ecosystem of online data and text will co-evolve to progressively contain increased amounts of synthesized data. In this paper we ask: How will the scaling laws change in the inevitable regime where synthetic data makes its way into the training corpus? Will future models still improve, or be doomed to degenerate up to total (model) collapse? We develop a theoretical framework of model collapse through the lens of scaling laws. We discover a wide range of decay phenomena, analyzing loss of scaling, shifted scaling with number of generations, the "un-learning" of skills, and grokking when mixing human and synthesized data. Our theory is validated by large-scale experiments with a transformer on an arithmetic task and text generation using the large language model Llama2.

Submitted 31 May, 2024; v1 submitted 10 February, 2024; originally announced February 2024.

Journal ref: ICML 2024

arXiv:2402.03579

Deconstructing the Goldilocks Zone of Neural Network Initialization

Authors: Artem Vysogorets, Anna Dawid, Julia Kempe

Abstract: The second-order properties of the training loss have a massive impact on the optimization dynamics of deep learning models. Fort & Scherlis (2019) discovered that a large excess of positive curvature and local convexity of the loss Hessian is associated with highly trainable initial points located in a region coined the "Goldilocks zone". Only a handful of subsequent studies touched upon this relationship, so it remains largely unexplained. In this paper, we present a rigorous and comprehensive analysis of the Goldilocks zone for homogeneous neural networks. In particular, we derive the fundamental condition resulting in excess of positive curvature of the loss, explaining and refining its conventionally accepted connection to the initialization norm. Further, we relate the excess of positive curvature to model confidence, low initial loss, and a previously unknown type of vanishing cross-entropy loss gradient. To understand the importance of excessive positive curvature for trainability of deep networks, we optimize fully-connected and convolutional architectures outside the Goldilocks zone and analyze the emergent behaviors. We find that strong model performance is not perfectly aligned with the Goldilocks zone, calling for further research into this relationship.

Submitted 4 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Journal ref: Proceedings of the 41st International Conference on Machine Learning, PMLR (2024) 235:49717-49732

arXiv:2311.17967

Discovering Galaxy Features via Dataset Distillation

Authors: Haowen Guan, Xuan Zhao, Zishi Wang, Zhiyang Li, Julia Kempe

Abstract: In many applications, Neural Nets (NNs) have classification performance on par with or even exceeding human capacity. Moreover, it is likely that NNs leverage underlying features that might differ from those humans perceive to classify. Can we "reverse-engineer" pertinent features to enhance our scientific understanding? Here, we apply this idea to the notoriously difficult task of galaxy classification: NNs have reached high performance for this task, but what does a neural net (NN) "see" when it classifies galaxies? Are there morphological features that the human eye might overlook that could help with the task and provide new insights? Can we visualize tracers of early evolution, or additionally incorporated spectral data? We present a novel way to summarize and visualize galaxy morphology through the lens of neural networks, leveraging Dataset Distillation, a recent deep-learning methodology with the primary objective to distill knowledge from a large dataset and condense it into a compact synthetic dataset, such that a model trained on this synthetic dataset achieves performance comparable to a model trained on the full dataset. We curate a class-balanced, medium-size high-confidence version of the Galaxy Zoo 2 dataset, and proceed with dataset distillation from our accurate NN-classifier to create synthesized prototypical images of galaxy morphological features, demonstrating its effectiveness. Of independent interest, we introduce a self-adaptive version of the state-of-the-art Matching Trajectory algorithm to automate the distillation process, and show enhanced performance on computer vision benchmarks.

Submitted 29 November, 2023; originally announced November 2023.

Comments: Accepted to NeurIPS Workshop on Machine Learning and the Physical Sciences, 2023

arXiv:2311.07444

On the Robustness of Neural Collapse and the Neural Collapse of Robustness

Authors: Jingtong Su, Ya Shi Zhang, Nikolaos Tsilivis, Julia Kempe

Abstract: Neural Collapse refers to the curious phenomenon at the end of training of a neural network, where feature vectors and classification weights converge to a very simple geometrical arrangement (a simplex). While it has been observed empirically in various cases and has been theoretically motivated, its connection with crucial properties of neural networks, like their generalization and robustness, remains unclear. In this work, we study the stability properties of these simplices. We find that the simplex structure disappears under small adversarial attacks, and that perturbed examples "leap" between simplex vertices. We further analyze the geometry of networks that are optimized to be robust against adversarial perturbations of the input, and find that Neural Collapse is a pervasive phenomenon in these cases as well, with clean and perturbed representations forming aligned simplices, and giving rise to a robust simple nearest-neighbor classifier. By studying the propagation of the amount of collapse inside the network, we identify novel properties of both robust and non-robust machine learning models, and show that earlier layers, unlike later ones, maintain reliable simplices on perturbed data. Our code is available at https://github.com/JingtongSu/robust_neural_collapse.

Submitted 13 November, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

Comments: Transactions on Machine Learning Research, 2024

arXiv:2311.07025

Embarassingly Simple Dataset Distillation

Authors: Yunzhen Feng, Ramakrishna Vedantam, Julia Kempe

Abstract: Dataset distillation extracts a small set of synthetic training samples from a large dataset with the goal of achieving competitive performance on test data when trained on this sample. In this work, we tackle dataset distillation at its core by treating it directly as a bilevel optimization problem. Re-examining the foundational back-propagation through time method, we study the pronounced variance in the gradients, computational burden, and long-term dependencies. We introduce an improved method: Random Truncated Backpropagation Through Time (RaT-BPTT) to address them. RaT-BPTT incorporates a truncation coupled with a random window, effectively stabilizing the gradients and speeding up the optimization while covering long dependencies. This allows us to establish a new state of the art for a variety of standard dataset benchmarks. A deeper dive into the nature of distilled data unveils pronounced intercorrelation. In particular, subsets of distilled datasets tend to exhibit much worse performance than directly distilled smaller datasets of the same size. Leveraging RaT-BPTT, we devise a boosting mechanism that generates distilled datasets that contain subsets with near optimal performance across different data budgets.

Submitted 12 November, 2023; originally announced November 2023.

Comments: Short version appears at NeurIPS 2023 WANT workshop

arXiv:2307.02693 [pdf, other]

Kernels, Data & Physics

Authors: Francesco Cagnetta, Deborah Oliveira, Mahalakshmi Sabanayagam, Nikolaos Tsilivis, Julia Kempe

Abstract: Lecture notes from the course given by Professor Julia Kempe at the summer school "Statistical physics of Machine Learning" in Les Houches. The notes discuss the so-called NTK approach to problems in machine learning, which consists of gaining an understanding of generally unsolvable problems by finding a tractable kernel formulation. The notes are mainly focused on practical applications such as data distillation and adversarial robustness; examples of inductive bias are also discussed.

Submitted 5 July, 2023; originally announced July 2023.

Comments: These are notes from the lecture of Julia Kempe given at the summer school "Statistical Physics & Machine Learning", that took place in Les Houches School of Physics in France from 4th to 29th July 2022

arXiv:2304.09403 [pdf, other]

Wavelets Beat Monkeys at Adversarial Robustness

Authors: Jingtong Su, Julia Kempe

Abstract: Research on improving the robustness of neural networks to adversarial noise - imperceptible malicious perturbations of the data - has received significant attention. The currently uncontested state-of-the-art defense to obtain robust deep neural networks is Adversarial Training (AT), but it consumes significantly more resources compared to standard training and trades off accuracy for robustness. An inspiring recent work [Dapello et al.] aims to bring neurobiological tools to the question: How can we develop Neural Nets that robustly generalize like human vision? [Dapello et al.] design a network structure with a neural hidden first layer that mimics the primate primary visual cortex (V1), followed by a back-end structure adapted from current CNN vision models. It seems to achieve non-trivial adversarial robustness on standard vision benchmarks when tested on small perturbations. Here we revisit this biologically inspired work, and ask whether a principled parameter-free representation with inspiration from physics is able to achieve the same goal. We discover that the wavelet scattering transform can replace the complex V1-cortex and simple uniform Gaussian noise can take the role of neural stochasticity, to achieve adversarial robustness. In extensive experiments on the CIFAR-10 benchmark with adaptive adversarial attacks we show that: 1) Robustness of VOneBlock architectures is relatively weak (though non-zero) when the strength of the adversarial attack radius is set to commonly used benchmarks. 2) Replacing the front-end VOneBlock by an off-the-shelf parameter-free Scatternet followed by simple uniform Gaussian noise can achieve much more substantial adversarial robustness without adversarial training. Our work shows how physically inspired structures yield new insights into robustness that were previously only thought possible by meticulously mimicking the human cortex.

Submitted 18 April, 2023; originally announced April 2023.

Comments: Machine Learning and the Physical Sciences Workshop, NeurIPS 2022

arXiv:2210.05577 [pdf, other]

What Can the Neural Tangent Kernel Tell Us About Adversarial Robustness?

Authors: Nikolaos Tsilivis, Julia Kempe

Abstract: The adversarial vulnerability of neural nets, and subsequent techniques to create robust models have attracted significant attention; yet we still lack a full understanding of this phenomenon. Here, we study adversarial examples of trained neural networks through analytical tools afforded by recent theory advances connecting neural networks and kernel methods, namely the Neural Tangent Kernel (NTK), following a growing body of work that leverages the NTK approximation to successfully analyze important deep learning phenomena and design algorithms for new applications. We show how NTKs allow one to generate adversarial examples in a "training-free" fashion, and demonstrate that they transfer to fool their finite-width neural net counterparts in the "lazy" regime. We leverage this connection to provide an alternative view on robust and non-robust features, which have been suggested to underlie the adversarial brittleness of neural nets. Specifically, we define and study features induced by the eigendecomposition of the kernel to better understand the role of robust and non-robust features, the reliance on both for standard classification and the robustness-accuracy trade-off. We find that such features are surprisingly consistent across architectures, and that robust features tend to correspond to the largest eigenvalues of the model, and thus are learned early during training. Our framework allows us to identify and visualize non-robust yet useful features. Finally, we shed light on the robustness mechanism underlying adversarial training of neural nets used in practice: quantifying the evolution of the associated empirical NTK, we demonstrate that its dynamics falls much earlier into the "lazy" regime and manifests a much stronger form of the well known bias to prioritize learning features within the top eigenspaces of the kernel, compared to standard training.

Submitted 30 January, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

Comments: NeurIPS 2022; added link to GitHub repository

arXiv:2210.01987 [pdf, other]

ImpressLearn: Continual Learning via Combined Task Impressions

Authors: Dhrupad Bhardwaj, Julia Kempe, Artem Vysogorets, Angela M. Teng, Evaristus C. Ezekwem

Abstract: This work proposes a new method to sequentially train deep neural networks on multiple tasks without suffering catastrophic forgetting, while endowing it with the capability to quickly adapt to unseen tasks. Starting from existing work on network masking (Wortsman et al., 2020), we show that simply learning a linear combination of a small number of task-specific supermasks (impressions) on a randomly initialized backbone network is sufficient both to retain accuracy on previously learned tasks and to achieve high accuracy on unseen tasks. In contrast to previous methods, we do not need to generate dedicated masks or contexts for each new task, instead leveraging transfer learning to keep per-task parameter overhead small. Our work illustrates the power of linearly combining individual impressions, each of which fares poorly in isolation, to achieve performance comparable to a dedicated mask. Moreover, even repeated impressions from the same task (homogeneous masks), when combined, can approach the performance of heterogeneous combinations if sufficiently many impressions are used. Our approach scales more efficiently than existing methods, often requiring orders of magnitude fewer parameters and can function without modification even when task identity is missing. In addition, in the setting where task labels are not given at inference, our algorithm gives an often favorable alternative to the one-shot procedure used by Wortsman et al., 2020. We evaluate our method on a number of well-known image classification datasets and network architectures.

Submitted 31 January, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

arXiv:2207.11727 [pdf, other]

Can we achieve robustness from data alone?

Authors: Nikolaos Tsilivis, Jingtong Su, Julia Kempe

Abstract: We introduce a meta-learning algorithm for adversarially robust classification. The proposed method tries to be as model agnostic as possible and optimizes a dataset prior to its deployment in a machine learning system, aiming to effectively erase its non-robust features. Once the dataset has been created, in principle no specialized algorithm (besides standard gradient descent) is needed to train a robust model. We formulate the data optimization procedure as a bi-level optimization problem on kernel regression, with a class of kernels that describe infinitely wide neural nets (Neural Tangent Kernels). We present extensive experiments on standard computer vision benchmarks using a variety of different models, demonstrating the effectiveness of our method, while also pointing out its current shortcomings. In parallel, we revisit prior work that also focused on the problem of data optimization for robust classification [Ily+19], and show that being robust to adversarial attacks after standard (gradient descent) training on a suitable dataset is more challenging than previously thought.

Submitted 30 January, 2023; v1 submitted 24 July, 2022; originally announced July 2022.

arXiv:2107.02306 [pdf, other]

Connectivity Matters: Neural Network Pruning Through the Lens of Effective Sparsity

Authors: Artem Vysogorets, Julia Kempe

Abstract: Neural network pruning is a fruitful area of research with surging interest in high sparsity regimes. Benchmarking in this domain heavily relies on faithful representation of the sparsity of subnetworks, which has been traditionally computed as the fraction of removed connections (direct sparsity). This definition, however, fails to recognize unpruned parameters that are detached from input or output layers of underlying subnetworks, potentially underestimating actual effective sparsity: the fraction of inactivated connections. While this effect might be negligible for moderately pruned networks (up to 10-100 compression rates), we find that it plays an increasing role for thinner subnetworks, greatly distorting comparison between different pruning algorithms. For example, we show that effective compression of a randomly pruned LeNet-300-100 can be orders of magnitude larger than its direct counterpart, while no discrepancy is ever observed when using SynFlow for pruning [Tanaka et al., 2020]. In this work, we adopt the lens of effective sparsity to reevaluate several recent pruning algorithms on common benchmark architectures (e.g., LeNet-300-100, VGG-19, ResNet-18) and discover that their absolute and relative performance changes dramatically in this new and more appropriate framework. To aim for effective, rather than direct, sparsity, we develop a low-cost extension to most pruning algorithms. Further, equipped with effective sparsity as a reference frame, we partially reconfirm that random pruning with appropriate sparsity allocation across layers performs as well as or better than more sophisticated algorithms for pruning at initialization [Su et al., 2020]. In response to this observation, using a simple analogy of pressure distribution in coupled cylinders from physics, we design novel layerwise sparsity quotas that outperform all existing baselines in the context of random pruning.

Submitted 7 April, 2023; v1 submitted 5 July, 2021; originally announced July 2021.

arXiv:1209.1055 [pdf, other]

doi 10.1007/978-3-642-31594-7_33

Hardness of approximation for quantum problems

Authors: Sevag Gharibian, Julia Kempe

Abstract: The polynomial hierarchy plays a central role in classical complexity theory. Here, we define a quantum generalization of the polynomial hierarchy, and initiate its study. We show that not only are there natural complete problems for the second level of this quantum hierarchy, but that these problems are in fact hard to approximate. Using these techniques, we also obtain hardness of approximation for the class QCMA. Our approach is based on the use of dispersers, and is inspired by the classical results of Umans regarding hardness of approximation for the second level of the classical polynomial hierarchy [Umans, FOCS 1999]. The problems for which we prove hardness of approximation include, among others, a quantum version of the Succinct Set Cover problem, and a variant of the local Hamiltonian problem with hybrid classical-quantum ground states.

Submitted 5 September, 2012; originally announced September 2012.

Comments: 21 pages, 1 figure, extended abstract appeared in Proceedings of the 39th International Colloquium on Automata, Languages and Programming (ICALP), pages 387-398, Springer, 2012

Journal ref: Quantum Information & Computation 14 (5 & 6): 517-540, 2014. Also in Proceedings of ICALP 2012

arXiv:1101.3884 [pdf, ps, other]

doi 10.1137/110842272

Approximation algorithms for QMA-complete problems

Authors: Sevag Gharibian, Julia Kempe

Abstract: Approximation algorithms for classical constraint satisfaction problems are one of the main research areas in theoretical computer science. Here we define a natural approximation version of the QMA-complete local Hamiltonian problem and initiate its study. We present two main results. The first shows that a non-trivial approximation ratio can be obtained in the class NP using product states. The second result (which builds on the first one), gives a polynomial time (classical) algorithm providing a similar approximation ratio for dense instances of the problem. The latter result is based on an adaptation of the "exhaustive sampling method" by Arora et al. [J. Comp. Sys. Sci. 58, p.193 (1999)] to the quantum setting, and might be of independent interest.

Submitted 20 January, 2011; originally announced January 2011.

Comments: 22 pages, comments welcome

Journal ref: SIAM Journal on Computing 41(4): 1028-1050, 2012. Also in Proceedings of 26th IEEE Conference on Computational Complexity (CCC), 178-188, 2011

arXiv:1005.0512 [pdf, ps, other]

Two-Source Extractors Secure Against Quantum Adversaries

Authors: Roy Kasher, Julia Kempe

Abstract: We initiate the study of multi-source extractors in the quantum world. In this setting, our goal is to extract random bits from two independent weak random sources, on which two quantum adversaries store a bounded amount of information. Our main result is a two-source extractor secure against quantum adversaries, with parameters closely matching the classical case and tight in several instances. Moreover, the extractor is secure even if the adversaries share entanglement. The construction is the Chor-Goldreich [CG88] two-source inner product extractor and its multi-bit variant by Dodis et al. [DEOR04]. Previously, research in this area focused on the construction of seeded extractors secure against quantum adversaries; the multi-source setting poses new challenges, among which is the presence of entanglement that could potentially break the independence of the sources.

Submitted 4 May, 2010; originally announced May 2010.

Comments: 20 pages, no figures

arXiv:0911.1696 [pdf, ps, other]

doi 10.1145/2371656.2371659

A Quantum Lovasz Local Lemma

Authors: Andris Ambainis, Julia Kempe, Or Sattath

Abstract: The Lovasz Local Lemma (LLL) is a powerful tool in probability theory to show the existence of combinatorial objects meeting a prescribed collection of "weakly dependent" criteria. We show that the LLL extends to a much more general geometric setting, where events are replaced with subspaces and probability is replaced with relative dimension, which allows one to lower bound the dimension of the intersection of vector spaces under certain independence conditions. Our result immediately applies to the k-QSAT problem: For instance we show that any collection of rank 1 projectors with the property that each qubit appears in at most $2^k/(e \cdot k)$ of them, has a joint satisfiable state. We then apply our results to the recently studied model of random k-QSAT. Recent works have shown that the satisfiable region extends up to a density of 1 in the large k limit, where the density is the ratio of projectors to qubits. Using a hybrid approach building on work by Laumann et al. we greatly extend the known satisfiable region for random k-QSAT to a density of $\Omega(2^k/k^2)$. Since our tool allows us to show the existence of joint satisfying states without the need to construct them, we are able to penetrate into regions where the satisfying states are conjectured to be entangled, avoiding the need to construct them, which has limited previous approaches to product states.

Submitted 9 November, 2009; originally announced November 2009.

Comments: 19 pages

Journal ref: Journal of the ACM, Volume 59 Issue 5, October 2012, Article No. 24

arXiv:0911.0201 [pdf, ps, other]

No Strong Parallel Repetition with Entangled and Non-signaling Provers

Authors: Julia Kempe, Oded Regev

Abstract: We consider one-round games between a classical verifier and two provers. One of the main questions in this area is the parallel repetition question: If the game is played $\ell$ times in parallel, does the maximum winning probability decay exponentially in $\ell$? In the classical setting, this question was answered in the affirmative by Raz. More recently the question arose whether the decay is of the form $(1-\Theta(\epsilon))^\ell$ where $1-\epsilon$ is the value of the game and $\ell$ is the number of repetitions. This question is known as the strong parallel repetition question and was motivated by its connections to the unique games conjecture. It was resolved by Raz who showed that strong parallel repetition does not hold, even in the very special case of games known as XOR games. This opens the question whether strong parallel repetition holds in the case when the provers share entanglement. Evidence for this is provided by the behavior of XOR games, which have strong (in fact perfect) parallel repetition, and by the recently proved strong parallel repetition of linear unique games. A similar question was open for games with so-called non-signaling provers. Here the best known parallel repetition theorem is due to Holenstein, and is of the form $(1-\Theta(\epsilon^2))^\ell$. We show that strong parallel repetition holds neither with entangled provers nor with non-signaling provers. In particular we obtain that Holenstein's bound is tight. Along the way we also provide a tight characterization of the asymptotic behavior of the entangled value under parallel repetition of unique games in terms of a semidefinite program.

Submitted 1 November, 2009; originally announced November 2009.

Comments: 15 pages, 2 figures

arXiv:quant-ph/0607174 [pdf, ps, other]

Exponential Separation of Quantum and Classical One-Way Communication Complexity for a Boolean Function

Authors: Dmytro Gavinsky, Julia Kempe, Ronald de Wolf

Abstract: We give an exponential separation between one-way quantum and classical communication complexity for a Boolean function. Earlier such a separation was known only for a relation. A very similar result was obtained earlier but independently by Kerenidis and Raz [KR06]. Our version of the result gives an example in the bounded storage model of cryptography, where the key is secure if the adversary has a certain amount of classical storage, but is completely insecure if he has a similar amount of quantum storage.

Submitted 25 July, 2006; originally announced July 2006.

Comments: 8 pages, no figures

arXiv:quant-ph/0603173 [pdf, ps, other]

Strengths and Weaknesses of Quantum Fingerprinting

Authors: Dmytro Gavinsky, Julia Kempe, Ronald de Wolf

Abstract: We study the power of quantum fingerprints in the simultaneous message passing (SMP) setting of communication complexity. Yao recently showed how to simulate, with exponential overhead, classical shared-randomness SMP protocols by means of quantum SMP protocols without shared randomness ($Q^\parallel$-protocols). Our first result is to extend Yao's simulation to the strongest possible model: every many-round quantum protocol with unlimited shared entanglement can be simulated, with exponential overhead, by $Q^\parallel$-protocols. We apply our technique to obtain an efficient $Q^\parallel$-protocol for a function which cannot be efficiently solved through more restricted simulations. Second, we tightly characterize the power of the quantum fingerprinting technique by making a connection to arrangements of homogeneous halfspaces with maximal margin. These arrangements have been well studied in computational learning theory, and we use some strong results obtained in this area to exhibit weaknesses of quantum fingerprinting. In particular, this implies that for almost all functions, quantum fingerprinting protocols are exponentially worse than classical deterministic SMP protocols.

Submitted 20 March, 2006; originally announced March 2006.

Comments: 13 pages, no figures, to appear in CCC'06

Journal ref: Proc. 21st CCC (Complexity), p. 288-295 (2006)

arXiv:quant-ph/0511013 [pdf, ps, other]

Bounded-Error Quantum State Identification and Exponential Separations in Communication Complexity

Authors: Dmytro Gavinsky, Julia Kempe, Oded Regev, Ronald de Wolf

Abstract: We consider the problem of bounded-error quantum state identification: given either state α_0 or state α_1, we are required to output '0', '1' or '?' ("don't know"), such that conditioned on outputting '0' or '1', our guess is correct with high probability. The goal is to maximize the probability of not outputting '?'. We prove a direct product theorem: if we're given two such problems, with optimal probabilities a and b, respectively, and the states in the first problem are pure, then the optimal probability for the joint bounded-error state identification problem is O(ab). Our proof is based on semidefinite programming duality and may be of wider interest. Using this result, we present two exponential separations in the simultaneous message passing model of communication complexity. Both are shown in the strongest possible sense. First, we describe a relation that can be computed with O(log n) classical bits of communication in the presence of shared randomness, but needs Omega(n^{1/3}) communication if the parties don't share randomness, even if communication is quantum. This shows the optimality of Yao's recent exponential simulation of shared-randomness protocols by quantum protocols without shared randomness. Second, we describe a relation that can be computed with O(log n) classical bits of communication in the presence of shared entanglement, but needs Omega((n/log n)^{1/3}) communication if the parties share randomness but no entanglement, even if communication is quantum. This is the first example in communication complexity of a situation where entanglement buys you much more than quantum communication does.

Submitted 2 November, 2005; originally announced November 2005.

Comments: 20 pages, no figures

arXiv:quant-ph/0411051 [pdf, ps, other]

Quantum Communication Cannot Simulate a Public Coin

Authors: Dmytro Gavinsky, Julia Kempe, Ronald de Wolf

Abstract: We study the simultaneous message passing model of communication complexity. Building on the quantum fingerprinting protocol of Buhrman et al., Yao recently showed that a large class of efficient classical public-coin protocols can be turned into efficient quantum protocols without public coin. This raises the question whether this can be done always, i.e. whether quantum communication can always replace a public coin in the SMP model. We answer this question in the negative, exhibiting a communication problem where classical communication with public coin is exponentially more efficient than quantum communication. Together with a separation in the other direction due to Bar-Yossef et al., this shows that the quantum SMP model is incomparable with the classical public-coin SMP model. In addition we give a characterization of the power of quantum fingerprinting by means of a connection to geometrical tools from machine learning, a quadratic improvement of Yao's simulation, and a nearly tight analysis of the Hamming distance problem from Yao's paper.

Submitted 8 November, 2004; originally announced November 2004.

Comments: 12 pages LaTeX

arXiv:quant-ph/0406180 [pdf, ps, other]

The Complexity of the Local Hamiltonian Problem

Authors: Julia Kempe, Alexei Kitaev, Oded Regev

Abstract: The k-local Hamiltonian problem is a natural complete problem for the complexity class QMA, the quantum analog of NP. It is similar in spirit to MAX-k-SAT, which is NP-complete for k >= 2. It was known that the problem is QMA-complete for any k >= 3. On the other hand 1-local Hamiltonian is in P, and hence not believed to be QMA-complete. The complexity of the 2-local Hamiltonian problem has long been outstanding. Here we settle the question and show that it is QMA-complete. We provide two independent proofs; our first proof uses only elementary linear algebra. Our second proof uses a powerful technique for analyzing the sum of two Hamiltonians; this technique is based on perturbation theory and we believe that it might prove useful elsewhere. Using our techniques we also show that adiabatic computation with two-local interactions on qubits is equivalent to standard quantum computation.

Submitted 2 October, 2005; v1 submitted 24 June, 2004; originally announced June 2004.

Comments: 30 pages, 3 figures, replaced with revised version, numerous improvements to readability and expanded adiabatic section

Journal ref: SIAM Journal on Computing, Vol. 35(5), p. 1070-1097 (2006), conference version in Proc. 24th FSTTCS, p. 372-383 (2004)

arXiv:quant-ph/0406046 [pdf, ps, other]

The hidden subgroup problem and permutation group theory

Authors: Julia Kempe, Aner Shalev

Abstract: We employ concepts and tools from the theory of finite permutation groups in order to analyse the Hidden Subgroup Problem via Quantum Fourier Sampling (QFS) for the symmetric group. We show that under very general conditions both the weak and the random-strong form (strong form with random choices of basis) of QFS fail to provide any advantage over classical exhaustive search. In particular we give a complete characterisation of polynomial size subgroups, and of primitive subgroups, that can be distinguished from the identity subgroup with the above methods. Furthermore, assuming a plausible group theoretic conjecture for which we give supporting evidence, we show that weak and random-strong QFS for the symmetric group have no advantage whatsoever over classical search.

Submitted 8 June, 2004; originally announced June 2004.

Comments: 12 pages

Journal ref: Proc. 16th ACM-SIAM SODA, p. 1118-1125 (2005)

arXiv:quant-ph/0402107 [pdf, ps, other]

Coins Make Quantum Walks Faster

Authors: Andris Ambainis, Julia Kempe, Alexander Rivosh

Abstract: We show how to search N items arranged on a $\sqrt{N}\times\sqrt{N}$ grid in time $O(\sqrt{N} \log N)$, using a discrete time quantum walk. This result for the first time exhibits a significant difference between discrete time and continuous time walks without coin degrees of freedom, since it has been shown recently that such a continuous time walk needs time $\Omega(N)$ to perform the same task. Our result furthermore improves on a previous bound for quantum local search by Aaronson and Ambainis. We generalize our result to 3 and more dimensions where the walk yields the optimal performance of $O(\sqrt{N})$ and give several extensions of quantum walk search algorithms for general graphs. The coin-flip operation needs to be chosen judiciously: we show that another ``natural'' choice of coin gives a walk that takes $\Omega(N)$ steps. We also show that in 2 dimensions it is sufficient to have a two-dimensional coin-space to achieve the time $O(\sqrt{N} \log N)$.

Submitted 16 February, 2004; originally announced February 2004.

Comments: 25 pages, no figures

Journal ref: Proc. 16th ACM-SIAM SODA, p. 1099-1108 (2005)

arXiv:quant-ph/0303081 [pdf, ps, other]

doi 10.1080/00107151031000110776

Quantum random walks - an introductory overview

Authors: Julia Kempe

Abstract: This article aims to provide an introductory survey on quantum random walks. Starting from a physical effect to illustrate the main ideas we will introduce quantum random walks, review some of their properties and outline their striking differences to classical walks. We will touch upon both physical effects and computer science applications, introducing some of the main concepts and language of present day quantum information science in this context. We will mention recent developments in this new area and outline some open questions.

Submitted 13 March, 2003; originally announced March 2003.

Comments: 20 pages, 13 figures, to appear in Contemporary Physics

Journal ref: Contemporary Physics, Vol. 44 (4), p.307-327, 2003

arXiv:quant-ph/0302079 [pdf, ps, other]

3-Local Hamiltonian is QMA-complete

Authors: Julia Kempe, Oded Regev

Abstract: It has been shown by Kitaev that the 5-local Hamiltonian problem is QMA-complete. Here we reduce the locality of the problem by showing that 3-local Hamiltonian is already QMA-complete.

Submitted 20 May, 2003; v1 submitted 10 February, 2003; originally announced February 2003.

Comments: 7 pages, minor changes and corrections, published version

Journal ref: Quantum Computation and Information, Vol. 3(3), p. 258-64, 2003

arXiv:quant-ph/0210064 [pdf, ps, other]

doi 10.1103/PhysRevA.67.052307

A Quantum Random Walk Search Algorithm

Authors: Neil Shenvi, Julia Kempe, K. Birgitta Whaley

Abstract: Quantum random walks on graphs have been shown to display many interesting properties, including exponentially fast hitting times when compared with their classical counterparts. However, it is still unclear how to use these novel properties to gain an algorithmic speed-up over classical algorithms. In this paper, we present a quantum search algorithm based on the quantum random walk architecture that provides such a speed-up. It will be shown that this algorithm performs an oracle search on a database of $N$ items with $O(\sqrt{N})$ calls to the oracle, yielding a speed-up similar to other quantum search algorithms. It appears that the quantum random walk formulation has considerable flexibility, presenting interesting opportunities for development of other, possibly novel quantum algorithms.

Submitted 9 October, 2002; originally announced October 2002.

Comments: 13 pages, 3 figures

Journal ref: Phys. Rev. A, Vol. 67 (5), 052307 (2003)

arXiv:quant-ph/0205083 [pdf, ps, other]

Quantum Random Walks Hit Exponentially Faster

Authors: Julia Kempe

Abstract: We show that the hitting time of the discrete time quantum random walk on the n-bit hypercube from one corner to its opposite is polynomial in n. This gives the first exponential quantum-classical gap in the hitting time of discrete quantum random walks. We provide the framework for quantum hitting time and give two alternative definitions to set the ground for its study on general graphs. We then give an application to random routing.

Submitted 14 May, 2002; originally announced May 2002.

Comments: 15 pages, no figures

Journal ref: Probability Theory and Related Fields, Vol. 133(2), p. 215-235 (2005), conference version in Proc. 7th RANDOM, p. 354-69, 2003
