Showing 1–12 of 12 results for author: Sucholutsky, I

  1. arXiv:2411.07483

    stat.ML cs.CV cs.IT cs.LG eess.IV

    Quantifying Knowledge Distillation Using Partial Information Decomposition

    Authors: Pasan Dissanayake, Faisal Hamman, Barproda Halder, Ilia Sucholutsky, Qiuyi Zhang, Sanghamitra Dutta

    Abstract: Knowledge distillation deploys complex machine learning models in resource-constrained environments by training a smaller student model to emulate internal representations of a complex teacher model. However, the teacher's representations can also encode nuisance or additional information not relevant to the downstream task. Distilling such irrelevant information can actually impede the performanc…

    Submitted 4 April, 2025; v1 submitted 11 November, 2024; originally announced November 2024.

    Comments: Accepted at the 28th International Conference on Artificial Intelligence and Statistics (AISTATS) 2025

  2. arXiv:2402.06992

    q-bio.NC cs.AI cs.CL stat.AP

    A Rational Analysis of the Speech-to-Song Illusion

    Authors: Raja Marjieh, Pol van Rijn, Ilia Sucholutsky, Harin Lee, Thomas L. Griffiths, Nori Jacoby

    Abstract: The speech-to-song illusion is a robust psychological phenomenon whereby a spoken sentence sounds increasingly more musical as it is repeated. Despite decades of research, a complete formal account of this transformation is still lacking, and some of its nuanced characteristics, namely, that certain phrases appear to transform while others do not, are not well understood. Here we provide a formal a…

    Submitted 10 February, 2024; originally announced February 2024.

    Comments: 7 pages, 5 figures

  3. arXiv:2302.01308

    cs.CL cs.LG stat.ML

    Large language models predict human sensory judgments across six modalities

    Authors: Raja Marjieh, Ilia Sucholutsky, Pol van Rijn, Nori Jacoby, Thomas L. Griffiths

    Abstract: Determining the extent to which the perceptual world can be recovered from language is a longstanding problem in philosophy and cognitive science. We show that state-of-the-art large language models can unlock new insights into this problem by providing a lower bound on the amount of perceptual information that can be extracted from language. Specifically, we elicit pairwise similarity judgments f…

    Submitted 15 June, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

    Comments: 9 pages, 3 figures

  4. arXiv:2301.11990

    cs.LG cs.AI cs.CV cs.HC stat.ML

    Alignment with human representations supports robust few-shot learning

    Authors: Ilia Sucholutsky, Thomas L. Griffiths

    Abstract: Should we care whether AI systems have representations of the world that are similar to those of humans? We provide an information-theoretic analysis that suggests that there should be a U-shaped relationship between the degree of representational alignment with humans and performance on few-shot learning tasks. We confirm this prediction empirically, finding such a relationship in an analysis of…

    Submitted 29 October, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

    Comments: Spotlight at NeurIPS 2023

  5. arXiv:2209.14821

    cs.LG stat.ML

    Analyzing Diffusion as Serial Reproduction

    Authors: Raja Marjieh, Ilia Sucholutsky, Thomas A. Langlois, Nori Jacoby, Thomas L. Griffiths

    Abstract: Diffusion models are a class of generative models that learn to synthesize samples by inverting a diffusion process that gradually maps data into noise. While these models have enjoyed great success recently, a full theoretical understanding of their observed properties is still lacking, in particular, their weak sensitivity to the choice of noise family and the role of adequate scheduling of nois…

    Submitted 29 September, 2022; originally announced September 2022.

    Comments: 10 pages, 4 figures

    Journal ref: PMLR 202:24005-24019, 2023

  6. arXiv:2206.04105

    cs.CL cs.LG stat.ML

    Words are all you need? Language as an approximation for human similarity judgments

    Authors: Raja Marjieh, Pol van Rijn, Ilia Sucholutsky, Theodore R. Sumers, Harin Lee, Thomas L. Griffiths, Nori Jacoby

    Abstract: Human similarity judgments are a powerful supervision signal for machine learning applications based on techniques such as contrastive learning, information retrieval, and model alignment, but classical methods for collecting human similarity judgments are too expensive to be used at scale. Recent methods propose using pre-trained deep neural networks (DNNs) to approximate human similarity, but pr…

    Submitted 23 February, 2023; v1 submitted 8 June, 2022; originally announced June 2022.

    Comments: Accepted to ICLR 2023, final revision. https://openreview.net/forum?id=O-G91-4cMdv

  7. One Line To Rule Them All: Generating LO-Shot Soft-Label Prototypes

    Authors: Ilia Sucholutsky, Nam-Hwui Kim, Ryan P. Browne, Matthias Schonlau

    Abstract: Increasingly large datasets are rapidly driving up the computational costs of machine learning. Prototype generation methods aim to create a small set of synthetic observations that accurately represent a training dataset but greatly reduce the computational cost of learning from it. Assigning soft labels to prototypes can allow increasingly small sets of prototypes to accurately represent the ori…

    Submitted 15 February, 2021; originally announced February 2021.

    Comments: 8 pages

  8. Optimal 1-NN Prototypes for Pathological Geometries

    Authors: Ilia Sucholutsky, Matthias Schonlau

    Abstract: Using prototype methods to reduce the size of training datasets can drastically reduce the computational cost of classification with instance-based learning algorithms like the k-Nearest Neighbour classifier. The number and distribution of prototypes required for the classifier to match its original performance is intimately related to the geometry of the training data. As a result, it is often di…

    Submitted 31 October, 2020; originally announced November 2020.

    Comments: 8 pages

  9. arXiv:2009.09155

    cs.LG stat.ML

    SecDD: Efficient and Secure Method for Remotely Training Neural Networks

    Authors: Ilia Sucholutsky, Matthias Schonlau

    Abstract: We leverage what are typically considered the worst qualities of deep learning algorithms - high computational cost, requirement for large data, no explainability, high dependence on hyper-parameter choice, overfitting, and vulnerability to adversarial perturbations - in order to create a method for the secure and efficient training of remotely deployed neural networks over unsecured channels.

    Submitted 18 September, 2020; originally announced September 2020.

    Comments: 2 pages, 1 figure

  10. arXiv:2009.08449

    cs.LG stat.ML

    'Less Than One'-Shot Learning: Learning N Classes From M<N Samples

    Authors: Ilia Sucholutsky, Matthias Schonlau

    Abstract: Deep neural networks require large training sets but suffer from high computational cost and long training times. Training on much smaller training sets while maintaining nearly the same accuracy would be very beneficial. In the few-shot learning setting, a model must learn a new class given only a small number of samples from that class. One-shot learning is an extreme form of few-shot learning w…

    Submitted 17 September, 2020; originally announced September 2020.

    Journal ref: Sucholutsky, I. and Schonlau, M. 2021. 'Less Than One'-Shot Learning: Learning N Classes From M < N Samples. Proceedings of the AAAI Conference on Artificial Intelligence. 35, 11 (May 2021), 9739-9746

  11. Soft-Label Dataset Distillation and Text Dataset Distillation

    Authors: Ilia Sucholutsky, Matthias Schonlau

    Abstract: Dataset distillation is a method for reducing dataset sizes by learning a small number of synthetic samples containing all the information of a large dataset. This has several benefits like speeding up model training, reducing energy consumption, and reducing required storage space. Currently, each synthetic sample is assigned a single 'hard' label, and also, dataset distillation can currently onl…

    Submitted 5 May, 2020; v1 submitted 6 October, 2019; originally announced October 2019.

  12. arXiv:1904.05411

    cs.LG stat.ML

    Deep Learning for System Trace Restoration

    Authors: Ilia Sucholutsky, Apurva Narayan, Matthias Schonlau, Sebastian Fischmeister

    Abstract: Most real-world datasets, and particularly those collected from physical systems, are full of noise, packet loss, and other imperfections. However, most specification mining, anomaly detection and other such algorithms assume, or even require, perfect data quality to function properly. Such algorithms may work in lab conditions when given clean, controlled data, but will fail in the field when giv…

    Submitted 10 April, 2019; originally announced April 2019.

    Comments: Pre-print (accepted to IJCNN 2019)