-
Dense Associative Memory with Epanechnikov Energy
Authors:
Benjamin Hoover,
Zhaoyang Shi,
Krishnakumar Balasubramanian,
Dmitry Krotov,
Parikshit Ram
Abstract:
We propose a novel energy function for Dense Associative Memory (DenseAM) networks, the log-sum-ReLU (LSR), inspired by optimal kernel density estimation. Unlike the common log-sum-exponential (LSE) function, LSR is based on the Epanechnikov kernel and enables exact memory retrieval with exponential capacity without requiring exponential separation functions. Moreover, it introduces abundant additional emergent local minima while preserving perfect pattern recovery -- a characteristic previously unseen in DenseAM literature. Empirical results show that LSR energy has significantly more local minima (memories) that have comparable log-likelihood to LSE-based models. Analysis of LSR's emergent memories on image datasets reveals a degree of creativity and novelty, hinting at this method's potential for both large-scale memory storage and generative tasks.
Submitted 12 June, 2025;
originally announced June 2025.
-
Memorization to Generalization: Emergence of Diffusion Models from Associative Memory
Authors:
Bao Pham,
Gabriel Raya,
Matteo Negri,
Mohammed J. Zaki,
Luca Ambrogioni,
Dmitry Krotov
Abstract:
Hopfield networks are associative memory (AM) systems, designed for storing and retrieving patterns as local minima of an energy landscape. In the classical Hopfield model, an interesting phenomenon occurs when the amount of training data reaches its critical memory load: spurious states, or unintended stable points, emerge at the end of the retrieval dynamics, leading to incorrect recall. In this work, we examine diffusion models, commonly used in generative modeling, from the perspective of AMs. The training phase of a diffusion model is conceptualized as memory encoding (training data is stored in the memory). The generation phase is viewed as an attempt at memory retrieval. In the small data regime the diffusion model exhibits a strong memorization phase, where the network creates distinct basins of attraction around each sample in the training set, akin to the Hopfield model below the critical memory load. In the large data regime, a different phase appears where an increase in the size of the training set fosters the creation of new attractor states that correspond to manifolds of the generated samples. Spurious states appear at the boundary of this transition and correspond to emergent attractor states, which are absent in the training set, but, at the same time, have distinct basins of attraction around them. Our findings provide a novel perspective on the memorization-generalization phenomenon in diffusion models via the lens of AMs, a theoretical prediction of the existence of spurious states, and an empirical validation of this prediction in commonly used diffusion models.
Submitted 27 May, 2025;
originally announced May 2025.
-
Small Models, Smarter Learning: The Power of Joint Task Training
Authors:
Csaba Both,
Benjamin Hoover,
Hendrik Strobelt,
Dmitry Krotov,
Daniel Karl I. Weidele,
Mauro Martino,
Nima Dehmamy
Abstract:
The ability of a model to learn a task depends strongly on both the task difficulty and the model size. We aim to understand how task difficulty relates to the minimum number of parameters required for learning specific tasks in small transformer models. Our study focuses on the ListOps dataset, which consists of nested mathematical operations. We gradually increase task difficulty by introducing new operations or combinations of operations into the training data. We observe that sum modulo n is the hardest to learn. Curiously, when combined with other operations such as maximum and median, the sum operation becomes easier to learn and requires fewer parameters. We show that joint training not only improves performance but also leads to qualitatively different model behavior. We show evidence that models trained only on SUM might be memorizing and fail to capture the number structure in the embeddings. In contrast, models trained on a mixture of SUM and other operations exhibit number-like representations in the embedding space, and a strong ability to distinguish parity. Furthermore, the SUM-only model relies more heavily on its feedforward layers, while the jointly trained model activates the attention mechanism more. Finally, we show that learning pure SUM can be induced in models below the learning threshold of pure SUM, by pretraining them on MAX+MED. Our findings indicate that emergent abilities in language models depend not only on model size, but also on the training curriculum.
Submitted 23 May, 2025;
originally announced May 2025.
-
Maximum Size $t$-Intersecting Families and Anticodes
Authors:
Xuan Wang,
Tuvi Etzion,
Denis Krotov,
Minjia Shi
Abstract:
The maximum size of $t$-intersecting families is one of the most celebrated topics in combinatorics, and its size is known as the Erdős-Ko-Rado theorem. Such intersecting families, also known as constant-weight anticodes in coding theory, were considered in a generalization of the well-known sphere-packing bound. In this work we consider the maximum size of $t$-intersecting families and their associated maximum size constant-weight anticodes over an alphabet of size $q > 2$. It is proved that the structure of the maximum size constant-weight anticodes with the same length, weight, and diameter depends on the alphabet size. This structure implies some hierarchy of constant-weight anticodes.
Submitted 19 March, 2025;
originally announced March 2025.
-
M+: Extending MemoryLLM with Scalable Long-Term Memory
Authors:
Yu Wang,
Dmitry Krotov,
Yuanzhe Hu,
Yifan Gao,
Wangchunshu Zhou,
Julian McAuley,
Dan Gutfreund,
Rogerio Feris,
Zexue He
Abstract:
Equipping large language models (LLMs) with latent-space memory has attracted increasing attention as they can extend the context window of existing language models. However, retaining information from the distant past remains a challenge. For example, MemoryLLM (Wang et al., 2024a), as a representative work with latent-space memory, compresses past information into hidden states across all layers, forming a memory pool of 1B parameters. While effective for sequence lengths up to 16k tokens, it struggles to retain knowledge beyond 20k tokens. In this work, we address this limitation by introducing M+, a memory-augmented model based on MemoryLLM that significantly enhances long-term information retention. M+ integrates a long-term memory mechanism with a co-trained retriever, dynamically retrieving relevant information during text generation. We evaluate M+ on diverse benchmarks, including long-context understanding and knowledge retention tasks. Experimental results show that M+ significantly outperforms MemoryLLM and recent strong baselines, extending knowledge retention from under 20k to over 160k tokens with similar GPU memory overhead. We open-source our code at https://github.com/wangyu-ustc/MemoryLLM
Submitted 30 May, 2025; v1 submitted 1 February, 2025;
originally announced February 2025.
-
Operator Learning for Reconstructing Flow Fields from Sparse Measurements: an Energy Transformer Approach
Authors:
Qian Zhang,
Dmitry Krotov,
George Em Karniadakis
Abstract:
Machine learning methods have shown great success in various scientific areas, including fluid mechanics. However, reconstruction problems, where full velocity fields must be recovered from partial observations, remain challenging. In this paper, we propose a novel operator learning framework for solving reconstruction problems by using the Energy Transformer (ET), an architecture inspired by associative memory models. We formulate reconstruction as a mapping from incomplete observed data to full reconstructed fields. The method is validated on three fluid mechanics examples using diverse types of data: (1) unsteady 2D vortex street in flow past a cylinder using simulation data; (2) high-speed under-expanded impinging supersonic jets using Schlieren imaging; and (3) 3D turbulent jet flow using particle tracking. The results demonstrate the ability of ET to accurately reconstruct complex flow fields from highly incomplete data (90% missing), even for noisy experimental measurements, with fast training and inference on a single GPU. This work provides a promising new direction for tackling reconstruction problems in fluid mechanics and other areas in mechanics, geophysics, weather prediction, and beyond.
Submitted 2 January, 2025;
originally announced January 2025.
-
Low-degree functions without non-essential arguments
Authors:
Denis S. Krotov
Abstract:
For the Hamming graph $H(n,q)$, where $q$ is a constant prime power and $n$ grows, we construct perfect colorings without non-essential arguments such that $n$ depends exponentially on the off-diagonal part of the quotient matrix. In particular, we construct unbalanced Boolean ($q=2$) functions such that the number of essential arguments depends exponentially on the degree of the function.
Submitted 5 December, 2024;
originally announced December 2024.
-
Generalizing the Bierbrauer-Friedman bound for orthogonal arrays
Authors:
Denis S. Krotov,
Ferruh Özbudak,
Vladimir N. Potapov
Abstract:
We characterize mixed-level orthogonal arrays in terms of algebraic designs in a special multigraph. We prove a mixed-level analog of the Bierbrauer--Friedman (BF) bound for pure-level orthogonal arrays and show that arrays attaining it are radius-$1$ completely regular codes (equivalently, intriguing sets, equitable $2$-partitions, perfect $2$-colorings) in the corresponding multigraph. For the case when the numbers of levels are powers of the same prime number, we characterize, in terms of multispreads, additive mixed-level orthogonal arrays attaining the BF bound. For pure-level orthogonal arrays, we consider versions of the BF bound obtained by replacing the Hamming graph by its polynomial generalization and show that in some cases this gives a new bound.
Keywords: orthogonal array, algebraic $t$-design, mixed orthogonal array, completely-regular code, equitable partition, intriguing set, Hamming graph, Bierbrauer-Friedman bound, additive code.
Submitted 23 December, 2024; v1 submitted 25 November, 2024;
originally announced November 2024.
-
Completely regular codes in graphs covered by a Hamming graph
Authors:
Sergey Goryainov,
Denis Krotov
Abstract:
In Cayley graphs on the additive group of a small vector space over GF$(q)$, $q=2,3$, we look for completely regular (CR) codes whose parameters are new in Hamming graphs over the same field. The existence of a CR code in such a Cayley graph $G$ implies the existence of a CR code with the same parameters in the corresponding Hamming graph that covers $G$. In such a way, we find several completely regular codes with new parameters in Hamming graphs over GF$(3)$. The most interesting findings are two new CR-$1$ (with covering radius $1$) codes that are independent sets (such CR codes are equivalent to optimal orthogonal arrays attaining the Bierbrauer--Friedman bound) and one new CR-$2$. By recursive constructions, every new CR code induces an infinite sequence of CR codes (in particular, optimal orthogonal arrays if the original code was CR-$1$ and independent). Along the way, we classify feasible parameters of CR codes in several strongly regular graphs.
Submitted 14 November, 2024;
originally announced November 2024.
-
Dense Associative Memory Through the Lens of Random Features
Authors:
Benjamin Hoover,
Duen Horng Chau,
Hendrik Strobelt,
Parikshit Ram,
Dmitry Krotov
Abstract:
Dense Associative Memories are high storage capacity variants of the Hopfield networks that are capable of storing a large number of memory patterns in the weights of the network of a given size. Their common formulations typically require storing each pattern in a separate set of synaptic weights, which leads to the increase of the number of synaptic weights when new patterns are introduced. In this work we propose an alternative formulation of this class of models using random features, commonly used in kernel methods. In this formulation the number of network's parameters remains fixed. At the same time, new memories can be added to the network by modifying existing weights. We show that this novel network closely approximates the energy function and dynamics of conventional Dense Associative Memories and shares their desirable computational properties.
Submitted 31 October, 2024;
originally announced October 2024.
-
Losing dimensions: Geometric memorization in generative diffusion
Authors:
Beatrice Achilli,
Enrico Ventura,
Gianluigi Silvestri,
Bao Pham,
Gabriel Raya,
Dmitry Krotov,
Carlo Lucibello,
Luca Ambrogioni
Abstract:
Generative diffusion processes are state-of-the-art machine learning models deeply connected with fundamental concepts in statistical physics. Depending on the dataset size and the capacity of the network, their behavior is known to transition from an associative memory regime to a generalization phase in a phenomenon that has been described as a glassy phase transition. Here, using statistical physics techniques, we extend the theory of memorization in generative diffusion to manifold-supported data. Our theoretical and experimental findings indicate that different tangent subspaces are lost due to memorization effects at different critical times and dataset sizes, which depend on the local variance of the data along their directions. Perhaps counterintuitively, we find that, under some conditions, subspaces of higher variance are lost first due to memorization effects. This leads to a selective loss of dimensionality where some prominent features of the data are memorized without a full collapse on any individual training point. We validate our theory with a comprehensive set of experiments on networks trained both on image datasets and on linear manifolds, which result in a remarkable qualitative agreement with the theoretical predictions.
Submitted 11 October, 2024;
originally announced October 2024.
-
Triangle decompositions of PG(n,2)
Authors:
Minjia Shi,
Xiaoxiao Li,
Denis S. Krotov
Abstract:
We define a triangle design as a partition of the set of $2$-dimensional subspaces of an $n$-dimensional vector space into triangles, where a triangle consists of three subspaces with the trivial, $0$-dimensional, intersection and $1$-dimensional mutual intersections. A triangle design is balanced if all nonzero vectors are involved in the same number of triangles. Over the binary field GF$(2)$, we construct balanced triangle designs for all admissible $n$ (congruent to $1$ modulo $6$) and an infinite class of balanced block-divisible triangle designs. We also prove that the existence of a triangle design over GF$(2)$ invariant under the action of the Singer cycle group is equivalent to the existence of a partition of $Z_{2^n-1}\backslash\{0\}$ into special $18$-subsets and find such designs for $n=7$, $13$, $19$.
Submitted 26 July, 2024;
originally announced July 2024.
-
CAMELoT: Towards Large Language Models with Training-Free Consolidated Associative Memory
Authors:
Zexue He,
Leonid Karlinsky,
Donghyun Kim,
Julian McAuley,
Dmitry Krotov,
Rogerio Feris
Abstract:
Large Language Models (LLMs) struggle to handle long input sequences due to high memory and runtime costs. Memory-augmented models have emerged as a promising solution to this problem, but current methods are hindered by limited memory capacity and require costly re-training to integrate with a new LLM. In this work, we introduce an associative memory module which can be coupled to any pre-trained (frozen) attention-based LLM without re-training, enabling it to handle arbitrarily long input sequences. Unlike previous methods, our associative memory module consolidates representations of individual tokens into a non-parametric distribution model, dynamically managed by properly balancing the novelty and recency of the incoming data. By retrieving information from this consolidated associative memory, the base LLM can achieve significant (up to 29.7% on Arxiv) perplexity reduction in long-context modeling compared to other baselines evaluated on standard benchmarks. This architecture, which we call CAMELoT (Consolidated Associative Memory Enhanced Long Transformer), demonstrates superior performance even with a tiny context window of 128 tokens, and also enables improved in-context learning with a much larger set of demonstrations.
Submitted 20 February, 2024;
originally announced February 2024.
-
On the existence of some completely regular codes in Hamming graphs
Authors:
Denis S. Krotov
Abstract:
We solve several first questions in the table of small parameters of completely regular (CR) codes in Hamming graphs $H(n,q)$. The most uplifting result is the existence of a $\{13,6,1;1,6,9\}$-CR code in $H(n,2)$, $n\ge 13$. We also establish the non-existence of a $\{11,4;3,6\}$-code and a $\{10,3;4,7\}$-code in $H(12,2)$ and $H(13,2)$. A partition of the complement of the quaternary Hamming code of length $5$ into $4$-cliques is found, which can be used to construct completely regular codes with covering radius $1$ by known constructions. Additionally, we discuss the parameters $\{24,21,10;1,4,12\}$ of a putative completely regular code in $H(24,2)$ and show the nonexistence of such a code in $H(8,4)$.
Keywords: Hamming graph, equitable partition, completely regular code
Submitted 13 December, 2023;
originally announced December 2023.
-
Multispreads
Authors:
Denis S. Krotov,
Ivan Yu. Mogilnykh
Abstract:
Additive one-weight codes over a finite field of non-prime order are equivalent to special subspace coverings of the points of projective space, which we call multispreads. The current paper is devoted to the characterization of the parameters of multispreads, which is equivalent to the characterization of the parameters of additive one-weight codes. We characterize these parameters for the case of the prime-square order of the field and make a partial characterization for the prime-cube case and the case of the fourth degree of a prime (including a complete characterization for orders 8, 27, and 16).
Submitted 19 March, 2024; v1 submitted 12 December, 2023;
originally announced December 2023.
-
Neuron-Astrocyte Associative Memory
Authors:
Leo Kozachkov,
Jean-Jacques Slotine,
Dmitry Krotov
Abstract:
Astrocytes, the most abundant type of glial cell, play a fundamental role in memory. Despite most hippocampal synapses being contacted by an astrocyte, there are no current theories that explain how neurons, synapses, and astrocytes might collectively contribute to memory function. We demonstrate that fundamental aspects of astrocyte morphology and physiology naturally lead to a dynamic, high-capacity associative memory system. The neuron-astrocyte networks generated by our framework are closely related to popular machine learning architectures known as Dense Associative Memories or Modern Hopfield Networks. In their known biological implementations the ratio of stored memories to the number of neurons remains constant, despite the growth of the network size. Our work demonstrates that neuron-astrocyte networks follow superior, supralinear memory scaling laws, outperforming all known biological implementations of Dense Associative Memory. This theoretical link suggests the exciting and previously unnoticed possibility that memories could be stored, at least in part, within astrocytes rather than solely in the synaptic weights between neurons.
Submitted 22 July, 2024; v1 submitted 14 November, 2023;
originally announced November 2023.
-
On degree-$3$ and $(n-4)$-correlation-immune perfect colorings of $n$-cubes
Authors:
Denis S. Krotov,
Alexandr A. Valyuzhenich
Abstract:
A perfect $k$-coloring of the Boolean hypercube $Q_n$ is a function from the set of binary words of length $n$ onto a $k$-set of colors such that for any colors $i$ and $j$ every word of color $i$ has exactly $S(i,j)$ neighbors (at Hamming distance $1$) of color $j$, where the coefficient $S(i,j)$ depends only on $i$ and $j$ but not on the particular choice of the word. The $k$-by-$k$ table of all coefficients $S(i,j)$ is called the quotient matrix. We characterize perfect colorings of $Q_n$ of degree at most $3$, that is, with a quotient matrix all of whose eigenvalues are not less than $n-6$, or, equivalently, such that every color corresponds to a Boolean function represented by a polynomial of degree at most $3$ over $R$. Additionally, we characterize $(n-4)$-correlation-immune perfect colorings of $Q_n$, all of whose colors correspond to $(n-4)$-correlation-immune Boolean functions, or, equivalently, all non-main (different from $n$) eigenvalues of the quotient matrix are not greater than $6-n$.
Keywords: perfect coloring, equitable partition, resilient function, correlation-immune function.
Submitted 23 June, 2024; v1 submitted 9 November, 2023;
originally announced November 2023.
-
The classification of orthogonal arrays OA(2048,14,2,7) and some completely regular codes
Authors:
Denis S. Krotov
Abstract:
We describe the classification of orthogonal arrays OA$(2048,14,2,7)$, or, equivalently, completely regular $\{14;2\}$-codes in the $14$-cube ($30848$ equivalence classes). In particular, we find that there is exactly one almost-OA$(2048,14,2,7{+}1)$, up to equivalence. As derived objects, OA$(1024,13,2,6)$ ($202917$ classes) and completely regular $\{12,2;2,12\}$- and $\{14, 12, 2; 2, 12, 14\}$-codes in the $13$- and $14$-cubes, respectively, are also classified.
Keywords: binary orthogonal array, completely regular code, binary 1-perfect code.
Submitted 12 June, 2024; v1 submitted 9 November, 2023;
originally announced November 2023.
-
Memory in Plain Sight: Surveying the Uncanny Resemblances of Associative Memories and Diffusion Models
Authors:
Benjamin Hoover,
Hendrik Strobelt,
Dmitry Krotov,
Judy Hoffman,
Zsolt Kira,
Duen Horng Chau
Abstract:
The generative process of Diffusion Models (DMs) has recently set state-of-the-art on many AI generation benchmarks. Though the generative process is traditionally understood as an "iterative denoiser", there is no universally accepted language to describe it. We introduce a novel perspective to describe DMs using the mathematical language of memory retrieval from the field of energy-based Associative Memories (AMs), making efforts to keep our presentation approachable to newcomers to both of these fields. Unifying these two fields provides insight that DMs can be seen as a particular kind of AM where Lyapunov stability guarantees are bypassed by intelligently engineering the dynamics (i.e., the noise and step size schedules) of the denoising process. Finally, we present a growing body of evidence that records DMs exhibiting empirical behavior we would expect from AMs, and conclude by discussing research opportunities that are revealed by understanding DMs as a form of energy-based memory.
Submitted 28 May, 2024; v1 submitted 28 September, 2023;
originally announced September 2023.
-
Long Sequence Hopfield Memory
Authors:
Hamza Tahir Chaudhry,
Jacob A. Zavatone-Veth,
Dmitry Krotov,
Cengiz Pehlevan
Abstract:
Sequence memory is an essential attribute of natural and artificial intelligence that enables agents to encode, store, and retrieve complex sequences of stimuli and actions. Computational models of sequence memory have been proposed where recurrent Hopfield-like neural networks are trained with temporally asymmetric Hebbian rules. However, these networks suffer from limited sequence capacity (maximal length of the stored sequence) due to interference between the memories. Inspired by recent work on Dense Associative Memories, we expand the sequence capacity of these models by introducing a nonlinear interaction term, enhancing separation between the patterns. We derive novel scaling laws for sequence capacity with respect to network size, significantly outperforming existing scaling laws for models based on traditional Hopfield networks, and verify these theoretical results with numerical simulation. Moreover, we introduce a generalized pseudoinverse rule to recall sequences of highly correlated patterns. Finally, we extend this model to store sequences with variable timing between states' transitions and describe a biologically-plausible implementation, with connections to motor neuroscience.
Submitted 2 November, 2023; v1 submitted 7 June, 2023;
originally announced June 2023.
-
End-to-end Differentiable Clustering with Associative Memories
Authors:
Bishwajit Saha,
Dmitry Krotov,
Mohammed J. Zaki,
Parikshit Ram
Abstract:
Clustering is a widely used unsupervised learning technique involving an intensive discrete optimization problem. Associative Memory models or AMs are differentiable neural networks defining a recursive dynamical system, which have been integrated with various deep learning architectures. We uncover a novel connection between the AM dynamics and the inherent discrete assignment necessary in clustering to propose a novel unconstrained continuous relaxation of the discrete clustering problem, enabling end-to-end differentiable clustering with AM, dubbed ClAM. Leveraging the pattern completion ability of AMs, we further develop a novel self-supervised clustering loss. Our evaluations on varied datasets demonstrate that ClAM benefits from the self-supervision, and significantly improves upon both the traditional Lloyd's k-means algorithm and more recent continuous clustering relaxations (by up to 60% in terms of the Silhouette Coefficient).
Submitted 5 June, 2023;
originally announced June 2023.
-
Quasi-cyclic perfect codes in Doob graphs and special partitions of Galois rings
Authors:
Minjia Shi,
Xiaoxiao Li,
Denis S. Krotov,
Ferruh Özbudak
Abstract:
The Galois ring GR$(4^Δ)$ is the residue ring $Z_4[x]/(h(x))$, where $h(x)$ is a basic primitive polynomial of degree $Δ$ over $Z_4$. For any odd $Δ$ larger than $1$, we construct a partition of GR$(4^Δ) \backslash \{0\}$ into $6$-subsets of type $\{a,b,-a-b,-a,-b,a+b\}$ and $3$-subsets of type $\{c,-c,2c\}$ such that the partition is invariant under the multiplication by a nonzero element of the Teichmüller set in GR$(4^Δ)$ and, if $Δ$ is not a multiple of $3$, under the action of the automorphism group of GR$(4^Δ)$.
As a corollary, this implies the existence of quasi-cyclic additive $1$-perfect codes of index $(2^Δ-1)$ in $D((2^Δ-1)(2^Δ-2)/{6}, 2^Δ-1 )$ where $D(m,n)$ is the Doob metric scheme on $Z^{2m+n}$.
Submitted 4 May, 2023;
originally announced May 2023.
-
Do K33-Free Latin Squares Exist?
Authors:
Aleksandr D. Krotov,
Denis S. Krotov
Abstract:
We discuss the problem of existence of latin squares without a substructure consisting of six elements $(r_1,c_2,l_3)$, $(r_2,c_3,l_1)$, $(r_3,c_1,l_2)$, $(r_2,c_1,l_3)$, $(r_3,c_2,l_1)$, $(r_1,c_3,l_2)$. Equivalently, the corresponding latin square graph does not have an induced subgraph isomorphic to $K_{3,3}$. The exhaustive search [Brouwer, Wanless. Universally noncommutative loops. 2011] shows that there are no such latin squares of order from $3$ to $11$ apart from order $8$, for which there are only two $K_{3,3}$-free latin squares, up to equivalence. We repeat the search, establishing also the number of latin $m$-by-$n$ rectangles for each $m$ and $n$ less than or equal to $11$. As a switched combination of two orthogonal latin squares of order $8$, we construct a $K_{3,3}$-free (universally noncommutative) latin square of order $16$.
Keywords: latin square; transversal; trade; pattern avoiding; eigenfunction; universally noncommutative loops.
Submitted 14 April, 2023;
originally announced April 2023.
-
Sparse Distributed Memory is a Continual Learner
Authors:
Trenton Bricken,
Xander Davies,
Deepak Singh,
Dmitry Krotov,
Gabriel Kreiman
Abstract:
Continual learning is a problem for artificial neural networks that their biological counterparts are adept at solving. Building on work using Sparse Distributed Memory (SDM) to connect a core neural circuit with the powerful Transformer model, we create a modified Multi-Layered Perceptron (MLP) that is a strong continual learner. We find that every component of our MLP variant translated from biology is necessary for continual learning. Our solution is also free from any memory replay or task information, and introduces novel methods to train sparse networks that may be broadly applicable.
Submitted 20 March, 2023;
originally announced March 2023.
-
Energy Transformer
Authors:
Benjamin Hoover,
Yuchen Liang,
Bao Pham,
Rameswar Panda,
Hendrik Strobelt,
Duen Horng Chau,
Mohammed J. Zaki,
Dmitry Krotov
Abstract:
Our work combines aspects of three promising paradigms in machine learning, namely, the attention mechanism, energy-based models, and associative memory. Attention is the powerhouse driving modern deep learning successes, but it lacks clear theoretical foundations. Energy-based models allow a principled approach to discriminative and generative tasks, but the design of the energy functional is not straightforward. At the same time, Dense Associative Memory models or Modern Hopfield Networks have a well-established theoretical foundation, and allow an intuitive design of the energy function. We propose a novel architecture, called the Energy Transformer (or ET for short), that uses a sequence of attention layers that are purposely designed to minimize a specifically engineered energy function, which is responsible for representing the relationships between the tokens. In this work, we introduce the theoretical foundations of ET, explore its empirical capabilities using the image completion task, and obtain strong quantitative results on the graph anomaly detection and graph classification tasks.
Submitted 31 October, 2023; v1 submitted 14 February, 2023;
originally announced February 2023.
-
An upper bound on the number of frequency hypercubes
Authors:
Denis S. Krotov,
Vladimir N. Potapov
Abstract:
A frequency $n$-cube $F^n(q;l_0,...,l_{m-1})$ is an $n$-dimensional $q$-by-...-by-$q$ array, where $q = l_0+...+l_{m-1}$, filled by numbers $0,...,m-1$ with the property that each line contains exactly $l_i$ cells with symbol $i$, $i = 0,...,m-1$ (a line consists of $q$ cells of the array differing in one coordinate). The trivial upper bound on the number of frequency $n$-cubes is $m^{(q-1)^{n}}$. We improve that upper bound for $n>2$, replacing $q-1$ by a smaller value, by constructing a testing set of size $s^{n}$, $s<q-1$, for frequency $n$-cubes (a testing set is a collection of cells of an array the values in which uniquely determine the array with given parameters). We also construct new testing sets for generalized frequency $n$-cubes, which are essentially correlation-immune functions in $n$ $q$-valued arguments; the cardinalities of the new testing sets are smaller than those of the testing sets known before.
Keywords: frequency hypercube, correlation-immune function, latin hypercube, testing set.
Submitted 12 June, 2024; v1 submitted 7 December, 2022;
originally announced December 2022.
-
Multifold 1-perfect codes
Authors:
Denis S. Krotov
Abstract:
A multifold $1$-perfect code ($1$-perfect code for list decoding) in any graph is a set $C$ of vertices such that every vertex of the graph is at distance not more than $1$ from exactly $μ$ elements of $C$. In $q$-ary Hamming graphs, where $q$ is a prime power, we characterize all parameters of multifold $1$-perfect codes and all parameters of additive multifold $1$-perfect codes. In particular, we show that additive multifold $1$-perfect codes are related to special multiset generalizations of spreads, multispreads, and that multispreads of parameters corresponding to multifold $1$-perfect codes always exist.
Keywords: perfect codes, multifold packing, multiple covering, list-decoding codes, additive codes, spreads, multispreads, completely regular codes, intriguing sets.
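The defining property in this abstract can be checked by brute force on very small Hamming graphs. The following is only an illustrative sketch, not the paper's method: the function names and the three toy codes in $H(3,2)$ are assumptions chosen for the example.

```python
from itertools import product


def hamming_distance(u, v):
    """Number of coordinates in which two words differ."""
    return sum(a != b for a, b in zip(u, v))


def multifold_multiplicity(code, n, q):
    """Return mu if every vertex of H(n, q) is within distance 1 of exactly
    mu codewords (so the code is a multifold 1-perfect code), else None."""
    counts = {
        sum(hamming_distance(x, c) <= 1 for c in code)
        for x in product(range(q), repeat=n)
    }
    return counts.pop() if len(counts) == 1 else None


# Hypothetical toy examples in the binary Hamming graph H(3, 2).
whole_space = list(product(range(2), repeat=3))
repetition = [(0, 0, 0), (1, 1, 1)]                         # ordinary 1-perfect code
even_weight = [x for x in whole_space if sum(x) % 2 == 0]   # not multifold perfect

print(multifold_multiplicity(repetition, 3, 2))
print(multifold_multiplicity(whole_space, 3, 2))
print(multifold_multiplicity(even_weight, 3, 2))
```

Exhaustive checking of this kind scales as $q^n \cdot |C|$, so it is useful only for sanity checks on tiny parameters.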
Submitted 15 December, 2023; v1 submitted 7 December, 2022;
originally announced December 2022.
-
A family of diameter perfect constant-weight codes from Steiner systems
Authors:
Minjia Shi,
Yuhong Xia,
Denis S. Krotov
Abstract:
If $S$ is a transitive metric space, then $|C|\cdot|A| \le |S|$ for any distance-$d$ code $C$ and a set $A$, "anticode", of diameter less than $d$. For every Steiner S$(t,k,n)$ system $S$, we show the existence of a $q$-ary constant-weight code $C$ of length $n$, weight $k$ (or $n-k$), and distance $d=2k-t+1$ (respectively, $d=n-t+1$) and an anticode $A$ of diameter $d-1$ such that the pair $(C,A)$ attains the code--anticode bound and the supports of the codewords of $C$ are the blocks of $S$ (respectively, the complements of the blocks of $S$). We study the problem of estimating the minimum value of $q$ for which such a code exists, and find that minimum for small values of $t$.
Keywords: diameter perfect codes, anticodes, constant-weight codes, code--anticode bound, Steiner systems.
Submitted 31 July, 2023; v1 submitted 30 November, 2022;
originally announced December 2022.
-
Constructing MRD codes by switching
Authors:
Minjia Shi,
Denis S. Krotov,
Ferruh Özbudak
Abstract:
MRD codes are maximum codes in the rank-distance metric space on $m$-by-$n$ matrices over the finite field of order $q$. They are diameter perfect and have the cardinality $q^{m(n-d+1)}$ if $m\ge n$. We define switching in MRD codes as replacing special MRD subcodes by other subcodes with the same parameters. We consider constructions of MRD codes admitting such switching, including punctured twisted Gabidulin codes and direct-product codes. Using switching, we construct a huge class of MRD codes whose cardinality grows doubly exponentially in $m$ if the other parameters ($n$, $q$, the code distance) are fixed. Moreover, we construct MRD codes with different affine ranks and aperiodic MRD codes.
Keywords: MRD codes, rank distance, bilinear forms graph, switching, diameter perfect codes
Submitted 1 November, 2022;
originally announced November 2022.
-
Associative Learning for Network Embedding
Authors:
Yuchen Liang,
Dmitry Krotov,
Mohammed J. Zaki
Abstract:
The network embedding task is to represent each node in the network as a low-dimensional vector while incorporating the topological and structural information. Most existing approaches solve this problem by factorizing a proximity matrix, either directly or implicitly. In this work, we introduce a network embedding method from a new perspective, which leverages Modern Hopfield Networks (MHN) for associative learning. Our network learns associations between the content of each node and that node's neighbors. These associations serve as memories in the MHN. The recurrent dynamics of the network make it possible to recover the masked node, given that node's neighbors. Our proposed method is evaluated on different downstream tasks such as node classification and linkage prediction. The results show competitive performance compared to the common matrix factorization techniques and deep learning based methods.
Submitted 30 August, 2022;
originally announced August 2022.
-
Projective tilings and full-rank perfect codes
Authors:
Denis S. Krotov
Abstract:
A tiling of a vector space $S$ is the pair $(U,V)$ of its subsets such that every vector in $S$ is uniquely represented as the sum of a vector from $U$ and a vector from $V$. A tiling is connected to a perfect code if one of the sets, say $U$, is projective, i.e., the union of one-dimensional subspaces of $S$. A tiling $(U,V)$ is full-rank if the affine span of each of $U$, $V$ is $S$. For finite non-binary vector spaces of dimension at least $6$ (at least $10$), we construct full-rank tilings $(U,V)$ with projective $U$ (both $U$ and $V$, respectively). In particular, that construction gives a full-rank ternary $1$-perfect code of length $13$, solving a known problem. We also discuss the treatment of tilings with projective components as factorizations of projective spaces.
Keywords: perfect codes, tilings, group factorization, full-rank tilings, projective geometry
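The tiling condition stated in this abstract (every vector is $u+v$ for exactly one pair) is easy to test exhaustively in tiny spaces. The sketch below is illustrative only: it assumes a prime field order `q`, so that vector addition is componentwise modulo `q`, and the function name and example sets are hypothetical.

```python
from itertools import product


def is_tiling(U, V, n, q):
    """Check whether (U, V) tiles F_q^n, i.e. every vector is u + v for
    exactly one pair (u, v). Assumes q is prime (addition is mod q)."""
    sums = [
        tuple((a + b) % q for a, b in zip(u, v))
        for u in U
        for v in V
    ]
    # Exactly q^n sums that cover the whole space means each vector is hit once.
    return len(sums) == q ** n and set(sums) == set(product(range(q), repeat=n))


# Hypothetical toy examples in F_2^2.
U = [(0, 0), (0, 1)]
V = [(0, 0), (1, 0)]
print(is_tiling(U, V, 2, 2))   # the two coordinate axes tile the plane
print(is_tiling(U, U, 2, 2))   # sums collide, so this is not a tiling
```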
Submitted 12 June, 2024; v1 submitted 30 June, 2022;
originally announced July 2022.
-
On the coset graph construction of distance-regular graphs
Authors:
Minjia Shi,
Denis S. Krotov,
Patrick Solé
Abstract:
We show that no more new distance-regular graphs in the tables of the book by Brouwer, Cohen, and Neumaier (1989) can be produced by using the coset graph of additive completely regular codes over finite fields.
Submitted 31 May, 2022;
originally announced June 2022.
-
Self-dual Hadamard bent sequences
Authors:
Minjia Shi,
Yaya Li,
Wei Cheng,
Dean Crnković,
Denis Krotov,
Patrick Solé
Abstract:
A new notion of bent sequence related to Hadamard matrices was introduced recently, motivated by a security application (Solé et al., 2021). We study the self-dual class in length at most $196$. We use three competing methods of generation: Exhaustion, Linear Algebra and Groebner bases. Regular Hadamard matrices and Bush-type Hadamard matrices provide many examples. We conjecture that if $v$ is an even perfect square, a self-dual bent sequence of length $v$ always exists. We introduce the strong automorphism group of Hadamard matrices, which acts on their associated self-dual bent sequences. We give an efficient algorithm to compute that group.
Submitted 22 June, 2022; v1 submitted 30 March, 2022;
originally announced March 2022.
-
An enumeration of 1-perfect ternary codes
Authors:
Minjia Shi,
Denis S. Krotov
Abstract:
We study codes with parameters of the ternary Hamming $(n=(3^m-1)/2,3^{n-m},3)$ code, i.e., ternary $1$-perfect codes. The rank of the code is defined to be the dimension of its affine span. We characterize ternary $1$-perfect codes of rank $n-m+1$, count their number, and prove that all such codes can be obtained from each other by a sequence of two-coordinate switchings. We enumerate ternary $1$-perfect codes of length $13$ obtained by concatenation from codes of lengths $9$ and $4$; we find that there are $93241327$ equivalence classes of such codes.
Keywords: perfect codes, ternary codes, concatenation, switching.
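As a quick arithmetic check of the parameters quoted in this abstract, the sketch below computes $(n, 3^{n-m}, 3)$ for a given $m$ and verifies the sphere-packing equality that makes a code with these parameters $1$-perfect. The function names are assumptions made for illustration.

```python
def ternary_hamming_parameters(m):
    """Length, cardinality and minimum distance of the ternary Hamming code."""
    n = (3 ** m - 1) // 2
    return n, 3 ** (n - m), 3


def meets_sphere_packing_bound(n, size, q=3):
    """True if radius-1 balls around `size` codewords can partition H(n, q)."""
    ball = 1 + n * (q - 1)          # a word plus its n*(q-1) neighbors
    return size * ball == q ** n


for m in (2, 3, 4):
    n, size, d = ternary_hamming_parameters(m)
    print(m, n, size, d, meets_sphere_packing_bound(n, size))
```

For $m=3$ this gives length $13$, the length of the concatenated codes enumerated in the abstract.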
Submitted 8 April, 2023; v1 submitted 12 October, 2021;
originally announced October 2021.
-
On $q$-ary shortened-$1$-perfect-like codes
Authors:
Minjia Shi,
Rongsheng Wu,
Denis S. Krotov
Abstract:
We study codes with parameters of $q$-ary shortened Hamming codes, i.e., $(n=(q^m-q)/(q-1), q^{n-m}, 3)_q$. Firstly, we prove the fact mentioned in 1998 by Brouwer et al. that such codes are optimal, generalizing it to a bound for multifold packings of radius-$1$ balls, with a corollary for multiple coverings. In particular, we show that the punctured Hamming code is an optimal $q$-fold packing with minimum distance $2$. Secondly, for every admissible length starting from $n=20$, we show the existence of $4$-ary codes with parameters of shortened $1$-perfect codes that cannot be obtained by shortening a $1$-perfect code.
Keywords: Hamming graph, multifold packings, multiple coverings, perfect codes.
Submitted 28 June, 2023; v1 submitted 11 October, 2021;
originally announced October 2021.
-
Hierarchical Associative Memory
Authors:
Dmitry Krotov
Abstract:
Dense Associative Memories or Modern Hopfield Networks have many appealing properties of associative memory. They can do pattern completion, store a large number of memories, and can be described using a recurrent neural network with a degree of biological plausibility and rich feedback between the neurons. At the same time, up until now all the models of this class have had only one hidden layer, and have only been formulated with densely connected network architectures, two aspects that hinder their machine learning applications. This paper tackles this gap and describes a fully recurrent model of associative memory with an arbitrarily large number of layers, some of which can be locally connected (convolutional), and a corresponding energy function that decreases on the dynamical trajectory of the neurons' activations. The memories of the full network are dynamically "assembled" using primitives encoded in the synaptic weights of the lower layers, with the "assembling rules" encoded in the synaptic weights of the higher layers. In addition to the bottom-up propagation of information, typical of commonly used feedforward neural networks, the model described has rich top-down feedback from higher layers that helps the lower-layer neurons to decide on their response to the input stimuli.
Submitted 13 July, 2021;
originally announced July 2021.
-
Zero sum sets in abelian groups
Authors:
Minjia Shi,
Denis S. Krotov,
Xiaoxiao Li,
Patrick Solé
Abstract:
The distribution of cardinalities of zero-sum sets in abelian groups is completely determined. A complex summation involving the Möbius function is given for the general abelian group, while in many special cases, including the case of elementary abelian groups, solved earlier by Li and Wan, it has a compact form. The proof involves two different Möbius transforms, on positive integers and on set partitions.
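For intuition about the quantity being counted, the distribution can be computed by exhaustive enumeration for small cyclic groups. This is a brute-force sketch only, not the Möbius-function formula from the paper; restricting to the cyclic group $Z_n$ and the function name are assumptions made for the example.

```python
from itertools import combinations


def zero_sum_distribution(n):
    """counts[k] = number of k-element subsets of the cyclic group Z_n
    whose elements sum to 0 modulo n (brute force, small n only)."""
    return [
        sum(1 for subset in combinations(range(n), k) if sum(subset) % n == 0)
        for k in range(n + 1)
    ]


print(zero_sum_distribution(5))
```

The empty set is counted as a zero-sum set of cardinality $0$ here; drop the first entry if that convention is not wanted.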
Submitted 7 February, 2021; v1 submitted 29 January, 2021;
originally announced February 2021.
-
Can a Fruit Fly Learn Word Embeddings?
Authors:
Yuchen Liang,
Chaitanya K. Ryali,
Benjamin Hoover,
Leopold Grinberg,
Saket Navlakha,
Mohammed J. Zaki,
Dmitry Krotov
Abstract:
The mushroom body of the fruit fly brain is one of the best studied systems in neuroscience. At its core it consists of a population of Kenyon cells, which receive inputs from multiple sensory modalities. These cells are inhibited by the anterior paired lateral neuron, thus creating a sparse high dimensional representation of the inputs. In this work we study a mathematical formalization of this network motif and apply it to learning the correlational structure between words and their context in a corpus of unstructured text, a common natural language processing (NLP) task. We show that this network can learn semantic representations of words and can generate both static and context-dependent word embeddings. Unlike conventional methods (e.g., BERT, GloVe) that use dense representations for word embedding, our algorithm encodes semantic meaning of words and their context in the form of sparse binary hash codes. The quality of the learned representations is evaluated on word similarity analysis, word-sense disambiguation, and document classification. It is shown that not only can the fruit fly network motif achieve performance comparable to existing methods in NLP, but, additionally, it uses only a fraction of the computational resources (shorter training time and smaller memory footprint).
Submitted 14 March, 2021; v1 submitted 18 January, 2021;
originally announced January 2021.
-
On extended 1-perfect bitrades
Authors:
Evgeny A. Bespalov,
Denis S. Krotov
Abstract:
Extended $1$-perfect codes in the Hamming scheme $H(n,q)$ can be equivalently defined as codes that turn into $1$-perfect codes after puncturing in any coordinate, as completely regular codes with a certain intersection array, as uniformly packed codes with certain weight coefficients, as diameter perfect codes with respect to a certain anticode, or as distance-$4$ codes with certain dual distances. We define extended $1$-perfect bitrades in $H(n,q)$ in five different manners, corresponding to the different definitions of extended $1$-perfect codes, and prove the equivalence of these definitions of extended $1$-perfect bitrades. For $q=2^m$, we prove that such bitrades exist if and only if $n=lq+2$. For any $q$, we prove the nonexistence of extended $1$-perfect bitrades if $n$ is odd.
Keywords: Perfect code, Extended perfect code, Bitrade, Completely regular code, Uniformly packed code.
Submitted 29 August, 2024; v1 submitted 3 December, 2020;
originally announced December 2020.
-
Equitable [[2,10],[6,6]]-partitions of the 12-cube
Authors:
Denis S. Krotov
Abstract:
We describe the computer-aided classification of equitable partitions of the $12$-cube with quotient matrix $[[2,10],[6,6]]$, or, equivalently, simple orthogonal arrays OA$(1536,12,2,7)$, or order-$7$ correlation-immune Boolean functions in $12$ variables with $1536$ ones (which completes the classification of unbalanced order-$7$ correlation-immune Boolean functions in $12$ variables). We find that there are $103$ equivalence classes of the considered objects, and there are only two almost-OA$(1536,12,2,8)$ among them. Additionally, we find that there are $40$ equivalence classes of pairs of disjoint simple OA$(1536,12,2,7)$ (equivalently, equitable partitions of the $12$-cube with quotient matrix $[[2,6,4], [6,2,4], [6,6,0]]$) and discuss the existence of a non-simple OA$(1536,12,2,7)$.
Keywords: orthogonal arrays, correlation-immune Boolean functions, equitable partitions, perfect colorings, intriguing sets.
Submitted 1 October, 2023; v1 submitted 30 November, 2020;
originally announced December 2020.
-
On minimal subspace Zp-null designs
Authors:
Denis S. Krotov
Abstract:
Let $q$ be a power of a prime $p$, and let $V$ be an $n$-dimensional space over the field GF$(q)$. A $Z_p$-valued function $C$ on the set of $k$-dimensional subspaces of $V$ is called a $k$-uniform $Z_p$-null design of strength $t$ if for every $t$-dimensional subspace $y$ of $V$ the sum of $C$ over the $k$-dimensional superspaces of $y$ equals $0$. For $q=p=2$ and $0\le t<k<n$, we prove that the minimum number of non-zeros of a non-void $k$-uniform $Z_p$-null design of strength $t$ equals $2^{t+1}$. For $q>2$, we give lower and upper bounds for that number.
Submitted 30 November, 2020;
originally announced December 2020.
-
Perfect colorings of the infinite square grid: coverings and twin colors
Authors:
Denis S. Krotov
Abstract:
A perfect coloring (equivalent concepts are equitable partition and partition design) of a graph $G$ is a function $f$ from the set of vertices onto some finite set (of colors) such that every node of color $i$ has exactly $S(i,j)$ neighbors of color $j$, where $S(i,j)$ are constants, forming the matrix $S$ called quotient. If $S$ is an adjacency matrix of some simple graph $T$ on the set of colors, then $f$ is called a covering of the target graph $T$ by the cover graph $G$. We characterize all coverings by the infinite square grid, proving that every such coloring is either orbit (that is, corresponds to the orbit partition under the action of some group of graph automorphisms) or has twin colors (that is, two colors such that unifying them keeps the coloring perfect). The case of twin colors is separately classified.
Keywords: perfect coloring, equitable partition, partition design, square grid, rectangular grid, wallpaper group, twin colors, graph covering
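The definition of a perfect coloring can be tested mechanically on periodic colorings. The sketch below is only an illustration under a stated assumption: a coloring of the infinite square grid that is periodic with periods `width` and `height` is checked on the corresponding torus, which has the same local neighborhoods when both periods are at least $3$. All names are hypothetical.

```python
def quotient_matrix_on_torus(color, width, height, num_colors):
    """Return the quotient matrix S if `color` is a perfect coloring of the
    width x height torus grid, else None. Rows of unused colors stay None."""
    S = [None] * num_colors
    for x in range(width):
        for y in range(height):
            # Count the colors of the four grid neighbors of (x, y).
            row = [0] * num_colors
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                row[color((x + dx) % width, (y + dy) % height)] += 1
            i = color(x, y)
            if S[i] is None:
                S[i] = row
            elif S[i] != row:
                return None  # two vertices of color i see different counts
    return S


checkerboard = lambda x, y: (x + y) % 2
stripes = lambda x, y: x % 2
single_dot = lambda x, y: 1 if (x, y) == (0, 0) else 0

print(quotient_matrix_on_torus(checkerboard, 4, 4, 2))
print(quotient_matrix_on_torus(stripes, 4, 4, 2))
print(quotient_matrix_on_torus(single_dot, 4, 4, 2))
```

The checkerboard coloring has a $0/1$ pattern scaled by $4$ rather than an adjacency matrix of a simple graph, so it is a perfect coloring but not a covering in the sense of the abstract.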
Submitted 8 April, 2023; v1 submitted 29 October, 2020;
originally announced October 2020.
-
Large Associative Memory Problem in Neurobiology and Machine Learning
Authors:
Dmitry Krotov,
John Hopfield
Abstract:
Dense Associative Memories or modern Hopfield networks permit storage and reliable retrieval of an exponentially large (in the dimension of feature space) number of memories. At the same time, their naive implementation is non-biological, since it seemingly requires the existence of many-body synaptic junctions between the neurons. We show that these models are effective descriptions of a more microscopic (written in terms of biological degrees of freedom) theory that has additional (hidden) neurons and only requires two-body interactions between them. For this reason our proposed microscopic theory is a valid model of large associative memory with a degree of biological plausibility. The dynamics of our network and its reduced dimensional equivalent both minimize energy (Lyapunov) functions. When certain dynamical variables (hidden neurons) are integrated out from our microscopic theory, one can recover many of the models that were previously discussed in the literature, e.g. the model presented in "Hopfield Networks is All You Need" paper. We also provide an alternative derivation of the energy function and the update rule proposed in the aforementioned paper and clarify the relationships between various models of this class.
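To make the retrieval dynamics mentioned in this abstract concrete, here is a minimal sketch of the softmax update rule associated with the "Hopfield Networks is All You Need" model, in which the state is replaced by a softmax-weighted average of the stored memories. It is not the microscopic two-body model of this paper; the function names, the inverse temperature `beta`, the step count, and the toy patterns are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def retrieve(memories, query, beta=4.0, steps=3):
    """Iterate state <- sum_mu softmax(beta * <memory_mu, state>)_mu * memory_mu."""
    state = list(query)
    for _ in range(steps):
        scores = [beta * sum(a * b for a, b in zip(mem, state)) for mem in memories]
        weights = softmax(scores)
        state = [
            sum(w * mem[i] for w, mem in zip(weights, memories))
            for i in range(len(state))
        ]
    return state


# Hypothetical stored patterns and a query equal to the first pattern
# with its last coordinate flipped.
memories = [
    (1, 1, 1, 1, 1, 1),
    (1, -1, 1, -1, 1, -1),
    (-1, -1, -1, 1, 1, 1),
]
noisy = (1, 1, 1, 1, 1, -1)
recovered = retrieve(memories, noisy)
print([round(v, 3) for v in recovered])
```

A larger `beta` sharpens the softmax and makes retrieval closer to a nearest-memory lookup, while a small `beta` blends memories together.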
Submitted 27 April, 2021; v1 submitted 16 August, 2020;
originally announced August 2020.
-
On the number of frequency hypercubes $F^n(4;2,2)$
Authors:
Minjia Shi,
Shukai Wang,
Xiaoxiao Li,
Denis S. Krotov
Abstract:
A frequency $n$-cube $F^n(4;2,2)$ is an $n$-dimensional $4$-by-...-by-$4$ array filled by $0$s and $1$s such that each line contains exactly two $1$s. We classify the frequency $4$-cubes $F^4(4;2,2)$, find a testing set of size $25$ for $F^3(4;2,2)$, and derive an upper bound on the number of $F^n(4;2,2)$. Additionally, for any $n$ greater than $2$, we construct an $F^n(4;2,2)$ that cannot be refined to a latin hypercube, while each of its sub-$F^{n-1}(4;2,2)$ can.
Keywords: frequency hypercube, frequency square, latin hypercube, testing set, MDS code
Submitted 21 April, 2021; v1 submitted 21 May, 2020;
originally announced May 2020.
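
The defining property in the abstract above (every line of the array contains exactly two $1$s) can be checked mechanically. The following is a minimal sketch, not taken from the paper: the names `is_frequency_cube` and `example` are hypothetical, and the example array (value 1 when the coordinate sum is 0 or 1 modulo 4) is just one easy construction chosen for illustration.

```python
from itertools import product

def is_frequency_cube(f, n, q=4, ones=2):
    """Check that f: {0..q-1}^n -> {0,1} has exactly `ones` 1s on every line.

    A line is obtained by fixing all coordinates but one and letting the
    remaining coordinate run over 0..q-1.
    """
    for axis in range(n):
        for rest in product(range(q), repeat=n - 1):
            # Insert the running coordinate v at position `axis`.
            line = [f(rest[:axis] + (v,) + rest[axis:]) for v in range(q)]
            if sum(line) != ones:
                return False
    return True

def example(x):
    """Illustrative array: 1 when the coordinate sum is 0 or 1 modulo 4."""
    return 1 if sum(x) % 4 in (0, 1) else 0

print(is_frequency_cube(example, 3))  # checks all 3 * 16 lines of a 4x4x4 array
```

Along any line the coordinate sum takes each residue modulo 4 exactly once, which is why this particular example satisfies the condition for every $n$.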
-
Bio-Inspired Hashing for Unsupervised Similarity Search
Authors:
Chaitanya K. Ryali,
John J. Hopfield,
Leopold Grinberg,
Dmitry Krotov
Abstract:
The fruit fly Drosophila's olfactory circuit has inspired a new locality sensitive hashing (LSH) algorithm, FlyHash. In contrast with classical LSH algorithms that produce low dimensional hash codes, FlyHash produces sparse high-dimensional hash codes and has also been shown to have superior empirical performance compared to classical LSH algorithms in similarity search. However, FlyHash uses random projections and cannot learn from data. Building on inspiration from FlyHash and the ubiquity of sparse expansive representations in neurobiology, our work proposes a novel hashing algorithm BioHash that produces sparse high dimensional hash codes in a data-driven manner. We show that BioHash outperforms previously published benchmarks for various hashing methods. Since our learning algorithm is based on a local and biologically plausible synaptic plasticity rule, our work provides evidence for the proposal that LSH might be a computational reason for the abundance of sparse expansive motifs in a variety of biological systems. We also propose a convolutional variant BioConvHash that further improves performance. From the perspective of computer science, BioHash and BioConvHash are fast, scalable and yield compressed binary representations that are useful for similarity search.
Submitted 30 June, 2020; v1 submitted 14 January, 2020;
originally announced January 2020.
-
Perfect 2-colorings of Hamming graphs
Authors:
Evgeny A. Bespalov,
Denis S. Krotov,
Aleksandr A. Matiushev,
Anna A. Taranenko,
Konstantin V. Vorob'ev
Abstract:
We consider the problem of existence of perfect $2$-colorings (equitable $2$-partitions) of Hamming graphs with given parameters. We start with conditions on parameters of graphs and colorings that are necessary for their existence. Next we review known constructions of perfect colorings and propose some new ones giving new parameters. Finally, we deduce which parameters of colorings are covered by these constructions and give tables of admissible parameters of $2$-colorings in Hamming graphs $H(n,q)$ for small $n$ and $q$. Using the connection with perfect colorings, we construct an orthogonal array OA(2048,7,4,5).
Submitted 6 February, 2021; v1 submitted 29 November, 2019;
originally announced November 2019.
-
Local Unsupervised Learning for Image Analysis
Authors:
Leopold Grinberg,
John Hopfield,
Dmitry Krotov
Abstract:
Local Hebbian learning is believed to be inferior in performance to end-to-end training using a backpropagation algorithm. We question this popular belief by designing a local algorithm that can learn convolutional filters at scale on large image datasets. These filters combined with patch normalization and very steep non-linearities result in a good classification accuracy for shallow networks trained locally, as opposed to end-to-end. The filters learned by our algorithm contain both orientation selective units and unoriented color units, resembling the responses of pyramidal neurons located in the cytochrome oxidase 'interblob' and 'blob' regions in the primary visual cortex of primates. It is shown that convolutional networks with patch normalization significantly outperform standard convolutional networks on the task of recovering the original classes when shadows are superimposed on top of standard CIFAR-10 images. Patch normalization approximates the retinal adaptation to the mean light intensity, important for human vision. We also demonstrate a successful transfer of learned representations between CIFAR-10 and ImageNet 32x32 datasets. All these results taken together hint at the possibility that local unsupervised training might be a powerful tool for learning general representations (without specifying the task) directly from unlabeled data.
Submitted 14 August, 2019;
originally announced August 2019.
-
On the number of resolvable Steiner triple systems of small 3-rank
Authors:
Minjia Shi,
Li Xu,
Denis S. Krotov
Abstract:
In a recent work, Jungnickel, Magliveras, Tonchev, and Wassermann derived an overexponential lower bound on the number of nonisomorphic resolvable Steiner triple systems (STS) of order $v$, where $v=3^k$, and $3$-rank $v-k$. We develop an approach to generalize this bound and estimate the number of isomorphism classes of STS$(v)$ of rank $v-k-1$ for an arbitrary $v$ of the form $3^kT$.
Submitted 29 June, 2019;
originally announced July 2019.
-
On the OA(1536,13,2,7) and related orthogonal arrays
Authors:
Denis S. Krotov
Abstract:
With a computer-aided approach based on the connection with equitable partitions, we establish the uniqueness of the orthogonal array OA$(1536,13,2,7)$, constructed in [D. G. Fon-Der-Flaass. Perfect $2$-Colorings of a Hypercube, Sib. Math. J. 48 (2007), 740-745] as an equitable partition of the $13$-cube with quotient matrix $[[0,13],[3,10]]$. By shortening the OA$(1536,13,2,7)$, we obtain $3$ inequivalent orthogonal arrays OA$(768,12,2,6)$, which is a complete classification for these parameters too. After our computation, the first parameters of unclassified binary orthogonal arrays OA$(N,n,2,t)$ attaining the Friedman bound $N\ge 2^n(1-n/2(t+1))$ are OA$(2048,14,2,7)$. Such an array can be obtained by puncturing any binary $1$-perfect code of length $15$. We construct orthogonal arrays with these and similar parameters OA$(N=2^{n-m+1},n=2^m-2,2,t=2^{m-1}-1)$, $m\ge 4$, that are not punctured $1$-perfect codes. Additionally, we prove that any orthogonal array OA$(N,n,2,t)$ with even $t$ attaining the bound $N \ge 2^n(1-(n+1)/2(t+2))$ induces an equitable $3$-partition of the $n$-cube.
Submitted 9 December, 2019; v1 submitted 27 May, 2019;
originally announced May 2019.
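
As a quick arithmetic check of the bounds quoted in the abstract above, the sketch below evaluates them for the stated parameters. It assumes the bounds are read as $2^n\bigl(1-\tfrac{n}{2(t+1)}\bigr)$ and $2^n\bigl(1-\tfrac{n+1}{2(t+2)}\bigr)$; the function names `friedman_bound` and `even_t_bound` are hypothetical and not from the paper.

```python
from fractions import Fraction

def friedman_bound(n, t):
    """Lower bound on N, read as 2^n * (1 - n / (2 * (t + 1)))."""
    return 2**n * (1 - Fraction(n, 2 * (t + 1)))

def even_t_bound(n, t):
    """Bound for even t, read as 2^n * (1 - (n + 1) / (2 * (t + 2)))."""
    return 2**n * (1 - Fraction(n + 1, 2 * (t + 2)))

# Parameter sets named in the abstract: (N, n, t).
for N, n, t in [(1536, 13, 7), (2048, 14, 7)]:
    print((N, n, t), "attains Friedman bound:", friedman_bound(n, t) == N)

print((768, 12, 6), "attains even-t bound:", even_t_bound(12, 6) == 768)

# The family N = 2^(n-m+1), n = 2^m - 2, t = 2^(m-1) - 1 for a few m >= 4.
for m in range(4, 8):
    n, t = 2**m - 2, 2**(m - 1) - 1
    print(m, friedman_bound(n, t) == 2**(n - m + 1))
```

Exact rational arithmetic via `Fraction` avoids floating-point rounding when testing equality with the bound.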
-
The Steiner triple systems of order 21 with a transversal subdesign TD(3,6)
Authors:
Yue Guan,
Minjia Shi,
Denis S. Krotov
Abstract:
We prove several structural properties of Steiner triple systems (STS) of order 3w+3 that include one or more transversal subdesigns TD(3,w). Using an exhaustive search, we find that there are 2004720 isomorphism classes of STS(21) including a subdesign TD(3,6), or, equivalently, a 6-by-6 latin square.
Submitted 23 May, 2019; v1 submitted 22 May, 2019;
originally announced May 2019.