Showing 1–5 of 5 results for author: Heinsen, F A

  1. arXiv:2404.05843  [pdf, ps, other]

    cs.LG cs.CL

    Softmax Attention with Constant Cost per Token

    Authors: Franz A. Heinsen

    Abstract: We propose a simple modification to the conventional attention mechanism applied by Transformers: Instead of quantifying pairwise query-key similarity with scaled dot-products, we quantify it with the logarithms of scaled dot-products of exponentials. Our modification linearizes attention with exponential kernel feature maps, whose corresponding feature function is infinite dimensional. We show th…

    Submitted 27 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: Source code and instructions for replicating our results are online at https://github.com/glassroom/heinsen_attention

  2. arXiv:2311.06281  [pdf, other]

    cs.DS cs.LG

    Efficient Parallelization of a Ubiquitous Sequential Computation

    Authors: Franz A. Heinsen

    Abstract: We find a succinct expression for computing the sequence $x_t = a_t x_{t-1} + b_t$ in parallel with two prefix sums, given $t = (1, 2, \dots, n)$, $a_t \in \mathbb{R}^n$, $b_t \in \mathbb{R}^n$, and initial value $x_0 \in \mathbb{R}$. On $n$ parallel processors, the computation of $n$ elements incurs $\mathcal{O}(\log n)$ time and $\mathcal{O}(n)$ space. Sequences of this form are ubiquitous in sc…

    Submitted 27 December, 2023; v1 submitted 27 October, 2023; originally announced November 2023.

    Comments: Source code for replicating our results is available online at https://github.com/glassroom/heinsen_sequence

  3. arXiv:2211.11754  [pdf, other]

    cs.LG cs.AI

    An Algorithm for Routing Vectors in Sequences

    Authors: Franz A. Heinsen

    Abstract: We propose a routing algorithm that takes a sequence of vectors and computes a new sequence with specified length and vector size. Each output vector maximizes "bang per bit," the difference between a net benefit to use and net cost to ignore data, by better predicting the input vectors. We describe output vectors as geometric objects, as latent variables that assign credit, as query states in a m…

    Submitted 21 December, 2022; v1 submitted 20 November, 2022; originally announced November 2022.

    Comments: Source code and instructions for replicating our results are online at https://github.com/glassroom/heinsen_routing

  4. arXiv:2209.10288  [pdf, ps, other]

    cs.LG cs.AI

    Tree Methods for Hierarchical Classification in Parallel

    Authors: Franz A. Heinsen

    Abstract: We propose methods that enable efficient hierarchical classification in parallel. Our methods transform a batch of classification scores and labels, corresponding to given nodes in a semantic tree, to scores and labels corresponding to all nodes in the ancestral paths going down the tree to every given node, relying only on tensor operations that execute efficiently on hardware accelerators. We im…

    Submitted 21 September, 2022; originally announced September 2022.

    Comments: Source code and instructions for replicating our results are online at https://github.com/glassroom/heinsen_tree

  5. arXiv:1911.00792  [pdf, other]

    cs.LG cs.AI cs.CV

    An Algorithm for Routing Capsules in All Domains

    Authors: Franz A. Heinsen

    Abstract: Building on recent work on capsule networks, we propose a new, general-purpose form of "routing by agreement" that activates output capsules in a layer as a function of their net benefit to use and net cost to ignore input capsules from earlier layers. To illustrate the usefulness of our routing algorithm, we present two capsule networks that apply it in different domains: vision and language. The…

    Submitted 28 February, 2020; v1 submitted 2 November, 2019; originally announced November 2019.
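
Note on entry 2 (arXiv:2311.06281): the abstract says the recurrence $x_t = a_t x_{t-1} + b_t$ can be computed with two prefix sums. The sketch below is one plain reading of that claim, not the paper's code. It assumes every $a_t$ is non-zero and uses the closed form $x_t = A_t (x_0 + \sum_{k \le t} b_k / A_k)$ with $A_t = a_1 a_2 \cdots a_t$. The function names `recurrence_sequential` and `recurrence_prefix` are hypothetical, and the paper's own formulation (the abstract is truncated in this listing) may differ, for example in how it handles numerical stability.

```python
from itertools import accumulate
import operator


def recurrence_sequential(a, b, x0):
    """Reference: evaluate x_t = a_t * x_{t-1} + b_t one step at a time."""
    xs = []
    x = x0
    for a_t, b_t in zip(a, b):
        x = a_t * x + b_t
        xs.append(x)
    return xs


def recurrence_prefix(a, b, x0):
    """Same sequence via two prefix scans (assumes every a_t is non-zero)."""
    # Scan 1: running products A_t = a_1 * a_2 * ... * a_t.
    A = list(accumulate(a, operator.mul))
    # Scan 2: running sums S_t = sum over k <= t of b_k / A_k.
    S = list(accumulate(b_k / A_k for b_k, A_k in zip(b, A)))
    # Combine elementwise: x_t = A_t * (x_0 + S_t).
    return [A_t * (x0 + S_t) for A_t, S_t in zip(A, S)]


if __name__ == "__main__":
    a = [0.5, 2.0, 1.5, 0.25]
    b = [1.0, -1.0, 0.5, 2.0]
    print(recurrence_sequential(a, b, 3.0))
    print(recurrence_prefix(a, b, 3.0))
```

Here `accumulate` runs sequentially, so this version only demonstrates that the two forms agree. Both scans use associative operators, which is what allows a parallel scan to evaluate them in $\mathcal{O}(\log n)$ steps on $n$ processors. Dividing by running products can overflow or underflow for long sequences, so this direct form is for illustration only.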