-
Clustering in Causal Attention Masking
Abstract: This work presents a modification of the self-attention dynamics proposed by Geshkovski et al. (arXiv:2312.10794) to better reflect the practically relevant, causally masked attention used in transformer architectures for generative AI. This modification translates into an interacting particle system that cannot be interpreted as a mean-field gradient flow. Despite this loss of structure, we signi…
Submitted 10 November, 2024; v1 submitted 7 November, 2024; originally announced November 2024.
Comments: 38th Conference on Neural Information Processing Systems (NeurIPS 2024), 22 pages, 6 figures
MSC Class: 68T07; 35Q68; 37N99; 82C22
-
arXiv:2110.01046
A limit theorem for the last exit time over a moving nonlinear boundary for a Gaussian process
Abstract: We prove a limit theorem on the convergence of the distributions of the scaled last exit time over a slowly moving nonlinear boundary for a class of Gaussian stationary processes. The limit is a double exponential (Gumbel) distribution.
Submitted 31 May, 2022; v1 submitted 3 October, 2021; originally announced October 2021.
Comments: 20 pages. Revised structure and fixed typos, results unchanged. arXiv admin note: substantial text overlap with arXiv:2012.03222
-
arXiv:2012.03222
On the distribution of the last exit time over a slowly growing linear boundary for a Gaussian process
Abstract: For a class of Gaussian stationary processes, we prove a limit theorem on the convergence of the distributions of the scaled last exit time over a slowly growing linear boundary. The limit is a double exponential (Gumbel) distribution.
Submitted 6 December, 2020; originally announced December 2020.
Comments: 13 pages