Search | arXiv e-print repository

Convergence of Shallow ReLU Networks on Weakly Interacting Data

Authors: Léo Dana, Francis Bach, Loucas Pillaud-Vivien

Abstract: We analyse the convergence of one-hidden-layer ReLU networks trained by gradient flow on $n$ data points. Our main contribution leverages the high dimensionality of the ambient space, which implies low correlation of the input samples, to demonstrate that a network with width of order $\log(n)$ neurons suffices for global convergence with high probability. Our analysis uses a Polyak-Łojasiewicz viewpoint along the gradient-flow trajectory, which provides an exponential rate of convergence of order $\frac{1}{n}$. When the data are exactly orthogonal, we give further refined characterizations of the convergence speed, proving that its asymptotic behavior lies between the orders $\frac{1}{n}$ and $\frac{1}{\sqrt{n}}$, and exhibiting a phase-transition phenomenon in which the convergence rate evolves from the lower bound to the upper bound within a relative time of order $\frac{1}{\log(n)}$.

Submitted 24 February, 2025; originally announced February 2025.
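The Polyak-Łojasiewicz step in the abstract above follows a standard pattern: if along the trajectory $\|\nabla L(\theta_t)\|^2 \ge \mu\, L(\theta_t)$, then gradient flow gives $\frac{d}{dt}L(\theta_t) = -\|\nabla L(\theta_t)\|^2 \le -\mu\, L(\theta_t)$, hence $L(\theta_t) \le e^{-\mu t} L(\theta_0)$, with $\mu$ of order $\frac{1}{n}$ according to the abstract. The pure-Python sketch below only illustrates the setting, not the authors' method or proof. Everything in it is an assumption: inputs drawn uniformly on the unit sphere in $\mathbb{R}^d$ with $d \gg n$, random $\pm 1$ labels, the hypothetical width rule `m = ceil(width_const * log n)`, Gaussian first-layer and $\pm 1$ second-layer initialization, the $1/\sqrt{m}$ output scaling, and plain gradient descent (both layers trained) as a discretization of gradient flow.

```python
import math
import random


def relu(z):
    return z if z > 0.0 else 0.0


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def random_unit_vector(rng, d):
    """Sample a point uniformly on the unit sphere in R^d (normalized Gaussian)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(dot(v, v))
    return [c / norm for c in v]


def max_abs_correlation(xs):
    """Largest |<x_i, x_k>| over distinct pairs of unit vectors."""
    return max(abs(dot(xs[i], xs[k]))
               for i in range(len(xs)) for k in range(i + 1, len(xs)))


def train_shallow_relu(n=10, d=400, width_const=3.0, steps=200, seed=0):
    """Gradient descent on f(x) = (1/sqrt(m)) * sum_j a_j * relu(<w_j, x>).

    All choices (data, labels, width rule, init, step size) are hypothetical.
    Returns (max pairwise |correlation| of inputs, per-step training losses).
    """
    rng = random.Random(seed)
    xs = [random_unit_vector(rng, d) for _ in range(n)]
    ys = [rng.choice([-1.0, 1.0]) for _ in range(n)]
    m = math.ceil(width_const * math.log(n))  # hypothetical width ~ log(n)
    ws = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    a = [rng.choice([-1.0, 1.0]) for _ in range(m)]
    scale = 1.0 / math.sqrt(m)
    lr = 0.2 * n  # assumption: step size scaled with n (convenience choice)

    losses = []
    for _ in range(steps):
        pre = [[dot(w, x) for w in ws] for x in xs]  # pre[i][j] = <w_j, x_i>
        preds = [scale * sum(a[j] * relu(p[j]) for j in range(m)) for p in pre]
        res = [f - y for f, y in zip(preds, ys)]
        losses.append(sum(r * r for r in res) / (2 * n))  # L = (1/2n) sum r_i^2
        # dL/da_j = (1/n) sum_i r_i * scale * relu(<w_j, x_i>)
        grad_a = [scale * sum(res[i] * relu(pre[i][j]) for i in range(n)) / n
                  for j in range(m)]
        # dL/dw_j = (1/n) sum_i r_i * scale * a_j * 1[<w_j, x_i> > 0] * x_i
        for j in range(m):
            coef = scale * a[j] / n  # uses a_j before this step's update
            g = [0.0] * d
            for i in range(n):
                if pre[i][j] > 0.0:
                    c = coef * res[i]
                    for k in range(d):
                        g[k] += c * xs[i][k]
            ws[j] = [wk - lr * gk for wk, gk in zip(ws[j], g)]
        a = [aj - lr * gj for aj, gj in zip(a, grad_a)]
    return max_abs_correlation(xs), losses
```

Calling `train_shallow_relu()` returns the largest pairwise |cosine| among the inputs, which stays small when `d` is much larger than `n`, together with the loss at each step. Increasing `d` with `n` fixed moves the inputs toward the exactly orthogonal regime discussed in the abstract; the step size is scaled with `n` only so that the $\frac{1}{2n}$ loss normalization does not make progress vanishingly slow.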

arXiv:2411.10115

Memorization in Attention-only Transformers

Authors: Léo Dana, Muni Sreenivas Pydi, Yann Chevaleyre

Abstract: Recent research has explored the memorization capacity of multi-head attention, but these findings are constrained by unrealistic limitations on the context size. We present a novel proof for language-based Transformers that extends the current hypothesis to any context size. Our approach improves upon the state-of-the-art by achieving more effective exact memorization with an attention layer, while also introducing the concept of approximate memorization of distributions. Through experimental validation, we demonstrate that our proposed bounds more accurately reflect the true memorization capacity of language models, and provide a precise comparison with prior work.

Submitted 10 March, 2025; v1 submitted 15 November, 2024; originally announced November 2024.

Comments: 16 pages, 6 figures, submitted to AISTATS 2025

arXiv:2305.01023

More than technical support: the professional contexts of physics instructional labs

Authors: LM Dana, Benjamin Pollard, Sara Mueller

Abstract: Most, if not all, physics undergraduate degree programs include instructional lab experiences. Physics lab instructors, both faculty and staff, are instrumental to student learning in instructional physics labs. However, the faculty-staff dichotomy belies the complex, varied, and multifaceted landscape of positions that lab instructors hold in the fabric of physics departments. Here we present the results of a mixed methods study of the people who teach instructional labs and their professional contexts. Recruiting physics lab instructors across the US, we collected 84 survey responses and conducted 12 in-depth interviews about their job characteristics, professional identities, resources, and experiences. Our investigation reveals that lab instructors vary in terms of their official titles, job descriptions, formal duties, personal agency, and access to resources. We also identified common themes around the value of instructional labs, mismatched job descriptions, and a broad set of necessary skills and expertise. Our results suggest that instructors often occupy overlapping roles that fall in between more canonical jobs in physics departments. By understanding the professional contexts of physics lab instructors, the rest of the physics community can better promote and engage with their critical work, improving laboratory learning both for students and for the lab instructors who teach and support them.

Submitted 1 May, 2023; originally announced May 2023.

Showing 1–3 of 3 results for author: Dana, L