-
Convergence of Shallow ReLU Networks on Weakly Interacting Data
Authors:
Léo Dana,
Francis Bach,
Loucas Pillaud-Vivien
Abstract:
We analyse the convergence of one-hidden-layer ReLU networks trained by gradient flow on $n$ data points. Our main contribution leverages the high dimensionality of the ambient space, which implies low correlation of the input samples, to demonstrate that a network with width of order $\log(n)$ neurons suffices for global convergence with high probability. Our analysis uses a Polyak-Łojasiewicz viewpoint along the gradient-flow trajectory, which provides an exponential rate of convergence of $\frac{1}{n}$. When the data are exactly orthogonal, we give further refined characterizations of the convergence speed, proving its asymptotic behavior lies between the orders $\frac{1}{n}$ and $\frac{1}{\sqrt{n}}$, and exhibiting a phase-transition phenomenon in the convergence rate, during which it evolves from the lower bound to the upper, and in a relative time of order $\frac{1}{\log(n)}$.
Submitted 24 February, 2025;
originally announced February 2025.
-
Memorization in Attention-only Transformers
Authors:
Léo Dana,
Muni Sreenivas Pydi,
Yann Chevaleyre
Abstract:
Recent research has explored the memorization capacity of multi-head attention, but these findings are constrained by unrealistic limitations on the context size. We present a novel proof for language-based Transformers that extends the current hypothesis to any context size. Our approach improves upon the state-of-the-art by achieving more effective exact memorization with an attention layer, while also introducing the concept of approximate memorization of distributions. Through experimental validation, we demonstrate that our proposed bounds more accurately reflect the true memorization capacity of language models, and provide a precise comparison with prior work.
Submitted 10 March, 2025; v1 submitted 15 November, 2024;
originally announced November 2024.
-
More than technical support: the professional contexts of physics instructional labs
Authors:
LM Dana,
Benjamin Pollard,
Sara Mueller
Abstract:
Most, if not all, physics undergraduate degree programs include instructional lab experiences. Physics lab instructors, both faculty and staff, are instrumental to student learning in instructional physics labs. However, the faculty-staff dichotomy belies the complex, varied, and multifaceted landscape of positions that lab instructors hold in the fabrics of physics departments. Here we present the results of a mixed methods study of the people who teach instructional labs and their professional contexts. Recruiting physics lab instructors across the US, we collected 84 survey responses and conducted 12 in-depth interviews about their job characteristics, professional identities, resources, and experiences. Our investigation reveals that lab instructors vary in terms of their official titles, job descriptions, formal duties, personal agency, and access to resources. We also identified common themes around the value of instructional labs, mismatched job descriptions, and a broad set of necessary skills and expertise. Our results suggest that instructors often occupy overlapping roles that fall in between more canonical jobs in physics departments. By understanding the professional contexts of physics lab instructors, the rest of the physics community can better promote and engage with their critical work, improving laboratory learning both for students and for the lab instructors who teach and support them.
Submitted 1 May, 2023;
originally announced May 2023.