-
Richardson tableaux and components of Springer fibers equal to Richardson varieties
Authors:
Steven N. Karp,
Martha E. Precup
Abstract:
Motivated by the study of Springer fibers and their totally nonnegative counterparts, we define a new subset of standard tableaux called Richardson tableaux. We characterize Richardson tableaux combinatorially using evacuation as well as in terms of a pair of associated reading words. We also characterize Richardson tableaux geometrically, proving that a tableau is Richardson if and only if the corresponding component of a Springer fiber is a Richardson variety, which in turn holds if and only if its positive part is a top-dimensional cell of the totally nonnegative Springer fiber studied by Lusztig (2021). We prove that each such component is smooth by leveraging a combinatorial description of the corresponding pair of reading words, generalizing a result of Graham-Zierau (2011). Another application is that the cohomology classes of these components can be computed in the Schubert basis using Schubert calculus. Finally, we show that the enumeration of Richardson tableaux is surprisingly elegant: the number of Richardson tableaux of fixed partition shape is a product of binomial coefficients, and the number of Richardson tableaux of size $n$ is the $n$th Motzkin number. As a result, we obtain a novel refinement for the Motzkin numbers, as well as a formula for the number of top-dimensional cells in the totally nonnegative Springer fiber.
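As a small illustration of the enumerative claim above, the Motzkin numbers can be generated from the standard recurrence $M_n = M_{n-1} + \sum_{i=0}^{n-2} M_i M_{n-2-i}$ with $M_0 = M_1 = 1$. This is only a sketch of the number sequence itself; the function name `motzkin_numbers` is hypothetical, and the code does not enumerate Richardson tableaux.

```python
def motzkin_numbers(count):
    """Return the list [M_0, M_1, ..., M_{count-1}] of Motzkin numbers.

    Uses the convolution recurrence
        M_n = M_{n-1} + sum_{i=0}^{n-2} M_i * M_{n-2-i},  M_0 = M_1 = 1.
    """
    m = []
    for n in range(count):
        if n < 2:
            m.append(1)
        else:
            # range(n - 1) runs over i = 0, ..., n-2
            m.append(m[n - 1] + sum(m[i] * m[n - 2 - i] for i in range(n - 1)))
    return m


print(motzkin_numbers(7))  # [1, 1, 2, 4, 9, 21, 51]
```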
Submitted 25 June, 2025;
originally announced June 2025.
-
Gemini Robotics: Bringing AI into the Physical World
Authors:
Gemini Robotics Team,
Saminda Abeyruwan,
Joshua Ainslie,
Jean-Baptiste Alayrac,
Montserrat Gonzalez Arenas,
Travis Armstrong,
Ashwin Balakrishna,
Robert Baruch,
Maria Bauza,
Michiel Blokzijl,
Steven Bohez,
Konstantinos Bousmalis,
Anthony Brohan,
Thomas Buschmann,
Arunkumar Byravan,
Serkan Cabi,
Ken Caluwaerts,
Federico Casarini,
Oscar Chang,
Jose Enrique Chen,
Xi Chen,
Hao-Tien Lewis Chiang,
Krzysztof Choromanski,
David D'Ambrosio,
Sudeep Dasari
, et al. (93 additional authors not shown)
Abstract:
Recent advancements in large multimodal models have led to the emergence of remarkable generalist capabilities in digital domains, yet their translation to physical agents such as robots remains a significant challenge. This report introduces a new family of AI models purposefully designed for robotics and built upon the foundation of Gemini 2.0. We present Gemini Robotics, an advanced Vision-Language-Action (VLA) generalist model capable of directly controlling robots. Gemini Robotics executes smooth and reactive movements to tackle a wide range of complex manipulation tasks while also being robust to variations in object types and positions, handling unseen environments as well as following diverse, open vocabulary instructions. We show that with additional fine-tuning, Gemini Robotics can be specialized to new capabilities including solving long-horizon, highly dexterous tasks, learning new short-horizon tasks from as few as 100 demonstrations and adapting to completely novel robot embodiments. This is made possible because Gemini Robotics builds on top of the Gemini Robotics-ER model, the second model we introduce in this work. Gemini Robotics-ER (Embodied Reasoning) extends Gemini's multimodal reasoning capabilities into the physical world, with enhanced spatial and temporal understanding. This enables capabilities relevant to robotics including object detection, pointing, trajectory and grasp prediction, as well as multi-view correspondence and 3D bounding box predictions. We show how this novel combination can support a variety of robotics applications. We also discuss and address important safety considerations related to this new class of robotics foundation models. The Gemini Robotics family marks a substantial step towards developing general-purpose robots that realize AI's potential in the physical world.
Submitted 25 March, 2025;
originally announced March 2025.
-
On the Inductive Bias of Stacking Towards Improving Reasoning
Authors:
Nikunj Saunshi,
Stefani Karp,
Shankar Krishnan,
Sobhan Miryoosefi,
Sashank J. Reddi,
Sanjiv Kumar
Abstract:
Given the increasing scale of model sizes, novel training strategies like gradual stacking [Gong et al., 2019, Reddi et al., 2023] have garnered interest. Stacking enables efficient training by gradually growing the depth of a model in stages and using layers from a smaller model in an earlier stage to initialize the next stage. Although efficient for training, the model biases induced by such growing approaches are largely unexplored. In this work, we examine this fundamental aspect of gradual stacking, going beyond its efficiency benefits. We propose a variant of gradual stacking called MIDAS that can speed up language model training by up to 40%. Furthermore we discover an intriguing phenomenon: MIDAS is not only training-efficient but surprisingly also has an inductive bias towards improving downstream tasks, especially tasks that require reasoning abilities like reading comprehension and math problems, despite having similar or slightly worse perplexity compared to baseline training. To further analyze this inductive bias, we construct reasoning primitives -- simple synthetic tasks that are building blocks for reasoning -- and find that a model pretrained with stacking is significantly better than standard pretraining on these primitives, with and without fine-tuning. This provides stronger and more robust evidence for this inductive bias towards reasoning. These findings of training efficiency and inductive bias towards reasoning are verified at 1B, 2B and 8B parameter language models. Finally, we conjecture the underlying reason for this inductive bias by exploring the connection of stacking to looped models and provide strong supporting empirical analysis.
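The growing step described above (using layers from a smaller model to initialize a deeper one) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: layers are plain dictionaries, the helper `grow_by_stacking` is hypothetical, and copying the last `block_size` layers onto the top is one possible choice of which block to duplicate. The abstract does not specify how MIDAS chooses the block, so that variant is not modeled here.

```python
import copy


def grow_by_stacking(layers, block_size):
    """Return a deeper model whose new top layers are initialized from
    copies of the last `block_size` layers of the current model.

    `layers` is a list of layer objects (here: plain dicts, an assumption
    made only to keep the sketch self-contained).
    """
    if not 0 < block_size <= len(layers):
        raise ValueError("block_size must be between 1 and len(layers)")
    # Deep copies, so the new layers can be trained independently.
    new_block = [copy.deepcopy(layer) for layer in layers[-block_size:]]
    return list(layers) + new_block


# Stage 1: a small 2-layer model (toy weights).
small = [
    {"name": "layer0", "weights": [0.1, 0.2]},
    {"name": "layer1", "weights": [0.3, 0.4]},
]
# Stages 2 and 3: grow the depth by 2 layers per stage.
stage2 = grow_by_stacking(small, block_size=2)
stage3 = grow_by_stacking(stage2, block_size=2)
```

In a real training run each stage would be trained for some number of steps before the next growth step; that schedule is omitted here.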
Submitted 27 September, 2024;
originally announced September 2024.
-
Landscape-Aware Growing: The Power of a Little LAG
Authors:
Stefani Karp,
Nikunj Saunshi,
Sobhan Miryoosefi,
Sashank J. Reddi,
Sanjiv Kumar
Abstract:
Recently, there has been increasing interest in efficient pretraining paradigms for training Transformer-based models. Several recent approaches use smaller models to initialize larger models in order to save computation (e.g., stacking and fusion). In this work, we study the fundamental question of how to select the best growing strategy from a given pool of growing strategies. Prior works have extensively focused on loss- and/or function-preserving behavior at initialization or simply performance at the end of training. Instead, we identify that behavior at initialization can be misleading as a predictor of final performance and present an alternative perspective based on early training dynamics, which we call "landscape-aware growing (LAG)". We perform extensive analysis of correlation of the final performance with performance in the initial steps of training and find early and more accurate predictions of the optimal growing strategy (i.e., with only a small "lag" after initialization). This perspective also motivates an adaptive strategy for gradual stacking.
Submitted 4 June, 2024;
originally announced June 2024.
-
Positivity and universal Plücker coordinates for spaces of quasi-exponentials
Authors:
Steven N. Karp,
Evgeny Mukhin,
Vitaly Tarasov
Abstract:
A quasi-exponential is an entire function of the form $e^{cu}p(u)$, where $p(u)$ is a polynomial and $c \in \mathbb{C}$. Let $V = \langle e^{h_1u}p_1(u), \dots, e^{h_Nu}p_N(u) \rangle$ be a vector space with a basis of quasi-exponentials. We show that if $h_1, \dots, h_N$ are nonnegative and all of the complex zeros of the Wronskian $\operatorname{Wr}(V)$ are real, then $V$ is totally nonnegative in the sense that all of its Grassmann-Plücker coordinates defined by the Taylor expansion about $u=t$ are nonnegative, for any real $t$ greater than all of the zeros of $\operatorname{Wr}(V)$. Our proof proceeds by showing that the higher Gaudin Hamiltonians $T_λ^G(t)$ introduced in [ALTZ14] are universal Plücker coordinates about $u=t$ for the Wronski map on spaces of quasi-exponentials. The result that $V$ is totally nonnegative follows from the fact that $T_λ^G(t)$ is positive semidefinite, which we establish using partial traces. We also show that if $h_1 = \cdots = h_N = 0$ then $T_λ^G(t)$ equals $β^λ(t)$, which is the universal Plücker coordinate for the Wronski map on spaces of polynomials introduced in [KP23].
Submitted 24 May, 2024;
originally announced May 2024.
-
Role of Locality and Weight Sharing in Image-Based Tasks: A Sample Complexity Separation between CNNs, LCNs, and FCNs
Authors:
Aakash Lahoti,
Stefani Karp,
Ezra Winston,
Aarti Singh,
Yuanzhi Li
Abstract:
Vision tasks are characterized by the properties of locality and translation invariance. The superior performance of convolutional neural networks (CNNs) on these tasks is widely attributed to the inductive bias of locality and weight sharing baked into their architecture. Existing attempts to quantify the statistical benefits of these biases in CNNs over locally connected convolutional neural networks (LCNs) and fully connected neural networks (FCNs) fall into one of the following categories: either they disregard the optimizer and only provide uniform convergence upper bounds with no separating lower bounds, or they consider simplistic tasks that do not truly mirror the locality and translation invariance as found in real-world vision tasks. To address these deficiencies, we introduce the Dynamic Signal Distribution (DSD) classification task that models an image as consisting of $k$ patches, each of dimension $d$, and the label is determined by a $d$-sparse signal vector that can freely appear in any one of the $k$ patches. On this task, for any orthogonally equivariant algorithm like gradient descent, we prove that CNNs require $\tilde{O}(k+d)$ samples, whereas LCNs require $Ω(kd)$ samples, establishing the statistical advantages of weight sharing in translation invariant tasks. Furthermore, LCNs need $\tilde{O}(k(k+d))$ samples, compared to $Ω(k^2d)$ samples for FCNs, showcasing the benefits of locality in local tasks. Additionally, we develop information theoretic tools for analyzing randomized algorithms, which may be of interest for statistical research.
Submitted 22 March, 2024;
originally announced March 2024.
-
Applying statistical learning theory to deep learning
Authors:
Cédric Gerbelot,
Avetik Karagulyan,
Stefani Karp,
Kavya Ravichandran,
Menachem Stern,
Nathan Srebro
Abstract:
Although statistical learning theory provides a robust framework to understand supervised learning, many theoretical aspects of deep learning remain unclear, in particular how different architectures may lead to inductive bias when trained using gradient based methods. The goal of these lectures is to provide an overview of some of the main questions that arise when attempting to understand deep learning from a learning theory perspective. After a brief reminder on statistical learning theory and stochastic optimization, we discuss implicit bias in the context of benign overfitting. We then move to a general description of the mirror descent algorithm, showing how we may go back and forth between a parameter space and the corresponding function space for a given learning problem, as well as how the geometry of the learning problem may be represented by a metric tensor. Building on this framework, we provide a detailed study of the implicit bias of gradient descent on linear diagonal networks for various regression tasks, showing how the loss function, scale of parameters at initialization and depth of the network may lead to various forms of implicit bias, in particular transitioning between kernel or feature learning.
Submitted 25 March, 2024; v1 submitted 26 November, 2023;
originally announced November 2023.
-
Universal Plücker coordinates for the Wronski map and positivity in real Schubert calculus
Authors:
Steven N. Karp,
Kevin Purbhoo
Abstract:
Given a $d$-dimensional vector space $V \subset \mathbb{C}[u]$ of polynomials, its Wronskian is the polynomial $(u + z_1) \cdots (u + z_n)$ whose zeros $-z_i$ are the points of $\mathbb{C}$ such that $V$ contains a nonzero polynomial with a zero of order at least $d$ at $-z_i$. Equivalently, $V$ is a solution to the Schubert problem defined by osculating planes to the moment curve at $z_1, \dots, z_n$. The inverse Wronski problem involves finding all $V$ with a given Wronskian $(u + z_1) \cdots (u + z_n)$. We solve this problem by providing explicit formulas for the Grassmann-Plücker coordinates of the general solution $V$, as commuting operators in the group algebra $\mathbb{C}[\mathfrak{S}_n]$ of the symmetric group. The Plücker coordinates of individual solutions over $\mathbb{C}$ are obtained by restricting to an eigenspace and replacing each operator by its eigenvalue. This generalizes work of Mukhin, Tarasov, and Varchenko (2013) and of Purbhoo (2022), which give formulas in $\mathbb{C}[\mathfrak{S}_n]$ for the differential equation satisfied by $V$. Moreover, if $z_1, \dots, z_n$ are real and nonnegative, then our operators are positive semidefinite, implying that the Plücker coordinates of $V$ are all real and nonnegative. This verifies several outstanding conjectures in real Schubert calculus, including the positivity conjectures of Mukhin and Tarasov (2017) and of Karp (2021), the disconjugacy conjecture of Eremenko (2015), and the divisor form of the secant conjecture of Sottile (2003). The proofs involve the representation theory of $\mathfrak{S}_n$, symmetric functions, and $τ$-functions of the KP hierarchy.
Submitted 8 September, 2023;
originally announced September 2023.
-
Symmetric Toda, gradient flows, and tridiagonalization
Authors:
Anthony M. Bloch,
Steven N. Karp
Abstract:
The Toda lattice (1967) is a Hamiltonian system given by $n$ points on a line governed by an exponential potential. Flaschka (1974) showed that the Toda lattice is integrable by interpreting it as a flow on the space of symmetric tridiagonal $n\times n$ matrices, while Moser (1975) showed that it is a gradient flow on a projective space. The symmetric Toda flow of Deift, Li, Nanda, and Tomei (1986) generalizes the Toda lattice flow from tridiagonal to all symmetric matrices. They showed the flow is integrable, in the classical sense of having $d$ integrals in involution on its $2d$-dimensional phase space. The system may be viewed as integrable in other ways as well. Firstly, Symes (1980, 1982) solved it explicitly via $QR$-factorization and conjugation. Secondly, Deift, Li, Nanda, and Tomei (1986) 'tridiagonalized' the system into a family of tridiagonal Toda lattices which are solvable and integrable. In this paper we derive their tridiagonalization procedure in a natural way using the fact that the symmetric Toda flow is diffeomorphic to a twisted gradient flow on a flag variety, which may then be decomposed into flows on a product of Grassmannians. These flows may in turn be embedded into projective spaces via Plücker embeddings, and mapped back to tridiagonal Toda lattice flows using Moser's construction. In addition, we study the tridiagonalized flows projected onto a product of permutohedra, using the twisted moment map of Bloch, Flaschka, and Ratiu (1990). These ideas are facilitated in a natural way by the theory of total positivity, building on our previous work (2023).
Submitted 20 April, 2023;
originally announced April 2023.
-
Anisotropic Satellite Galaxy Quenching: A Unique Signature of Energetic Feedback by Supermassive Black Holes?
Authors:
Juliana S. M. Karp,
Johannes U. Lange,
Risa H. Wechsler
Abstract:
The quenched fraction of satellite galaxies is aligned with the orientation of the halo's central galaxy, such that on average, satellites form stars at a lower rate along the major axis of the central. This effect, called anisotropic satellite galaxy quenching (ASGQ), has been found in observational data and cosmological simulations. Analyzing the IllustrisTNG simulation, Martín-Navarro et al. (2021) recently argued that ASGQ is caused by anisotropic energetic feedback and constitutes "compelling observational evidence for the role of black holes in regulating galaxy evolution." In this letter, we study the causes of ASGQ in state-of-the-art galaxy formation simulations to evaluate this claim. We show that cosmological simulations predict that on average, satellite galaxies along the major axis of the dark matter halo tend to have been accreted at earlier cosmic times and are hosted by subhalos of larger peak halo masses. As a result, a modulation of the quenched fraction with respect to the major axis of the central galaxy is a natural prediction of hierarchical structure formation. We show that ASGQ is predicted by the UniverseMachine galaxy formation model, a model without anisotropic feedback. Furthermore, we demonstrate that even in the IllustrisTNG simulation, anisotropic satellite accretion properties are the main cause of ASGQ. Ultimately, we argue that ASGQ is not a reliable indicator of supermassive black hole feedback in galaxy formation simulations and, thus, should not be interpreted as such in observational data.
Submitted 20 April, 2023;
originally announced April 2023.
-
q-Whittaker functions, finite fields, and Jordan forms
Authors:
Steven N. Karp,
Hugh Thomas
Abstract:
The $q$-Whittaker function $W_λ(\mathbf{x};q)$ associated to a partition $λ$ is a $q$-analogue of the Schur function $s_λ(\mathbf{x})$, and is defined as the $t=0$ specialization of the Macdonald polynomial $P_λ(\mathbf{x};q,t)$. We show combinatorially how to expand $W_λ(\mathbf{x};q)$ in terms of partial flags compatible with a nilpotent endomorphism over the finite field of size $1/q$. This yields an expression analogous to a well-known formula for the Hall-Littlewood functions. We show that considering pairs of partial flags and taking Jordan forms leads to a probabilistic bijection between nonnegative-integer matrices and pairs of semistandard tableaux of the same shape, proving the Cauchy identity for $q$-Whittaker functions. We call our probabilistic bijection the $q$-Burge correspondence, and prove that in the limit as $q\to 0$, we recover a description of the classical Burge correspondence (also known as column RSK) due to Rosso (2012). A key step in the proof is the enumeration of an arbitrary double coset of $\text{GL}_n$ modulo two parabolic subgroups, which we find to be of independent interest. As an application, we use the $q$-Burge correspondence to count isomorphism classes of certain modules over the preprojective algebra of a type $A$ quiver (i.e. a path), refined according to their socle filtrations. This develops a connection between the combinatorics of symmetric functions and the representation theory of preprojective algebras.
Submitted 10 February, 2025; v1 submitted 25 July, 2022;
originally announced July 2022.
-
On two notions of total positivity for partial flag varieties
Authors:
Anthony M. Bloch,
Steven N. Karp
Abstract:
Given integers $1 \le k_1 < \cdots < k_l \le n-1$, let $\text{Fl}_{k_1,\dots,k_l;n}$ denote the type $A$ partial flag variety consisting of all chains of subspaces $(V_{k_1}\subset\cdots\subset V_{k_l})$ inside $\mathbb{R}^n$, where each $V_k$ has dimension $k$. Lusztig (1994, 1998) introduced the totally positive part $\text{Fl}_{k_1,\dots,k_l;n}^{>0}$ as the subset of partial flags which can be represented by a totally positive $n\times n$ matrix, and defined the totally nonnegative part $\text{Fl}_{k_1,\dots,k_l;n}^{\ge 0}$ as the closure of $\text{Fl}_{k_1,\dots,k_l;n}^{>0}$. On the other hand, following Postnikov (2007), we define $\text{Fl}_{k_1,\dots,k_l;n}^{Δ>0}$ and $\text{Fl}_{k_1,\dots,k_l;n}^{Δ\ge 0}$ as the subsets of $\text{Fl}_{k_1,\dots,k_l;n}$ where all Plücker coordinates are positive and nonnegative, respectively. It follows from the definitions that Lusztig's total positivity implies Plücker positivity, and it is natural to ask when these two notions of positivity agree. Rietsch (2009) proved that they agree in the case of the Grassmannian $\text{Fl}_{k;n}$, and Chevalier (2011) showed that the two notions are distinct for $\text{Fl}_{1,3;4}$. We show that in general, the two notions agree if and only if $k_1, \dots, k_l$ are consecutive integers. We give an elementary proof of this result (including for the case of Grassmannians) based on classical results in linear algebra and the theory of total positivity. We also show that the cell decomposition of $\text{Fl}_{k_1,\dots,k_l;n}^{\ge 0}$ coincides with its matroid decomposition if and only if $k_1,\dots,k_l$ are consecutive integers, which was previously only known for complete flag varieties, Grassmannians, and $\text{Fl}_{1,3;4}$. Finally, we determine which notions of positivity are compatible with a natural action of the cyclic group of order $n$ that rotates the index set.
Submitted 10 October, 2022; v1 submitted 12 June, 2022;
originally announced June 2022.
-
Agnostic Learnability of Halfspaces via Logistic Loss
Authors:
Ziwei Ji,
Kwangjun Ahn,
Pranjal Awasthi,
Satyen Kale,
Stefani Karp
Abstract:
We investigate approximation guarantees provided by logistic regression for the fundamental problem of agnostic learning of homogeneous halfspaces. Previously, for a certain broad class of "well-behaved" distributions on the examples, Diakonikolas et al. (2020) proved an $\tilde{Ω}(\textrm{OPT})$ lower bound, while Frei et al. (2021) proved an $\tilde{O}(\sqrt{\textrm{OPT}})$ upper bound, where $\textrm{OPT}$ denotes the best zero-one/misclassification risk of a homogeneous halfspace. In this paper, we close this gap by constructing a well-behaved distribution such that the global minimizer of the logistic risk over this distribution only achieves $Ω(\sqrt{\textrm{OPT}})$ misclassification risk, matching the upper bound in (Frei et al., 2021). On the other hand, we also show that if we impose a radial-Lipschitzness condition in addition to well-behaved-ness on the distribution, logistic regression on a ball of bounded radius reaches $\tilde{O}(\textrm{OPT})$ misclassification risk. Our techniques also show for any well-behaved distribution, regardless of radial Lipschitzness, we can overcome the $Ω(\sqrt{\textrm{OPT}})$ lower bound for logistic loss simply at the cost of one additional convex optimization step involving the hinge loss and attain $\tilde{O}(\textrm{OPT})$ misclassification risk. This two-step convex optimization algorithm is simpler than previous methods obtaining this guarantee, all of which require solving $O(\log(1/\textrm{OPT}))$ minimization problems.
Submitted 31 January, 2022;
originally announced January 2022.
-
Wronskians, total positivity, and real Schubert calculus
Authors:
Steven N. Karp
Abstract:
A complete flag in $\mathbb{R}^n$ is a sequence of nested subspaces $V_1 \subset \cdots \subset V_{n-1}$ such that each $V_k$ has dimension $k$. It is called totally nonnegative if all its Plücker coordinates are nonnegative. We may view each $V_k$ as a subspace of polynomials in $\mathbb{R}[x]$ of degree at most $n-1$, by associating a vector $(a_1, \dots, a_n)$ in $\mathbb{R}^n$ to the polynomial $a_1 + a_2x + \cdots + a_nx^{n-1}$. We show that a complete flag is totally nonnegative if and only if each of its Wronskian polynomials $\mathsf{Wr}(V_k)$ is nonzero on the interval $(0, \infty)$. In the language of Chebyshev systems, this means that the flag forms a Markov system or $ECT$-system on $(0, \infty)$. This gives a new characterization and membership test for the totally nonnegative flag variety. Similarly, we show that a complete flag is totally positive if and only if each $\mathsf{Wr}(V_k)$ is nonzero on $[0, \infty]$. We use these results to show that a conjecture of Eremenko (2015) in real Schubert calculus is equivalent to the following conjecture: if $V$ is a finite-dimensional subspace of polynomials such that all complex zeros of $\mathsf{Wr}(V)$ lie in the interval $(-\infty, 0)$, then all Plücker coordinates of $V$ are real and positive. This conjecture is a totally positive strengthening of a result of Mukhin, Tarasov, and Varchenko (2009), and can be reformulated as saying that all complex solutions to a certain family of Schubert problems in the Grassmannian are real and totally positive. We also show that our conjecture is equivalent to a totally positive version of the secant conjecture of Sottile (2003).
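To make the definition of a totally nonnegative flag concrete, the following sketch computes the Plücker coordinates (maximal minors) of each $V_k$ directly and checks their signs. It illustrates the definition only, not the Wronskian criterion of the paper. The function names are hypothetical. Two conventions are assumed: $V_k$ is taken to be the span of the first $k$ rows of a matrix with linearly independent rows, and since Plücker coordinates are defined only up to a common nonzero scalar, "nonnegative" is tested as "all $\ge 0$ or all $\le 0$".

```python
from itertools import combinations


def det(m):
    """Determinant of a small square matrix by Laplace expansion along row 0."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def plucker_coordinates(rows):
    """Map each k-subset of columns to the corresponding k x k minor,
    where `rows` is a k x n matrix whose rows span the subspace."""
    k, n = len(rows), len(rows[0])
    return {
        cols: det([[row[c] for c in cols] for row in rows])
        for cols in combinations(range(n), k)
    }


def is_plucker_nonnegative(rows):
    """True if all Plücker coordinates share a sign (up to a global scalar)."""
    values = list(plucker_coordinates(rows).values())
    return all(v >= 0 for v in values) or all(v <= 0 for v in values)


def flag_is_plucker_nonnegative(matrix):
    """Check V_1, ..., V_{n-1}, where V_k is the span of the first k rows.
    Assumes `matrix` has at least n-1 linearly independent rows of length n."""
    n = len(matrix[0])
    return all(is_plucker_nonnegative(matrix[:k]) for k in range(1, n))


# A flag in R^3: V_1 = span{(1,1,1)}, V_2 = span{(1,1,1), (0,1,2)}.
good_flag = [[1, 1, 1], [0, 1, 2]]
# Changing one sign in the first row gives V_1 mixed-sign coordinates.
bad_flag = [[1, -1, 1], [0, 1, 2]]
```

Enumerating all minors is exponential in $n$, so this brute-force check is practical only for small examples.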
Submitted 1 September, 2023; v1 submitted 5 October, 2021;
originally announced October 2021.
-
Gradient flows, adjoint orbits, and the topology of totally nonnegative flag varieties
Authors:
Anthony M. Bloch,
Steven N. Karp
Abstract:
One can view a partial flag variety in $\mathbb{C}^n$ as an adjoint orbit $\mathcal{O}_λ$ inside the Lie algebra of $n \times n$ skew-Hermitian matrices. We use the orbit context to study the totally nonnegative part of a partial flag variety from an algebraic, geometric, and dynamical perspective. The paper has three main parts:
(1) We introduce the totally nonnegative part of $\mathcal{O}_λ$, and describe it explicitly in several cases. We define a twist map on it, which generalizes (in type $A$) a map of Bloch, Flaschka, and Ratiu (1990) on an isospectral manifold of Jacobi matrices.
(2) We study gradient flows on $\mathcal{O}_λ$ which preserve positivity, working in three natural Riemannian metrics. In the Kähler metric, positivity is preserved in many cases of interest, extending results of Galashin, Karp, and Lam (2017, 2019). In the normal metric, positivity is essentially never preserved on a generic orbit. In the induced metric, whether positivity is preserved appears to depend on the spacing of the eigenvalues defining the orbit.
(3) We present two applications. First, we discuss the topology of totally nonnegative flag varieties and amplituhedra. Galashin, Karp, and Lam (2017, 2019) showed that the former are homeomorphic to closed balls, and we interpret their argument in the orbit framework. We also show that a new family of amplituhedra, which we call twisted Vandermonde amplituhedra, are homeomorphic to closed balls. Second, we discuss the symmetric Toda flow on $\mathcal{O}_λ$. We show that it preserves positivity, and that on the totally nonnegative part, it is a gradient flow in the Kähler metric up to applying the twist map. This extends a result of Bloch, Flaschka, and Ratiu (1990).
Submitted 22 November, 2021; v1 submitted 9 September, 2021;
originally announced September 2021.
-
Shelling the m=1 amplituhedron
Authors:
Steven N. Karp,
John Machacek
Abstract:
The amplituhedron $\mathcal{A}_{n,k,m}$ was introduced by Arkani-Hamed and Trnka (2014) in order to give a geometric basis for calculating scattering amplitudes in planar $\mathcal{N}=4$ supersymmetric Yang-Mills theory. It is a projection inside the Grassmannian $\text{Gr}_{k,k+m}$ of the totally nonnegative part of $\text{Gr}_{k,n}$. Karp and Williams (2019) studied the $m=1$ amplituhedron $\mathcal{A}_{n,k,1}$, giving a regular CW decomposition of it. Its face poset $R_{n,l}$ (with $l := n-k-1$) consists of all projective sign vectors of length $n$ with exactly $l$ sign changes. We show that $R_{n,l}$ is EL-shellable, resolving a problem posed by Karp and Williams. This gives a new proof that $\mathcal{A}_{n,k,1}$ is homeomorphic to a closed ball, which was originally proved by Karp and Williams. We also give explicit formulas for the $f$-vector and $h$-vector of $R_{n,l}$, and show that it is rank-log-concave and strongly Sperner. Finally, we consider a related poset $P_{n,l}$ introduced by Machacek (2019), consisting of all projective sign vectors of length $n$ with at most $l$ sign changes. We show that it is rank-log-concave, and conjecture that it is Sperner.
Submitted 10 October, 2022; v1 submitted 6 April, 2021;
originally announced April 2021.
-
Regularity theorem for totally nonnegative flag varieties
Authors:
Pavel Galashin,
Steven N. Karp,
Thomas Lam
Abstract:
We show that the totally nonnegative part of a partial flag variety $G/P$ (in the sense of Lusztig) is a regular CW complex, confirming a conjecture of Williams. In particular, the closure of each positroid cell inside the totally nonnegative Grassmannian is homeomorphic to a ball, confirming a conjecture of Postnikov.
Submitted 12 April, 2021; v1 submitted 31 March, 2019;
originally announced April 2019.
-
Moment curves and cyclic symmetry for positive Grassmannians
Authors:
Steven N. Karp
Abstract:
We show that for each k and n, the cyclic shift map on the complex Grassmannian Gr(k,n) has exactly $\binom{n}{k}$ fixed points. There is a unique totally nonnegative fixed point, given by taking n equally spaced points on the trigonometric moment curve (if k is odd) or the symmetric moment curve (if k is even). We introduce a parameter q, and show that the fixed points of a q-deformation of the cyclic shift map are precisely the critical points of the mirror-symmetric superpotential $\mathcal{F}_q$ on Gr(k,n). This follows from results of Rietsch about the quantum cohomology ring of Gr(k,n). We survey many other diverse contexts which feature moment curves and the cyclic shift map.
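The totally nonnegative fixed point for odd k can be checked numerically on small cases. The sketch below assumes the trigonometric moment curve is parametrized as $(1, \cos t, \sin t, \dots, \cos rt, \sin rt)$ with $k = 2r + 1$, sampled at $t_j = 2\pi j/n$; this parametrization and all function names are assumptions made for illustration, not taken from the paper. It uses floating-point arithmetic, so comparisons need a tolerance.

```python
from itertools import combinations
from math import cos, sin, pi


def trig_moment_curve(t, k):
    """Point (1, cos t, sin t, ..., cos rt, sin rt) in R^k, for odd k = 2r + 1."""
    if k % 2 == 0:
        raise ValueError("this sketch only covers odd k")
    point = [1.0]
    for j in range(1, (k - 1) // 2 + 1):
        point += [cos(j * t), sin(j * t)]
    return point


def det(matrix):
    """Floating-point determinant by Gaussian elimination with partial pivoting."""
    m = [list(row) for row in matrix]
    size = len(m)
    result = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return result


def pluecker_coordinates(k, n):
    """Maximal minors of n equally spaced points on the curve, indexed by k-subsets."""
    points = [trig_moment_curve(2 * pi * j / n, k) for j in range(n)]
    return {cols: det([points[c] for c in cols])
            for cols in combinations(range(n), k)}
```

For k = 3 and n = 6, all 20 minors come out positive, and shifting an index set cyclically leaves the minor unchanged up to rounding, which is consistent with the point being fixed by the cyclic shift.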
Submitted 10 July, 2019; v1 submitted 15 May, 2018;
originally announced May 2018.
-
The totally nonnegative part of G/P is a ball
Authors:
Pavel Galashin,
Steven N. Karp,
Thomas Lam
Abstract:
We show that the totally nonnegative part of a partial flag variety (in the sense of Lusztig) is homeomorphic to a closed ball.
Submitted 9 April, 2019; v1 submitted 26 January, 2018;
originally announced January 2018.
-
Decompositions of amplituhedra
Authors:
Steven N. Karp,
Lauren K. Williams,
Yan X Zhang
Abstract:
The (tree) amplituhedron A(n,k,m) is the image in the Grassmannian Gr(k,k+m) of the totally nonnegative part of Gr(k,n), under a (map induced by a) linear map which is totally positive. It was introduced by Arkani-Hamed and Trnka in 2013 in order to give a geometric basis for the computation of scattering amplitudes in N=4 supersymmetric Yang-Mills theory. In the case relevant to physics (m=4), there is a collection of recursively-defined 4k-dimensional BCFW cells in the totally nonnegative part of Gr(k,n), whose images conjecturally "triangulate" the amplituhedron--that is, their images are disjoint and cover a dense subset of A(n,k,4). In this paper, we approach this problem by first giving an explicit (as opposed to recursive) description of the BCFW cells. We then develop sign-variational tools which we use to prove that when k=2, the images of these cells are disjoint in A(n,k,4). We also conjecture that for arbitrary even m, there is a decomposition of the amplituhedron A(n,k,m) involving precisely M(k, n-k-m, m/2) top-dimensional cells (of dimension km), where M(a,b,c) is the number of plane partitions contained in an a x b x c box. This agrees with the fact that when m=4, the number of BCFW cells is the Narayana number N(n-3, k+1).
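The agreement between the conjectured count M(k, n-k-m, m/2) and the Narayana number at m=4 can be checked for small parameters. This is a minimal sketch, assuming MacMahon's product formula for plane partitions in a box and the convention N(n, k) = (1/n) C(n, k) C(n, k-1) for Narayana numbers; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def plane_partitions_in_box(a, b, c):
    """Number of plane partitions in an a x b x c box (MacMahon's product formula)."""
    total = Fraction(1)
    for i in range(1, a + 1):
        for j in range(1, b + 1):
            total *= Fraction(i + j + c - 1, i + j - 1)
    # The product is always an integer; Fraction keeps the arithmetic exact.
    assert total.denominator == 1
    return int(total)


def narayana(n, k):
    """Narayana number N(n, k) = (1/n) * C(n, k) * C(n, k - 1)."""
    return comb(n, k) * comb(n, k - 1) // n


def conjectured_cell_count(n, k, m):
    """M(k, n-k-m, m/2), the conjectured number of top-dimensional cells for even m."""
    if m % 2 != 0:
        raise ValueError("the conjecture is stated for even m only")
    return plane_partitions_in_box(k, n - k - m, m // 2)
```

For example, `conjectured_cell_count(7, 2, 4)` and `narayana(4, 3)` both evaluate to 6.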
Submitted 30 August, 2017;
originally announced August 2017.
-
The totally nonnegative Grassmannian is a ball
Authors:
Pavel Galashin,
Steven N. Karp,
Thomas Lam
Abstract:
We prove that three spaces of importance in topological combinatorics are homeomorphic to closed balls: the totally nonnegative Grassmannian, the compactification of the space of electrical networks, and the cyclically symmetric amplituhedron.
Submitted 7 July, 2021; v1 submitted 6 July, 2017;
originally announced July 2017.
-
The m=1 amplituhedron and cyclic hyperplane arrangements
Authors:
Steven N. Karp,
Lauren K. Williams
Abstract:
The (tree) amplituhedron A(n,k,m) is the image in the Grassmannian Gr(k,k+m) of the totally nonnegative part of Gr(k,n), under a (map induced by a) linear map which is totally positive. It was introduced by Arkani-Hamed and Trnka in 2013 in order to give a geometric basis for the computation of scattering amplitudes in N=4 supersymmetric Yang-Mills theory. When k+m=n, the amplituhedron is isomorphic to the totally nonnegative Grassmannian, and when k=1, the amplituhedron is a cyclic polytope. While the case m=4 is most relevant to physics, the amplituhedron is an interesting mathematical object for any m. In this paper we study it in the case m=1. We start by taking an orthogonal point of view and define a related "B-amplituhedron" B(n,k,m), which we show is isomorphic to A(n,k,m). We use this reformulation to describe the amplituhedron in terms of sign variation. We then give a cell decomposition of the amplituhedron A(n,k,1) using the images of a collection of distinguished cells of the totally nonnegative Grassmannian. We also show that A(n,k,1) can be identified with the complex of bounded faces of a cyclic hyperplane arrangement, and describe how its cells fit together. We deduce that A(n,k,1) is homeomorphic to a ball.
Submitted 9 April, 2019; v1 submitted 29 August, 2016;
originally announced August 2016.
-
Sign variation, the Grassmannian, and total positivity
Authors:
Steven N. Karp
Abstract:
The totally nonnegative Grassmannian is the set of k-dimensional subspaces V of R^n whose nonzero Pluecker coordinates all have the same sign. Gantmakher and Krein (1950) and Schoenberg and Whitney (1951) independently showed that V is totally nonnegative iff every vector in V, when viewed as a sequence of n numbers and ignoring any zeros, changes sign at most k-1 times. We generalize this result from the totally nonnegative Grassmannian to the entire Grassmannian, showing that if V is generic (i.e. has no zero Pluecker coordinates), then the vectors in V change sign at most m times iff certain sequences of Pluecker coordinates of V change sign at most m-k+1 times. We also give an algorithm which, given a non-generic V whose vectors change sign at most m times, perturbs V into a generic subspace whose vectors also change sign at most m times. We deduce that among all V whose vectors change sign at most m times, the generic subspaces are dense. These results generalize to oriented matroids. As an application of our results, we characterize when a generalized amplituhedron construction, in the sense of Arkani-Hamed and Trnka (2013), is well defined. We also give two ways of obtaining the positroid cell of each V in the totally nonnegative Grassmannian from the sign patterns of vectors in V.
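The classical sign-variation criterion quoted above can be illustrated on a small example. The sketch below assumes V is given as the row span of a k x n matrix with exact (integer or rational) entries; the function names are hypothetical. It only samples a few vectors of V, so it illustrates the criterion rather than proving anything.

```python
from fractions import Fraction
from itertools import combinations


def sign_changes(v):
    """Number of sign changes in v, ignoring zero entries."""
    signs = [x > 0 for x in v if x != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    size = len(m)
    result = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return result


def pluecker_coordinates(rows):
    """All k x k minors of a k x n matrix, indexed by sorted column subsets."""
    k, n = len(rows), len(rows[0])
    return {cols: det([[row[c] for c in cols] for row in rows])
            for cols in combinations(range(n), k)}


def is_totally_nonnegative(rows):
    """True if all nonzero Pluecker coordinates of the row span share one sign."""
    nonzero = [p for p in pluecker_coordinates(rows).values() if p != 0]
    return all(p > 0 for p in nonzero) or all(p < 0 for p in nonzero)
```

With rows (1, 1, 1, 1) and (0, 1, 2, 3), every minor is positive and every vector in the span changes sign at most once (k - 1 = 1). With rows (1, 0, -1, 0) and (0, 1, 0, 1), the minors have mixed signs, and the vector (1, 1, -1, 1) in the span changes sign twice.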
Submitted 10 August, 2016; v1 submitted 18 March, 2015;
originally announced March 2015.