-
Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents
Authors:
Yueying Li,
Jim Dai,
Tianyi Peng
Abstract:
As demand for Large Language Models (LLMs) and AI agents grows rapidly, optimizing systems for efficient LLM inference becomes critical. While significant effort has gone into system-level engineering, little has been explored from a mathematical modeling and queueing perspective.
In this paper, we aim to develop the queueing fundamentals of LLM inference, bridging the gap between the queueing theory and LLM systems communities. In particular, we study throughput in LLM inference systems. We prove that a large class of 'work-conserving' scheduling algorithms can achieve maximum throughput for an individual LLM inference engine, highlighting work conservation as a key design principle in practice. In a network of LLM agents, however, work-conserving scheduling alone is insufficient: specific workload structures and multi-class workflows require more sophisticated scheduling strategies. Evaluations of real-world systems show that Orca and Sarathi-serve are throughput-optimal, reassuring practitioners, while FasterTransformer and vanilla vLLM are not maximally stable and should be used with caution. Our results highlight the substantial benefits that the queueing community can offer in improving LLM inference systems and call for more interdisciplinary development.
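A toy discrete-time sketch of the work-conserving idea, assuming a hypothetical engine that can process up to `budget` tokens per iteration and requests whose work may be split across iterations (in the spirit of chunked prefill). It is not the paper's queueing model; `simulate` and its arguments are illustrative names. The work-conserving policy keeps filling the budget from any waiting requests, while the comparison policy serves only the head-of-line request and lets leftover budget go idle.

```python
from collections import deque


def simulate(arrivals, budget, steps, work_conserving=True):
    """Return the token backlog left after `steps` engine iterations.

    arrivals: per-iteration lists of request sizes in tokens, cycled over time.
    budget: tokens the (hypothetical) engine can process per iteration.
    work_conserving: if True, keep filling the budget from waiting requests;
    if False, serve only the head-of-line request and leave the rest idle.
    """
    queue = deque()
    for t in range(steps):
        queue.extend(arrivals[t % len(arrivals)])  # admit this iteration's arrivals
        remaining = budget
        while queue and remaining > 0:
            take = min(queue[0], remaining)  # a request may span iterations
            queue[0] -= take
            remaining -= take
            if queue[0] == 0:
                queue.popleft()
            if not work_conserving:
                break  # leftover budget is wasted this iteration
    return sum(queue)


print(simulate([[1, 1]], budget=3, steps=100))                         # 0
print(simulate([[1, 1]], budget=3, steps=100, work_conserving=False))  # 100
```

With two 1-token requests arriving per iteration and a budget of 3 tokens, the offered load (2 tokens per iteration) is below capacity: the work-conserving policy ends each iteration with an empty queue, whereas the head-of-line policy falls behind by one token per iteration, so its backlog grows without bound.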
Submitted 24 April, 2025; v1 submitted 9 April, 2025;
originally announced April 2025.
-
The extremal problem for weighted combined energy and $ρ-$Nitsche type inequality
Authors:
Ting Peng,
Chaochuan Wang,
Xiaogao Feng
Abstract:
Let $A_1$ and $A_2$ be two circular annuli and let $ρ$ be a radial metric defined on the annulus $A_2$. We study the existence and uniqueness of solutions to the extremal problem for weighted combined energy between $A_1$ and $A_2$, and show that the extremal mapping is a certain radial mapping. This extremal mapping generalizes the $ρ$-harmonic mapping and satisfies equation (2.7), obtained by taking the variation of the weighted combined energy. Meanwhile, we obtain a $ρ$-Nitsche type inequality. This extends the results of Kalaj (J. Differential Equations, 268(2020)) and YTF (Arch. Math., 122(2024)), who considered the cases $ρ=1$ and $ρ=\frac{1}{|h|^{2}}$, respectively.
Moreover, in the course of solving the extremal problem for weighted combined energy, we also investigate the extremal problem for the weighted combined distortion (see Theorem 4.1). This extends the result obtained by Kalaj (J. London Math. Soc., 93(2016)).
Submitted 19 June, 2024;
originally announced June 2024.
-
The extremal problem for weighted combined energy and the generalization of Nitsche inequality
Authors:
Xiaogao Feng,
Ruyue Tang,
Ting Peng
Abstract:
We consider the existence and uniqueness of a minimizer of the extremal problem for weighted combined energy between two concentric annuli and show that the extremal mapping is a certain radial mapping. This in turn implies a Nitsche type phenomenon, and we obtain a $\frac{1}{|w|^λ}$-Nitsche type inequality ($λ\neq1$). As an application, using the relationship between weighted combined energy and weighted combined distortion, we also investigate the extremal problem for weighted combined distortion on annuli. This extends the result obtained by Kalaj in \cite{Ka1}.
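For orientation, the classical Nitsche inequality, which the Nitsche type inequality in this abstract is named after, concerns Euclidean harmonic homeomorphisms between round annuli. Nitsche's conjecture, proved by Iwaniec, Kovalev and Onninen, states the following; this is background only, not a result of this paper.

```latex
A(r,R) = \{ z \in \mathbb{C} : r < |z| < R \}, \qquad 0 < r < R, \quad 0 < r_* < R_*, \\
\exists\ \text{harmonic homeomorphism } h : A(r,R) \xrightarrow{\text{onto}} A(r_*,R_*)
\iff
\frac{R_*}{r_*} \ \ge\ \frac{1}{2}\left( \frac{R}{r} + \frac{r}{R} \right).
```

In particular, a harmonic homeomorphism cannot map an annulus onto one that is too thin relative to it; for $R = R_* = 1$ the condition reads $r_* \le \frac{2r}{1+r^2}$.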
Submitted 18 January, 2024;
originally announced January 2024.
-
Fair coins tend to land on the same side they started: Evidence from 350,757 flips
Authors:
František Bartoš,
Alexandra Sarafoglou,
Henrik R. Godmann,
Amir Sahrani,
David Klein Leunk,
Pierre Y. Gui,
David Voss,
Kaleem Ullah,
Malte J. Zoubek,
Franziska Nippold,
Frederik Aust,
Felipe F. Vieira,
Chris-Gabriel Islam,
Anton J. Zoubek,
Sara Shabani,
Jonas Petter,
Ingeborg B. Roos,
Adam Finnemann,
Aaron B. Lob,
Madlen F. Hoffstadt,
Jason Nak,
Jill de Ron,
Koen Derks,
Karoline Huth,
Sjoerd Terpstra
, et al. (25 additional authors not shown)
Abstract:
Many people have flipped coins but few have stopped to ponder the statistical and physical intricacies of the process. We collected $350{,}757$ coin flips to test the counterintuitive prediction from a physics model of human coin tossing developed by Diaconis, Holmes, and Montgomery (DHM; 2007). The model asserts that when people flip an ordinary coin, it tends to land on the same side it started; DHM estimated the probability of a same-side outcome to be about 51\%. Our data lend strong support to this precise prediction: the coins landed on the same side more often than not, $\text{Pr}(\text{same side}) = 0.508$, 95\% credible interval (CI) [$0.506$, $0.509$], $\text{BF}_{\text{same-side bias}} = 2359$. The data also revealed considerable between-people variation in the degree of this same-side bias. In addition, our data confirmed the generic prediction that when people flip an ordinary coin, with the initial side-up randomly determined, it is equally likely to land heads or tails: $\text{Pr}(\text{heads}) = 0.500$, 95\% CI [$0.498$, $0.502$], $\text{BF}_{\text{heads-tails bias}} = 0.182$. Furthermore, this lack of heads-tails bias does not appear to vary across coins. Additional analyses revealed that the within-people same-side bias decreased as more coins were flipped, an effect consistent with the possibility that practice makes people flip coins in a less wobbly fashion. Our data therefore provide strong evidence that when some (but not all) people flip a fair coin, it tends to land on the same side it started.
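As a rough arithmetic check of the reported same-side interval, a pooled binomial normal approximation can be computed from two numbers in the abstract: $n = 350{,}757$ flips and the point estimate $0.508$. This ignores the between-people variation that the authors' Bayesian analysis models, so it is only a sanity check, not their method; `pooled_interval` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def pooled_interval(p_hat, n, level=0.95):
    """Normal-approximation interval for a pooled binomial proportion.

    Treats all flips as independent draws with one common probability,
    i.e. it deliberately ignores between-people variation.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    se = math.sqrt(p_hat * (1 - p_hat) / n)    # binomial standard error
    return p_hat - z * se, p_hat + z * se


low, high = pooled_interval(0.508, 350_757)
print(f"same side: [{low:.4f}, {high:.4f}]")
```

The resulting interval is roughly [0.5063, 0.5097], close to the reported credible interval; passing 0.500 instead gives roughly [0.4983, 0.5017] for the heads-tails question.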
Submitted 17 April, 2025; v1 submitted 6 October, 2023;
originally announced October 2023.
-
Pupil-driven quantitative differential phase contrast imaging
Authors:
Shuhe Zhang,
Hao Wu,
Tao Peng,
Zeyu Ke,
Meng Shao,
Tos T. J. M. Berendschot,
Jinhua Zhou
Abstract:
In this research, we reveal an inherent but hitherto overlooked property of quantitative differential phase contrast (qDPC) imaging: its phase transfer function (PTF) acts as an edge-detection filter. Inspired by this, we highlight the duality of qDPC between optics and pattern recognition and propose a simple and effective qDPC reconstruction algorithm, termed Pupil-Driven qDPC (pd-qDPC), to improve phase reconstruction quality across the family of qDPC-based phase reconstruction algorithms. We formulate a new cost function in which a modified L0-norm represents the pupil-driven edge sparsity, and the qDPC convolution operator is duplicated in the data-fidelity term to achieve automatic background removal. We further develop iteratively reweighted soft-thresholding algorithms based on the split Bregman method to solve this modified L0-norm problem. We tested pd-qDPC on both simulated and experimental data and compared it against state-of-the-art (SOTA) methods, including L2-norm, total variation regularization (TV-qDPC), isotropic-qDPC, and Retinex qDPC algorithms. Results show that the proposed model is superior in phase reconstruction quality and implementation efficiency, significantly increasing experimental robustness while maintaining data fidelity. Overall, pd-qDPC enables high-quality qDPC reconstruction without any modification of the optical system. It reduces system complexity and, through its edge-filtering property, benefits the qDPC community and beyond, including but not limited to cell segmentation and PTF learning.
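The reweighting idea behind iteratively reweighted soft-thresholding can be sketched in the simplest setting, an identity forward operator, where each weighted-L1 subproblem has a closed-form element-wise soft-threshold solution. This is a toy under stated assumptions: it omits the qDPC convolution operator, the duplicated data-fidelity term, and the split Bregman splitting, and `soft_threshold` / `reweighted_soft_threshold` (with `lam`, `eps`, `iters`) are hypothetical names and parameters.

```python
def soft_threshold(v, t):
    """Proximal operator of t*|x|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def reweighted_soft_threshold(y, lam, eps=1e-3, iters=10):
    """Reweighted-L1 surrogate for min_x 0.5*||x - y||^2 + lam*||x||_0.

    Each pass solves the weighted-L1 problem exactly (element-wise
    soft-threshold at lam * w_i), then recomputes w_i = 1 / (|x_i| + eps)
    from the new estimate, so small entries are pushed to zero while
    large entries are shrunk only slightly.
    """
    x = list(y)
    for _ in range(iters):
        w = [1.0 / (abs(xi) + eps) for xi in x]
        x = [soft_threshold(yi, lam * wi) for yi, wi in zip(y, w)]
    return x


x = reweighted_soft_threshold([0.05, -0.02, 3.0, 0.01, -2.0], lam=0.1)
print(x)
```

Small entries receive weights near `1/eps` and stay at zero, while the two large entries end up close to their observed values, which is how reweighting approximates an L0 penalty more closely than a single plain L1 step.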
Submitted 29 June, 2023;
originally announced June 2023.
-
High-fidelity quantitative differential phase contrast deconvolution using dark-field sparse prior
Authors:
Shuhe Zhang,
Tao Peng,
Zeyu Ke,
Meng Shao,
Tos T. J. M. Berendschot,
Jinhua Zhou
Abstract:
Differential phase contrast (DPC) imaging plays an important role in the family of quantitative phase measurement methods. However, the reconstruction algorithm for quantitative DPC (qDPC) imaging is not yet optimized, as it does not incorporate the inherent properties of qDPC imaging. In this research, we propose a simple but effective image prior, the dark-field sparse prior (DSP), to improve phase reconstruction quality for all DPC-based phase reconstruction algorithms. The DSP is based on the key observation that most pixel values of an ideal differential phase contrast image are zero, since subtracting two images taken under anti-symmetric illumination cancels all background components. Using this prior, we formulate a new cost function in which the L0-norm represents the DSP. We further develop two algorithms, based on (1) half-quadratic splitting and (2) Richardson-Lucy deconvolution, to solve this NP-hard L0-norm problem. We tested the new model on both simulated and experimental data and compared it against state-of-the-art methods, including L2-norm and total variation regularization. Results show that the proposed model is superior in phase reconstruction quality and implementation efficiency, significantly increasing experimental robustness while maintaining data fidelity.
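The half-quadratic splitting (HQS) route to an L0-regularized problem can be sketched on a toy with an element-wise (diagonal) forward operator `h`. Splitting x into an auxiliary variable z turns the problem into a closed-form quadratic step and a hard-threshold step, with the coupling weight `beta` increased each round. The diagonal operator is an assumed stand-in for the DPC transfer function (a real solver would apply it through FFTs), and `hqs_l0`, `lam`, `beta`, `beta_max`, and `kappa` are hypothetical names and settings, not the authors' implementation.

```python
def hqs_l0(y, h, lam, beta=1e-2, beta_max=1e5, kappa=2.0):
    """Half-quadratic splitting for min_x 0.5*sum((h_i*x_i - y_i)^2) + lam*||x||_0.

    Solves 0.5*sum((h*x - y)^2) + beta/2*||x - z||^2 + lam*||z||_0 by
    alternating x- and z-updates while increasing beta by a factor kappa.
    """
    z = [0.0] * len(y)
    while beta < beta_max:
        # x-step: quadratic in x, closed form per element
        x = [(hi * yi + beta * zi) / (hi * hi + beta) for yi, hi, zi in zip(y, h, z)]
        # z-step: the L0 proximal map is a hard threshold (keep x_i iff x_i^2 > 2*lam/beta)
        z = [xi if xi * xi > 2.0 * lam / beta else 0.0 for xi in x]
        beta *= kappa
    return z


# Hypothetical data: sparse signal [0, 2, 0, -1] seen through gain 0.5, plus small perturbations.
z = hqs_l0([0.02, 1.0, -0.01, -0.5], [0.5] * 4, lam=0.01)
print(z)
```

The two small entries are thresholded to exactly zero, while the remaining entries converge approximately to 2 and -1; because the threshold `2*lam/beta` shrinks as `beta` grows, entries that survive early rounds are kept and refined.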
Submitted 15 June, 2022;
originally announced June 2022.
-
Abelian Automorphism Groups of Quartic Surfaces and Cubic Fourfolds
Authors:
Tianzhen Peng,
Zhiwei Zheng
Abstract:
In this paper, we develop a new method to classify abelian automorphism groups of hypersurfaces. We use this method to classify (Theorem 4.2) abelian groups that admit a liftable action on a smooth cubic fourfold. A parallel result (Theorem 5.1) is obtained for quartic surfaces.
Submitted 6 September, 2021; v1 submitted 27 June, 2021;
originally announced June 2021.