Showing 1–8 of 8 results for author: Omlor, S

Searching in archive cs.

  1. arXiv:2406.00339  [pdf, other]

    cs.DS cs.LG stat.ML

    Turnstile $\ell_p$ leverage score sampling with applications

    Authors: Alexander Munteanu, Simon Omlor

    Abstract: The turnstile data stream model offers the most flexible framework where data can be manipulated dynamically, i.e., rows, columns, and even single entries of an input matrix can be added, deleted, or updated multiple times in a data stream. We develop a novel algorithm for sampling rows $a_i$ of a matrix $A\in\mathbb{R}^{n\times d}$, proportional to their $\ell_p$ norm, when $A$ is presented in a…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  2. arXiv:2406.00328  [pdf, other]

    cs.DS cs.LG stat.ML

    Optimal bounds for $\ell_p$ sensitivity sampling via $\ell_2$ augmentation

    Authors: Alexander Munteanu, Simon Omlor

    Abstract: Data subsampling is one of the most natural methods to approximate a massively large data set by a small representative proxy. In particular, sensitivity sampling received a lot of attention, which samples points proportional to an individual importance measure called sensitivity. This framework reduces in very general settings the size of data to roughly the VC dimension $d$ times the total sensi…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  3. arXiv:2304.00051  [pdf, other]

    cs.DS cs.LG stat.ML

    Almost Linear Constant-Factor Sketching for $\ell_1$ and Logistic Regression

    Authors: Alexander Munteanu, Simon Omlor, David Woodruff

    Abstract: We improve upon previous oblivious sketching and turnstile streaming results for $\ell_1$ and logistic regression, giving a much smaller sketching dimension achieving $O(1)$-approximation and yielding an efficient optimization problem in the sketch space. Namely, we achieve for any constant $c>0$ a sketching dimension of $\tilde{O}(d^{1+c})$ for $\ell_1$ regression and $\tilde{O}(\mu d^{1+c})$ for lo…

    Submitted 31 March, 2023; originally announced April 2023.

    Comments: ICLR 2023

  4. arXiv:2206.12802  [pdf, other]

    cs.LG cs.DS stat.ML

    Bounding the Width of Neural Networks via Coupled Initialization -- A Worst Case Analysis

    Authors: Alexander Munteanu, Simon Omlor, Zhao Song, David P. Woodruff

    Abstract: A common method in training neural networks is to initialize all the weights to be independent Gaussian vectors. We observe that by instead initializing the weights into independent pairs, where each pair consists of two identical Gaussian vectors, we can significantly improve the convergence analysis. While a similar technique has been studied for random inputs [Daniely, NeurIPS 2020], it has not…

    Submitted 26 June, 2022; originally announced June 2022.

    Comments: ICML 2022

  5. arXiv:2203.13568  [pdf, other]

    cs.DS cs.LG stat.ML

    $p$-Generalized Probit Regression and Scalable Maximum Likelihood Estimation via Sketching and Coresets

    Authors: Alexander Munteanu, Simon Omlor, Christian Peters

    Abstract: We study the $p$-generalized probit regression model, which is a generalized linear model for binary responses. It extends the standard probit model by replacing its link function, the standard normal cdf, by a $p$-generalized normal distribution for $p\in[1, \infty)$. The $p$-generalized normal distributions \citep{Sub23} are of special interest in statistical modeling because they fit much more…

    Submitted 25 March, 2022; originally announced March 2022.

    Comments: AISTATS 2022

  6. arXiv:2107.06615  [pdf, other]

    cs.DS cs.LG stat.ML

    Oblivious sketching for logistic regression

    Authors: Alexander Munteanu, Simon Omlor, David Woodruff

    Abstract: What guarantees are possible for solving logistic regression in one pass over a data stream? To answer this question, we present the first data oblivious sketch for logistic regression. Our sketch can be computed in input sparsity time over a turnstile data stream and reduces the size of a $d$-dimensional data set from $n$ to only $\operatorname{poly}(\mu d\log n)$ weighted points, where $\mu$ is a use…

    Submitted 14 July, 2021; originally announced July 2021.

    Comments: ICML 2021

  7. arXiv:1911.12350  [pdf, other]

    cs.DS

    Single Machine Batch Scheduling to Minimize the Weighted Number of Tardy Jobs

    Authors: Danny Hermelin, Matthias Mnich, Simon Omlor

    Abstract: The $1|B,r_j|\sum w_jU_j$ scheduling problem takes as input a batch setup time $\Delta$ and a set of $n$ jobs, each having a processing time, a release date, a weight, and a due date; the task is to find a sequence of batches that minimizes the weighted number of tardy jobs. This problem was introduced by Hochbaum and Landy in 1994; as a wide generalization of Knapsack, it is $\mathsf{NP}$-hard.…

    Submitted 27 November, 2019; originally announced November 2019.

  8. arXiv:1911.12138  [pdf, other]

    cs.DS cs.DM

    Scheduling with Non-Renewable Resources: Minimizing the Sum of Completion Times

    Authors: Kristóf Bérczi, Tamás Király, Simon Omlor

    Abstract: The paper considers single-machine scheduling problems with a non-renewable resource. In this setting, we are given a set of jobs, each of which is characterized by a processing time, a weight, and a resource requirement. At fixed points in time, a certain amount of the resource is made available to be consumed by the jobs. The goal is to assign the jobs non-preemptively to time s…

    Submitted 26 November, 2019; originally announced November 2019.

    Comments: 19 pages, 2 figures