
Showing 1–3 of 3 results for author: Zaharia, M

Searching in archive: math.
  1. arXiv:2107.12525 [pdf, ps, other]

    math.ST cs.DB cs.LG stat.ML

    Proof: Accelerating Approximate Aggregation Queries with Expensive Predicates

    Authors: Daniel Kang, John Guibas, Peter Bailis, Tatsunori Hashimoto, Yi Sun, Matei Zaharia

    Abstract: Given a dataset $\mathcal{D}$, we are interested in computing the mean of a subset of $\mathcal{D}$ which matches a predicate. ABae leverages stratified sampling and proxy models to efficiently compute this statistic given a sampling budget $N$. In this document, we theoretically analyze ABae and show that the MSE of the estimate decays at rate $O(N_1^{-1} + N_2^{-1} + N_1^{1/2}N_2^{-3/2})$, where…

    Submitted 28 July, 2021; v1 submitted 26 July, 2021; originally announced July 2021.

  2. arXiv:2104.00282 [pdf, other]

    math.OC cs.DC

    Allocation of Fungible Resources via a Fast, Scalable Price Discovery Method

    Authors: Akshay Agrawal, Stephen Boyd, Deepak Narayanan, Fiodar Kazhamiaka, Matei Zaharia

    Abstract: We consider the problem of assigning or allocating resources to a set of jobs. We consider the case when the resources are fungible, that is, the job can be done with any mix of the resources, but with different efficiencies. In our formulation we maximize a total utility subject to a given limit on the resource usage, which is a convex optimization problem and so is tractable. In this paper we de…

    Submitted 18 April, 2021; v1 submitted 1 April, 2021; originally announced April 2021.

  3. arXiv:2102.09127 [pdf, other]

    cs.LG cs.AI cs.DS math.OC

    Efficient Online ML API Selection for Multi-Label Classification Tasks

    Authors: Lingjiao Chen, Matei Zaharia, James Zou

    Abstract: Multi-label classification tasks such as OCR and multi-object recognition are a major focus of the growing machine learning as a service industry. While many multi-label prediction APIs are available, it is challenging for users to decide which API to use for their own data and budget, due to the heterogeneity in those APIs' price and performance. Recent work shows how to select from single-label…

    Submitted 16 July, 2022; v1 submitted 17 February, 2021; originally announced February 2021.

    Comments: Accepted to ICML 2022