Showing 1–3 of 3 results for author: Luan, F S

  1. arXiv:2501.12407  [pdf, other]

    cs.DC cs.LG

    The Streaming Batch Model for Efficient and Fault-Tolerant Heterogeneous Execution

    Authors: Frank Sifei Luan, Ziming Mao, Ron Yifeng Wang, Charlotte Lin, Amog Kamsetty, Hao Chen, Cheng Su, Balaji Veeramani, Scott Lee, SangBin Cho, Clark Zinzow, Eric Liang, Ion Stoica, Stephanie Wang

    Abstract: While ML model training and inference are both GPU-intensive, CPU-based data processing is often the bottleneck. Distributed data processing systems based on the batch or stream processing models assume homogeneous resource requirements. They excel at CPU-based computation but either under-utilize heterogeneous resources or impose high overheads on failure and reconfiguration. We introduce the str…

    Submitted 16 February, 2025; v1 submitted 16 January, 2025; originally announced January 2025.

  2. arXiv:2301.03734  [pdf, other]

    cs.DC cs.OS

    Exoshuffle-CloudSort

    Authors: Frank Sifei Luan, Stephanie Wang, Samyukta Yagati, Sean Kim, Kenneth Lien, Isaac Ong, Tony Hong, SangBin Cho, Eric Liang, Ion Stoica

    Abstract: We present Exoshuffle-CloudSort, a sorting application running on top of Ray using the Exoshuffle architecture. Exoshuffle-CloudSort runs on Amazon EC2, with input and output data stored on Amazon S3. Using 40 i4i.4xlarge workers, Exoshuffle-CloudSort completes the 100 TB CloudSort Benchmark (Indy category) in 5378 seconds, with an average total cost of $97.

    Submitted 9 January, 2023; originally announced January 2023.

  3. arXiv:2203.05072  [pdf, other]

    cs.DC

    Exoshuffle: An Extensible Shuffle Architecture

    Authors: Frank Sifei Luan, Stephanie Wang, Samyukta Yagati, Sean Kim, Kenneth Lien, Isaac Ong, Tony Hong, SangBin Cho, Eric Liang, Ion Stoica

    Abstract: Shuffle is one of the most expensive communication primitives in distributed data processing and is difficult to scale. Prior work addresses the scalability challenges of shuffle by building monolithic shuffle systems. These systems are costly to develop, and they are tightly integrated with batch processing frameworks that offer only high-level APIs such as SQL. New applications, such as ML train…

    Submitted 17 August, 2023; v1 submitted 9 March, 2022; originally announced March 2022.