
Showing 1–4 of 4 results for author: Elnikety, S

Searching in archive cs.
  1. arXiv:2504.20828 [pdf, other]

    cs.AI

    Ascendra: Dynamic Request Prioritization for Efficient LLM Serving

    Authors: Azam Ikram, Xiang Li, Sameh Elnikety, Saurabh Bagchi

    Abstract: The rapid advancement of Large Language Models (LLMs) has driven the need for more efficient serving strategies. In this context, efficiency refers to the proportion of requests that meet their Service Level Objectives (SLOs), particularly for Time To First Token (TTFT) and Time Between Tokens (TBT). However, existing systems often prioritize one metric at the cost of the other. We present Ascendr…

    Submitted 30 April, 2025; v1 submitted 29 April, 2025; originally announced April 2025.

  2. arXiv:2403.03377 [pdf, other]

    cs.DC

    Junctiond: Extending FaaS Runtimes with Kernel-Bypass

    Authors: Enrique Saurez, Joshua Fried, Gohar Irfan Chaudhry, Esha Choukse, Íñigo Goiri, Sameh Elnikety, Adam Belay, Rodrigo Fonseca

    Abstract: This report explores the use of kernel-bypass networking in FaaS runtimes and demonstrates how using Junction, a novel kernel-bypass system, as the backend for executing components in faasd can enhance performance and isolation. Junction achieves this by reducing network and compute overheads and minimizing interactions with the host operating system. Junctiond, the integration of Junction with fa…

    Submitted 7 March, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

  3. arXiv:2401.02920 [pdf, other]

    cs.DC cs.AI

    Analytically-Driven Resource Management for Cloud-Native Microservices

    Authors: Yanqi Zhang, Zhuangzhuang Zhou, Sameh Elnikety, Christina Delimitrou

    Abstract: Resource management for cloud-native microservices has attracted a lot of recent attention. Previous work has shown that machine learning (ML)-driven approaches outperform traditional techniques, such as autoscaling, in terms of both SLA maintenance and resource efficiency. However, ML-driven approaches also face challenges including lengthy data collection processes and limited scalability. We pr…

    Submitted 5 January, 2024; originally announced January 2024.

  4. arXiv:1506.05172 [pdf, other]

    cs.DC

    Measuring and Managing Answer Quality for Online Data-Intensive Services

    Authors: Jaimie Kelley, Christopher Stewart, Nathaniel Morris, Devesh Tiwari, Yuxiong He, Sameh Elnikety

    Abstract: Online data-intensive services parallelize query execution across distributed software components. Interactive response time is a priority, so online query executions return answers without waiting for slow running components to finish. However, data from these slow components could lead to better answers. We propose Ubora, an approach to measure the effect of slow running components on the qualit…

    Submitted 16 June, 2015; originally announced June 2015.

    Comments: Technical Report
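The Ascendra abstract (entry 1) defines serving efficiency as the proportion of requests that meet their SLOs for TTFT and TBT. The sketch below computes that ratio over a batch of request traces. It is an illustration, not the paper's method. The names `RequestTrace`, `meets_slo` and `slo_attainment` are hypothetical. It also assumes that a request counts as meeting its SLO only if its TTFT is within budget and every inter-token gap is within the TBT budget. The truncated abstract does not specify how TBT is aggregated, so this is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RequestTrace:
    # Hypothetical per-request measurements (seconds).
    ttft: float                                      # arrival -> first output token
    tbts: List[float] = field(default_factory=list)  # gaps between consecutive output tokens


def meets_slo(req: RequestTrace, ttft_slo: float, tbt_slo: float) -> bool:
    # Assumption: both the TTFT budget and a per-gap TBT budget must hold.
    return req.ttft <= ttft_slo and all(gap <= tbt_slo for gap in req.tbts)


def slo_attainment(reqs: List[RequestTrace], ttft_slo: float, tbt_slo: float) -> float:
    # Fraction of requests meeting both SLOs; 0.0 for an empty batch.
    if not reqs:
        return 0.0
    met = sum(1 for r in reqs if meets_slo(r, ttft_slo, tbt_slo))
    return met / len(reqs)
```

For example, with a 0.5 s TTFT budget and a 0.1 s TBT budget, consider three requests. The first has TTFT 0.2 s and gaps 0.05 s and 0.08 s. The second has TTFT 0.9 s. The third has TTFT 0.3 s and a 0.25 s gap. Only the first meets both budgets, so attainment is 1/3. Under this definition, a scheduler that favours TTFT at the expense of TBT (or vice versa) does not raise the score, which is the tension the abstract describes.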