-
Evaluation of a Foundational Model and Stochastic Models for Forecasting Sporadic or Spiky Production Outages of High-Performance Machine Learning Services
Authors:
Keun Soo Yim
Abstract:
Time series forecasting models have diverse real world applications (e.g., from electricity metrics to software workload). Latest foundational models trained for time series forecasting show strengths (e.g., for long sequences and in zero-shot settings). However, foundational model was not yet used for forecasting rare, spiky events, i.e., a challenging target because those are a corner case of ex…
▽ More
Time series forecasting models have diverse real world applications (e.g., from electricity metrics to software workload). Latest foundational models trained for time series forecasting show strengths (e.g., for long sequences and in zero-shot settings). However, foundational model was not yet used for forecasting rare, spiky events, i.e., a challenging target because those are a corner case of extreme events. In this paper, we optimize a state-of-the-art foundational model to forecast sporadic or spiky production outages of high-performance machine learning services powering billions of client devices. We evaluate the forecasting errors of the foundational model compared with classical stochastic forecasting models (e.g., moving average and autoregressive). The analysis helps us understand how each of the evaluated models performs for the sporadic or spiky events. For example, it identifies the key patterns in the target data that are well tracked by the foundational model vs. each of the stochastic models. We use the models with optimal parameters to estimate a year-long outage statistics of a particular root cause with less than 6% value errors.
△ Less
Submitted 30 June, 2025;
originally announced July 2025.
-
TREBLE: Fast Software Updates by Creating an Equilibrium in an Active Software Ecosystem of Globally Distributed Stakeholders
Authors:
Keun Soo Yim,
Iliyan Malchev,
Andrew Hsieh,
Dave Burke
Abstract:
This paper presents our experience with TREBLE, a two-year initiative to build the modular base in Android, a Java-based mobile platform running on the Linux kernel. Our TREBLE architecture splits the hardware independent core framework written in Java from the hardware dependent vendor implementations (e.g., user space device drivers, vendor native libraries, and kernel written in C/C++). Cross-l…
▽ More
This paper presents our experience with TREBLE, a two-year initiative to build the modular base in Android, a Java-based mobile platform running on the Linux kernel. Our TREBLE architecture splits the hardware independent core framework written in Java from the hardware dependent vendor implementations (e.g., user space device drivers, vendor native libraries, and kernel written in C/C++). Cross-layer communications between them are done via versioned, stable inter-process communication interfaces whose backward compatibility is tested by using two API compliance suites. Based on this architecture, we repackage the key Android software components that suffered from crucial post-launch security bugs as separate images. That not only enables separate ownerships but also independent updates of each image by interested ecosystem entities. We discuss our experience of delivering TREBLE architectural changes to silicon vendors and device makers using a yearly release model. Our experiments and industry rollouts support our hypothesis that giving more freedom to all ecosystem entities and creating an equilibrium are a transformation necessary to further scale the world largest open ecosystem with over two billion active devices.
△ Less
Submitted 19 September, 2024;
originally announced October 2024.
-
The Task-oriented Queries Benchmark (ToQB)
Authors:
Keun Soo Yim
Abstract:
Task-oriented queries (e.g., one-shot queries to play videos, order food, or call a taxi) are crucial for assessing the quality of virtual assistants, chatbots, and other large language model (LLM)-based services. However, a standard benchmark for task-oriented queries is not yet available, as existing benchmarks in the relevant NLP (Natural Language Processing) fields have primarily focused on ta…
▽ More
Task-oriented queries (e.g., one-shot queries to play videos, order food, or call a taxi) are crucial for assessing the quality of virtual assistants, chatbots, and other large language model (LLM)-based services. However, a standard benchmark for task-oriented queries is not yet available, as existing benchmarks in the relevant NLP (Natural Language Processing) fields have primarily focused on task-oriented dialogues. Thus, we present a new methodology for efficiently generating the Task-oriented Queries Benchmark (ToQB) using existing task-oriented dialogue datasets and an LLM service. Our methodology involves formulating the underlying NLP task to summarize the original intent of a speaker in each dialogue, detailing the key steps to perform the devised NLP task using an LLM service, and outlining a framework for automating a major part of the benchmark generation process. Through a case study encompassing three domains (i.e., two single-task domains and one multi-task domain), we demonstrate how to customize the LLM prompts (e.g., omitting system utterances or speaker labels) for those three domains and characterize the generated task-oriented queries. The generated ToQB dataset is made available to the public. We further discuss new domains that can be added to ToQB by community contributors and its practical applications.
△ Less
Submitted 5 June, 2024;
originally announced June 2024.
-
Predicting Likely-Vulnerable Code Changes: Machine Learning-based Vulnerability Protections for Android Open Source Project
Authors:
Keun Soo Yim
Abstract:
This paper presents a framework that selectively triggers security reviews for incoming source code changes. Functioning as a review bot within a code review service, the framework can automatically request additional security reviews at pre-submit time before the code changes are submitted to a source code repository. Because performing such secure code reviews add cost, the framework employs a c…
▽ More
This paper presents a framework that selectively triggers security reviews for incoming source code changes. Functioning as a review bot within a code review service, the framework can automatically request additional security reviews at pre-submit time before the code changes are submitted to a source code repository. Because performing such secure code reviews add cost, the framework employs a classifier trained to identify code changes with a high likelihood of vulnerabilities. The online classifier leverages various types of input features to analyze the review patterns, track the software engineering process, and mine specific text patterns within given code changes. The classifier and its features are meticulously chosen and optimized using data from the submitted code changes and reported vulnerabilities in Android Open Source Project (AOSP). The evaluation results demonstrate that our Vulnerability Prevention (VP) framework identifies approximately 80% of the vulnerability-inducing code changes in the dataset with a precision ratio of around 98% and a false positive rate of around 1.7%. We discuss the implications of deploying the VP framework in multi-project settings and future directions for Android security research. This paper explores and validates our approach to code change-granularity vulnerability prediction, offering a preventive technique for software security by preemptively detecting vulnerable code changes before submission.
△ Less
Submitted 26 May, 2024;
originally announced May 2024.