-
Fixing the Loose Brake: Exponential-Tailed Stopping Time in Best Arm Identification
Authors:
Kapilan Balagopalan,
Tuan Ngo Nguyen,
Yao Zhao,
Kwang-Sung Jun
Abstract:
The best arm identification problem requires identifying the best alternative (i.e., arm) in active experimentation using the smallest number of experiments (i.e., arm pulls), which is crucial for cost-efficient and timely decision-making processes. In the fixed confidence setting, an algorithm must stop data-dependently and return the estimated best arm with a correctness guarantee. Since this st…
▽ More
The best arm identification problem requires identifying the best alternative (i.e., arm) in active experimentation using the smallest number of experiments (i.e., arm pulls), which is crucial for cost-efficient and timely decision-making processes. In the fixed confidence setting, an algorithm must stop data-dependently and return the estimated best arm with a correctness guarantee. Since this stopping time is random, we desire its distribution to have light tails. Unfortunately, many existing studies focus on high probability or in expectation bounds on the stopping time, which allow heavy tails and, for high probability bounds, even not stopping at all. We first prove that this never-stopping event can indeed happen for some popular algorithms. Motivated by this, we propose algorithms that provably enjoy an exponential-tailed stopping time, which improves upon the polynomial tail bound reported by Kalyanakrishnan et al. (2012). The first algorithm is based on a fixed budget algorithm called Sequential Halving along with a doubling trick. The second algorithm is a meta algorithm that takes in any fixed confidence algorithm with a high probability stopping guarantee and turns it into one that enjoys an exponential-tailed stopping time. Our results imply that there is much more to be desired for contemporary fixed confidence algorithms.
△ Less
Submitted 14 June, 2025; v1 submitted 4 November, 2024;
originally announced November 2024.
-
HAVER: Instance-Dependent Error Bounds for Maximum Mean Estimation and Applications to Q-Learning and Monte Carlo Tree Search
Authors:
Tuan Ngo Nguyen,
Jay Barrett,
Kwang-Sung Jun
Abstract:
We study the problem of estimating the \emph{value} of the largest mean among K distributions via samples from them (rather than estimating \emph{which} distribution has the largest mean), which arises from various machine learning tasks including Q-learning and Monte Carlo Tree Search (MCTS). While there have been a few proposed algorithms, their performance analyses have been limited to their bi…
▽ More
We study the problem of estimating the \emph{value} of the largest mean among K distributions via samples from them (rather than estimating \emph{which} distribution has the largest mean), which arises from various machine learning tasks including Q-learning and Monte Carlo Tree Search (MCTS). While there have been a few proposed algorithms, their performance analyses have been limited to their biases rather than a precise error metric. In this paper, we propose a novel algorithm called HAVER (Head AVERaging) and analyze its mean squared error. Our analysis reveals that HAVER has a compelling performance in two respects. First, HAVER estimates the maximum mean as well as the oracle who knows the identity of the best distribution and reports its sample mean. Second, perhaps surprisingly, HAVER exhibits even better rates than this oracle when there are many distributions near the best one. Both of these improvements are the first of their kind in the literature, and we also prove that the naive algorithm that reports the largest empirical mean does not achieve these bounds. Finally, we confirm our theoretical findings via numerical experiments where we implement HAVER in bandit, Q-learning, and MCTS algorithms. In these experiments, HAVER consistently outperforms the baseline methods, demonstrating its effectiveness across different applications.
△ Less
Submitted 28 April, 2025; v1 submitted 1 November, 2024;
originally announced November 2024.
-
A Meta-Analysis of Solar Forecasting Based on Skill Score
Authors:
Thi Ngoc Nguyen,
Felix Müsgens
Abstract:
We conduct the first comprehensive meta-analysis of deterministic solar forecasting based on skill score, screening 1,447 papers from Google Scholar and reviewing the full texts of 320 papers for data extraction. A database of 4,687 points was built and analyzed with multivariate adaptive regression spline modelling, partial dependence plots, and linear regression. The marginal impacts on skill sc…
▽ More
We conduct the first comprehensive meta-analysis of deterministic solar forecasting based on skill score, screening 1,447 papers from Google Scholar and reviewing the full texts of 320 papers for data extraction. A database of 4,687 points was built and analyzed with multivariate adaptive regression spline modelling, partial dependence plots, and linear regression. The marginal impacts on skill score of ten factors were quantified. The analysis shows the non-linearity and complex interaction between variables in the database. Forecast horizon has a central impact and dominates other factors' impacts. Therefore, the analysis of solar forecasts should be done separately for each horizon. Climate zone variables have statistically significant correlation with skill score. Regarding inputs, historical data and spatial temporal information are highly helpful. For intra-day, sky and satellite images show the most importance. For day-ahead, numerical weather predictions and locally measured meteorological data are very efficient. All forecast models were compared. Ensemble-hybrid models achieve the most accurate forecasts for all horizons. Hybrid models show superiority for intra-hour while image-based methods are the most efficient for intra-day forecasts. More training data can enhance skill score. However, over-fitting is observed when there is too much training data (longer than 2000 days). There has been a substantial improvement in solar forecast accuracy, especially in recent years. More improvement is observed for intra-hour and intra-day than day-ahead forecasts. By controlling for the key differences between forecasts, including location variables, our findings can be applied globally.
△ Less
Submitted 12 April, 2023; v1 submitted 22 August, 2022;
originally announced August 2022.
-
What drives the accuracy of PV output forecasts?
Authors:
Thi Ngoc Nguyen,
Felix Müsgens
Abstract:
Due to the stochastic nature of photovoltaic (PV) power generation, there is high demand for forecasting PV output to better integrate PV generation into power grids. Systematic knowledge regarding the factors influencing forecast accuracy is crucially important, but still mostly unknown. In this paper, we review 180 papers on PV forecasts and extract a database of forecast errors for statistical…
▽ More
Due to the stochastic nature of photovoltaic (PV) power generation, there is high demand for forecasting PV output to better integrate PV generation into power grids. Systematic knowledge regarding the factors influencing forecast accuracy is crucially important, but still mostly unknown. In this paper, we review 180 papers on PV forecasts and extract a database of forecast errors for statistical analysis. We show that among the forecast models, hybrid models consistently outperform the others and will most likely be the future of PV output forecasting. The use of data processing techniques is positively correlated with the forecast quality, while the lengths of the forecast horizon and out-of-sample test set have negative effects on the forecast accuracy. We also found that the inclusion of numerical weather prediction variables, data normalization, and data resampling are the most effective data processing techniques. Furthermore, we found some evidence for cherry picking in reporting errors and recommend that the test sets be at least one year to better assess model performance. The paper also takes the first step towards establishing a benchmark for assessing PV output forecasts.
△ Less
Submitted 3 November, 2021;
originally announced November 2021.
-
Proactive DP: A Multple Target Optimization Framework for DP-SGD
Authors:
Marten van Dijk,
Nhuong V. Nguyen,
Toan N. Nguyen,
Lam M. Nguyen,
Phuong Ha Nguyen
Abstract:
We introduce a multiple target optimization framework for DP-SGD referred to as pro-active DP. In contrast to traditional DP accountants, which are used to track the expenditure of privacy budgets, the pro-active DP scheme allows one to a-priori select parameters of DP-SGD based on a fixed privacy budget (in terms of $ε$ and $δ$) in such a way to optimize the anticipated utility (test accuracy) th…
▽ More
We introduce a multiple target optimization framework for DP-SGD referred to as pro-active DP. In contrast to traditional DP accountants, which are used to track the expenditure of privacy budgets, the pro-active DP scheme allows one to a-priori select parameters of DP-SGD based on a fixed privacy budget (in terms of $ε$ and $δ$) in such a way to optimize the anticipated utility (test accuracy) the most. To achieve this objective, we first propose significant improvements to the moment account method, presenting a closed-form $(ε,δ)$-DP guarantee that connects all parameters in the DP-SGD setup. We show that DP-SGD is $(ε<0.5,δ=1/N)$-DP if $σ=\sqrt{2(ε+\ln(1/δ))/ε}$ with $T$ at least $\approx 2k^2/ε$ and $(2/e)^2k^2-1/2\geq \ln(N)$, where $T$ is the total number of rounds, and $K=kN$ is the total number of gradient computations where $k$ measures $K$ in number of epochs of size $N$ of the local data set. We prove that our expression is close to tight in that if $T$ is more than a constant factor $\approx 4$ smaller than the lower bound $\approx 2k^2/ε$, then the $(ε,δ)$-DP guarantee is violated. The above DP guarantee can be enhanced in thatDP-SGD is $(ε, δ)$-DP if $σ= \sqrt{2(ε+\ln(1/δ))/ε}$ with $T$ at least $\approx 2k^2/ε$ together with two additional, less intuitive, conditions that allow larger $ε\geq 0.5$. Our DP theory allows us to create a utility graph and DP calculator. These tools link privacy and utility objectives and search for optimal experiment setups, efficiently taking into account both accuracy and privacy objectives, as well as implementation goals. We furnish a comprehensive implementation flow of our proactive DP, with rigorous experiments to showcase the proof-of-concept.
△ Less
Submitted 4 June, 2024; v1 submitted 17 February, 2021;
originally announced February 2021.
-
Hogwild! over Distributed Local Data Sets with Linearly Increasing Mini-Batch Sizes
Authors:
Marten van Dijk,
Nhuong V. Nguyen,
Toan N. Nguyen,
Lam M. Nguyen,
Quoc Tran-Dinh,
Phuong Ha Nguyen
Abstract:
Hogwild! implements asynchronous Stochastic Gradient Descent (SGD) where multiple threads in parallel access a common repository containing training data, perform SGD iterations and update shared state that represents a jointly learned (global) model. We consider big data analysis where training data is distributed among local data sets in a heterogeneous way -- and we wish to move SGD computation…
▽ More
Hogwild! implements asynchronous Stochastic Gradient Descent (SGD) where multiple threads in parallel access a common repository containing training data, perform SGD iterations and update shared state that represents a jointly learned (global) model. We consider big data analysis where training data is distributed among local data sets in a heterogeneous way -- and we wish to move SGD computations to local compute nodes where local data resides. The results of these local SGD computations are aggregated by a central "aggregator" which mimics Hogwild!. We show how local compute nodes can start choosing small mini-batch sizes which increase to larger ones in order to reduce communication cost (round interaction with the aggregator). We improve state-of-the-art literature and show $O(\sqrt{K}$) communication rounds for heterogeneous data for strongly convex problems, where $K$ is the total number of gradient computations across all local compute nodes. For our scheme, we prove a \textit{tight} and novel non-trivial convergence analysis for strongly convex problems for {\em heterogeneous} data which does not use the bounded gradient assumption as seen in many existing publications. The tightness is a consequence of our proofs for lower and upper bounds of the convergence rate, which show a constant factor difference. We show experimental results for plain convex and non-convex problems for biased (i.e., heterogeneous) and unbiased local data sets.
△ Less
Submitted 26 February, 2021; v1 submitted 26 October, 2020;
originally announced October 2020.
-
Asynchronous Federated Learning with Reduced Number of Rounds and with Differential Privacy from Less Aggregated Gaussian Noise
Authors:
Marten van Dijk,
Nhuong V. Nguyen,
Toan N. Nguyen,
Lam M. Nguyen,
Quoc Tran-Dinh,
Phuong Ha Nguyen
Abstract:
The feasibility of federated learning is highly constrained by the server-clients infrastructure in terms of network communication. Most newly launched smartphones and IoT devices are equipped with GPUs or sufficient computing hardware to run powerful AI models. However, in case of the original synchronous federated learning, client devices suffer waiting times and regular communication between cl…
▽ More
The feasibility of federated learning is highly constrained by the server-clients infrastructure in terms of network communication. Most newly launched smartphones and IoT devices are equipped with GPUs or sufficient computing hardware to run powerful AI models. However, in case of the original synchronous federated learning, client devices suffer waiting times and regular communication between clients and server is required. This implies more sensitivity to local model training times and irregular or missed updates, hence, less or limited scalability to large numbers of clients and convergence rates measured in real time will suffer. We propose a new algorithm for asynchronous federated learning which eliminates waiting times and reduces overall network communication - we provide rigorous theoretical analysis for strongly convex objective functions and provide simulation results. By adding Gaussian noise we show how our algorithm can be made differentially private -- new theorems show how the aggregated added Gaussian noise is significantly reduced.
△ Less
Submitted 17 July, 2020;
originally announced July 2020.