-
Empirical Error Estimates for Graph Sparsification
Authors:
Siyao Wang,
Miles E. Lopes
Abstract:
Graph sparsification is a well-established technique for accelerating graph-based learning algorithms, which uses edge sampling to approximate dense graphs with sparse ones. Because the sparsification error is random and unknown, users must contend with uncertainty about the reliability of downstream computations. Although it is possible for users to obtain conceptual guidance from theoretical error bounds in the literature, such results are typically impractical at a numerical level. Taking an alternative approach, we propose to address these issues from a data-driven perspective by computing empirical error estimates. The proposed error estimates are highly versatile, and we demonstrate this in four use cases: Laplacian matrix approximation, graph cut queries, graph-structured regression, and spectral clustering. Moreover, we provide two theoretical guarantees for the error estimates, and explain why the cost of computing them is manageable in comparison to the overall cost of a typical graph sparsification workflow.
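The following Python sketch illustrates the workflow described above. It builds a sparse graph by drawing edges with replacement, with probability proportional to weight. This is one simple sampling scheme assumed here, and the paper may use a different one. The sketch then bootstraps the sampled edges to estimate a 95% quantile of the error in a single cut query. The helper names `sparsify`, `cut_value`, and `bootstrap_cut_error` are hypothetical and do not come from the authors' code.

```python
import math
import random


def sparsify(edges, n_samples, rng):
    """Draw n_samples edges with replacement, with probability p_e = w_e / W.

    Each draw gets weight w_e / (n_samples * p_e) = W / n_samples, which makes
    the sparse graph an unbiased estimate of the dense one.
    """
    total = sum(w for _, _, w in edges)
    draws = rng.choices(edges, weights=[w for _, _, w in edges], k=n_samples)
    return [(u, v, total / n_samples) for u, v, _ in draws]


def cut_value(edges, side):
    """Total weight of edges with exactly one endpoint in `side`."""
    return sum(w for u, v, w in edges if (u in side) != (v in side))


def bootstrap_cut_error(draws, side, n_boot=200, alpha=0.05, rng=None):
    """Hypothetical helper: estimate the (1 - alpha) quantile of |sparse cut - dense cut|
    by resampling the sampled edges and measuring each bootstrap cut against the sparse cut."""
    rng = rng or random.Random()
    est = cut_value(draws, side)
    errs = sorted(
        abs(cut_value(rng.choices(draws, k=len(draws)), side) - est)
        for _ in range(n_boot)
    )
    return errs[math.ceil((1 - alpha) * n_boot) - 1]


rng = random.Random(0)
n = 40
dense = [(i, j, rng.uniform(0.5, 1.5)) for i in range(n) for j in range(i + 1, n)]
side = set(range(n // 2))
draws = sparsify(dense, n_samples=300, rng=rng)  # 300 of the 780 dense edges (with repeats)
true_cut, sparse_cut = cut_value(dense, side), cut_value(draws, side)
err_q95 = bootstrap_cut_error(draws, side, rng=rng)
print(f"dense cut={true_cut:.1f}  sparse cut={sparse_cut:.1f}  est. 95% error={err_q95:.1f}")
```

In a real workflow only `sparse_cut` and `err_q95` would be available. The dense cut is computed here only to let the reader compare.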
Submitted 11 March, 2025;
originally announced March 2025.
-
Symbol-Error Probability Constrained Power Minimization for Reconfigurable Intelligent Surfaces-based Passive Transmitter
Authors:
Erico S. P. Lopes,
Lukas T. N. Landau
Abstract:
This study considers a virtual multiuser multiple-input multiple-output system with PSK modulation, realized via a reconfigurable intelligent surface-based passive transmitter. Under this framework, the study derives the union-bound symbol-error probability, which is an upper bound on the actual symbol-error probability. Based on this bound, it proposes a symbol-level precoding problem that minimizes transmit power subject to the union-bound symbol-error probability remaining below a given requirement. The problem is formulated as a constrained optimization on an oblique manifold and solved via a bisection method. The method successively optimizes the transmit power while checking whether the union-bound symbol-error probability requirement is feasible. Each check solves, via the Riemannian conjugate gradient algorithm, an auxiliary problem that depends only on the reflection coefficients of the reconfigurable intelligent surface elements. Numerical results demonstrate the effectiveness of the proposed approach in minimizing the transmit power for different symbol-error probability requirements.
Submitted 12 November, 2023;
originally announced November 2023.
-
Randomized Numerical Linear Algebra: A Perspective on the Field With an Eye to Software
Authors:
Riley Murray,
James Demmel,
Michael W. Mahoney,
N. Benjamin Erichson,
Maksim Melnichenko,
Osman Asif Malik,
Laura Grigori,
Piotr Luszczek,
Michał Dereziński,
Miles E. Lopes,
Tianyu Liang,
Hengrui Luo,
Jack Dongarra
Abstract:
Randomized numerical linear algebra - RandNLA, for short - concerns the use of randomization as a resource to develop improved algorithms for large-scale linear algebra computations.
The origins of contemporary RandNLA lay in theoretical computer science, where it blossomed from a simple idea: randomization provides an avenue for computing approximate solutions to linear algebra problems more efficiently than deterministic algorithms. This idea proved fruitful in the development of scalable algorithms for machine learning and statistical data analysis applications. However, RandNLA's true potential only came into focus upon integration with the fields of numerical analysis and "classical" numerical linear algebra. Through the efforts of many individuals, randomized algorithms have been developed that provide full control over the accuracy of their solutions and that can be every bit as reliable as algorithms that might be found in libraries such as LAPACK. Recent years have even seen the incorporation of certain RandNLA methods into MATLAB, the NAG Library, NVIDIA's cuSOLVER, and scikit-learn.
For all its success, we believe that RandNLA has yet to realize its full potential. In particular, we believe the scientific community stands to benefit significantly from suitably defined "RandBLAS" and "RandLAPACK" libraries, to serve as standards conceptually analogous to BLAS and LAPACK. This 200-page monograph represents a step toward defining such standards. In it, we cover topics spanning basic sketching, least squares and optimization, low-rank approximation, full matrix decompositions, leverage score sampling, and sketching data with tensor product structures (among others). Much of the provided pseudo-code has been tested via publicly available MATLAB and Python implementations.
Submitted 12 April, 2023; v1 submitted 22 February, 2023;
originally announced February 2023.
-
Error Estimation for Random Fourier Features
Authors:
Junwen Yao,
N. Benjamin Erichson,
Miles E. Lopes
Abstract:
Random Fourier Features (RFF) is among the most popular and broadly applicable approaches for scaling up kernel methods. In essence, RFF allows the user to avoid costly computations on a large kernel matrix via a fast randomized approximation. However, a pervasive difficulty in applying RFF is that the user does not know the actual error of the approximation, or how this error will propagate into downstream learning tasks. Up to now, the RFF literature has primarily dealt with these uncertainties using theoretical error bounds, but from a user's standpoint, such results are typically impractical -- either because they are highly conservative or involve unknown quantities. To tackle these general issues in a data-driven way, this paper develops a bootstrap approach to numerically estimate the errors of RFF approximations. Three key advantages of this approach are: (1) The error estimates are specific to the problem at hand, avoiding the pessimism of worst-case bounds. (2) The approach is flexible with respect to different uses of RFF, and can even estimate errors in downstream learning tasks. (3) The approach enables adaptive computation, so that the user can quickly inspect the error of a rough initial kernel approximation and then predict how much extra work is needed. Lastly, in exchange for all of these benefits, the error estimates can be obtained at a modest computational cost.
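Here is a minimal stdlib sketch of a bootstrap error estimate for RFF. It assumes the Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) and the standard random Fourier map sqrt(2) cos(w_j . x + b_j). It also assumes that the error of interest is the largest entrywise error of the approximate kernel matrix. The bootstrap resamples the D random features with replacement. The function names are hypothetical, and the paper's estimator and error metrics may differ.

```python
import math
import random


def rff_features(X, D, sigma, rng):
    """Per-feature values sqrt(2) * cos(w_j . x + b_j) for each point x.

    Assumes the Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)), for which
    w_j ~ N(0, sigma^-2 I) and b_j ~ Uniform[0, 2 pi].
    """
    d = len(X[0])
    W = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(d)] for _ in range(D)]
    b = [rng.uniform(0.0, 2 * math.pi) for _ in range(D)]
    return [
        [math.sqrt(2) * math.cos(sum(wk * xk for wk, xk in zip(w, x)) + bj)
         for w, bj in zip(W, b)]
        for x in X
    ]


def approx_kernel(F, idx):
    """Approximate kernel matrix: average of feature products over the indices in idx."""
    n = len(F)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i, n):
            K[i][k] = K[k][i] = sum(F[i][j] * F[k][j] for j in idx) / len(idx)
    return K


def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def bootstrap_rff_error(F, n_boot=100, alpha=0.05, rng=None):
    """Resample the D features with replacement to estimate the (1 - alpha)
    quantile of the max-norm kernel approximation error."""
    rng = rng or random.Random()
    D = len(F[0])
    K_hat = approx_kernel(F, range(D))
    errs = sorted(
        max_abs_diff(approx_kernel(F, rng.choices(range(D), k=D)), K_hat)
        for _ in range(n_boot)
    )
    return K_hat, errs[math.ceil((1 - alpha) * n_boot) - 1]


rng = random.Random(1)
sigma = 1.5
X = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(6)]
F = rff_features(X, D=1000, sigma=sigma, rng=rng)
K_hat, err_q95 = bootstrap_rff_error(F, rng=rng)
K_true = [[math.exp(-math.dist(x, y) ** 2 / (2 * sigma**2)) for y in X] for x in X]
print(f"actual max error={max_abs_diff(K_hat, K_true):.3f}  bootstrap 95% estimate={err_q95:.3f}")
```

The bootstrap only touches the features that were already computed. The exact kernel matrix appears here solely for comparison.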
Submitted 22 February, 2023;
originally announced February 2023.
-
MMSE Symbol Level Precoding Under a Per Antenna Power Constraint for Multiuser MIMO Systems With PSK Modulation
Authors:
Erico S. P. Lopes,
Lukas T. N. Landau
Abstract:
This study proposes a symbol-level precoding algorithm based on the minimum mean squared error design objective under a strict per-antenna power constraint for PSK modulation. The proposed design is then formulated in the standard form of a second-order cone program, allowing for an optimal solution via the interior point method. Numerical results indicate that the proposed design is superior to the existing approaches in terms of bit-error-rate in the low and intermediate SNR regimes.
Submitted 24 June, 2022;
originally announced June 2022.
-
Improving Experience Replay through Modeling of Similar Transitions' Sets
Authors:
Daniel Eugênio Neves,
João Pedro Oliveira Batisteli,
Eduardo Felipe Lopes,
Lucila Ishitani,
Zenilton Kleber Gonçalves do Patrocínio Júnior
Abstract:
In this work, we propose and evaluate a new reinforcement learning method, COMPact Experience Replay (COMPER). It uses temporal difference learning with predicted target values based on recurrence over sets of similar transitions, together with a new approach to experience replay based on two transition memories. Our objective is to reduce the number of experiences required to train an agent, as measured by the total reward accumulated in the long run. Its relevance to reinforcement learning lies in the small number of observations it needs to achieve results similar to those of relevant methods in the literature. Those methods generally demand millions of video frames to train an agent on Atari 2600 games. We report detailed results from five training trials of COMPER for just 100,000 frames and about 25,000 iterations with a small experience memory on eight challenging games of the Arcade Learning Environment (ALE). As a baseline, we also present results for a DQN agent with the same experimental protocol on the same set of games. To verify how well COMPER approximates a good policy from a smaller number of observations, we also compare its results with those obtained from millions of frames in the ALE benchmark.
Submitted 12 November, 2021;
originally announced November 2021.
-
Minimum Symbol Error Probability Low-Resolution Precoding for MU-MIMO Systems With PSK Modulation
Authors:
Erico S. P. Lopes,
Lukas T. N. Landau,
Amine Mezghani
Abstract:
We propose an optimal low-resolution precoding technique that minimizes the symbol error probability of the users. Existing approaches derive the minimum symbol error probability objective function only for QPSK modulation, whereas the current approach allows for any PSK modulation order. Moreover, the proposed method solves the corresponding discrete optimization problem optimally via a sophisticated branch-and-bound method. In addition, we propose different approaches based on the greedy search method to compute practical solutions. Numerical simulations confirm the superiority of the proposed minimum symbol error probability criterion in terms of symbol error rate when compared with the established MMDDT and MMSE approaches.
Submitted 5 October, 2021;
originally announced October 2021.
-
Discrete MMSE Precoding for Multiuser MIMO Systems with PSK Modulation
Authors:
Erico S. P. Lopes,
Lukas T. N. Landau
Abstract:
We propose an optimal MMSE precoding technique using quantized signals with constant envelope. Unlike the existing MMSE design that relies on 1-bit resolution, the proposed approach employs uniform phase quantization. Its bounding step in the branch-and-bound method also differs, in that it considers the most restrictive relaxation of the nonconvex problem, and this relaxation is also used for a suboptimal design. Moreover, unlike prior studies, we propose three different soft detection methods and an iterative detection and decoding scheme that allow the utilization of channel coding in conjunction with low-resolution precoding. Besides an exact approach for computing the extrinsic information, we propose two approximations with reduced computational complexity. Numerical simulations show that utilizing the MMSE criterion instead of the established maximum-minimum distance to the decision threshold criterion yields a lower bit-error-rate in many scenarios. Furthermore, when using the MMSE criterion, the branch-and-bound method requires fewer bound evaluations at low and medium SNR. Finally, results based on an LDPC block code indicate that the receive processing schemes yield a lower bit-error-rate than the conventional design.
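The following sketch makes "uniform phase quantization" concrete. Every transmit entry has unit modulus, and its phase comes from a uniform grid of L values. The sketch shows only this discrete alphabet and a nearest-phase quantizer. It does not implement the MMSE branch-and-bound design. The `offset` parameter, which rotates the grid, is an assumption, since some formulations use a rotated grid. The function names are hypothetical.

```python
import cmath
import math


def phase_alphabet(L, offset=0.0):
    """L constant-envelope (unit-modulus) points with uniformly spaced phases."""
    return [cmath.exp(1j * (2 * math.pi * k / L + offset)) for k in range(L)]


def quantize_phase(z, L, offset=0.0):
    """Set |x| = 1 and round the phase of z to the nearest of the L grid phases."""
    k = round((cmath.phase(z) - offset) * L / (2 * math.pi)) % L
    return cmath.exp(1j * (2 * math.pi * k / L + offset))


# Quantize a (hypothetical) unquantized precoder output for N = 4 antennas.
L = 8
x_cont = [0.3 + 0.4j, -1.2 + 0.1j, 0.05 - 0.9j, 0.7 + 0.7j]
x_q = [quantize_phase(z, L) for z in x_cont]
print([f"{cmath.phase(z):+.3f}" for z in x_q])
print(f"exhaustive search would visit {L ** len(x_cont)} candidate vectors")
```

The candidate set has L**N elements, so it grows exponentially with the number of antennas N. That growth is why an exact design relies on branch-and-bound rather than enumeration.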
Submitted 24 May, 2021;
originally announced May 2021.
-
Randomized Algorithms for Scientific Computing (RASC)
Authors:
Aydin Buluc,
Tamara G. Kolda,
Stefan M. Wild,
Mihai Anitescu,
Anthony DeGennaro,
John Jakeman,
Chandrika Kamath,
Ramakrishnan Kannan,
Miles E. Lopes,
Per-Gunnar Martinsson,
Kary Myers,
Jelani Nelson,
Juan M. Restrepo,
C. Seshadhri,
Draguna Vrabie,
Brendt Wohlberg,
Stephen J. Wright,
Chao Yang,
Peter Zwart
Abstract:
Randomized algorithms have propelled advances in artificial intelligence and represent a foundational research area in advancing AI for Science. Future advancements in DOE Office of Science priority areas such as climate science, astrophysics, fusion, advanced materials, combustion, and quantum computing all require randomized algorithms for surmounting challenges of complexity, robustness, and scalability. This report summarizes the outcomes of the workshop "Randomized Algorithms for Scientific Computing (RASC)," held virtually across four days in December 2020 and January 2021.
Submitted 21 March, 2022; v1 submitted 19 April, 2021;
originally announced April 2021.
-
Iterative Detection and Decoding for Multiuser MIMO Systems with Low Resolution Precoding and PSK Modulation
Authors:
Erico S. P. Lopes,
Lukas T. N. Landau
Abstract:
Low-resolution precoding techniques have recently gained considerable attention in wireless communications. Discrete precoding in conjunction with channel coding is vital but hardly discussed in the literature, and it is the subject of this study. Unlike prior studies, we propose three different soft detection methods and an iterative detection and decoding scheme that allow the utilization of channel coding in conjunction with low-resolution precoding. Besides an exact approach for computing the extrinsic information, we propose two approximations with reduced computational complexity. Numerical results based on PSK modulation and an LDPC block code indicate superior bit-error-rate performance compared with the system design based on the common AWGN channel model.
Submitted 19 May, 2021; v1 submitted 21 September, 2020;
originally announced September 2020.
-
Error Estimation for Sketched SVD via the Bootstrap
Authors:
Miles E. Lopes,
N. Benjamin Erichson,
Michael W. Mahoney
Abstract:
In order to compute fast approximations to the singular value decompositions (SVD) of very large matrices, randomized sketching algorithms have become a leading approach. However, a key practical difficulty of sketching an SVD is that the user does not know how far the sketched singular vectors/values are from the exact ones. Indeed, the user may be forced to rely on analytical worst-case error bounds, which do not account for the unique structure of a given problem. As a result, the lack of tools for error estimation often leads to much more computation than is really necessary. To overcome these challenges, this paper develops a fully data-driven bootstrap method that numerically estimates the actual error of sketched singular vectors/values. In particular, this allows the user to inspect the quality of a rough initial sketched SVD, and then adaptively predict how much extra work is needed to reach a given error tolerance. Furthermore, the method is computationally inexpensive, because it operates only on sketched objects, and it requires no passes over the full matrix being factored. Lastly, the method is supported by theoretical guarantees and a very encouraging set of experimental results.
Submitted 10 March, 2020;
originally announced March 2020.
-
Optimal Precoding for Multiuser MIMO Systems With Phase Quantization and PSK Modulation via Branch-and-Bound
Authors:
Erico S. P. Lopes,
Lukas T. N. Landau
Abstract:
MIMO systems are considered among the most promising technologies for wireless communications. However, as the number of radio front ends increases, the corresponding energy consumption and cost become an issue, which can be alleviated by using low-resolution quantizers. In this study, we propose an optimal precoding algorithm constrained to constant envelope signals and phase quantization. Using a branch-and-bound strategy, it maximizes the minimum distance to the decision threshold at the receivers. The proposed algorithm is superior to the existing methods in terms of bit error rate. Numerical results show that the proposed approach has significantly lower complexity than exhaustive search.
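For a tiny system, the exhaustive-search baseline mentioned above can be written directly. The sketch makes several assumptions: a noiseless received sample r_k = h_k^T x, unit-modulus transmit entries with L uniformly spaced phases, and the usual M-PSK sector geometry for the distance to the decision threshold. The paper's exact formulation, for example any scaling or noise terms, may differ, and the function names are hypothetical. The point is the L**N size of the search, which the branch-and-bound strategy is designed to prune.

```python
import cmath
import itertools
import math
import random


def dist_to_threshold(r, s, M):
    """Signed distance from a noiseless received sample r to the decision
    boundaries of the intended M-PSK symbol s (positive inside the correct sector)."""
    z = r * s.conjugate() / abs(s)  # rotate so that s lies on the positive real axis
    return z.real * math.sin(math.pi / M) - abs(z.imag) * math.cos(math.pi / M)


def mmddt_exhaustive(H, s, M, L):
    """Brute-force search over all L**N phase-quantized, unit-modulus precoders.

    Returns the largest achievable minimum (over users) distance to the
    decision threshold and a maximizing transmit vector.
    """
    alphabet = [cmath.exp(2j * math.pi * k / L) for k in range(L)]
    best, best_x = -math.inf, None
    for x in itertools.product(alphabet, repeat=len(H[0])):
        d = min(
            dist_to_threshold(sum(h * xi for h, xi in zip(row, x)), s_k, M)
            for row, s_k in zip(H, s)
        )
        if d > best:
            best, best_x = d, x
    return best, best_x


rng = random.Random(0)
K, N, M, L = 2, 4, 4, 4  # users, antennas, QPSK order, phase levels
H = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2) for _ in range(N)]
     for _ in range(K)]
s = [cmath.exp(1j * (math.pi / M + 2 * math.pi * rng.randrange(M) / M)) for _ in range(K)]
best, best_x = mmddt_exhaustive(H, s, M, L)
print(f"best min-distance={best:.3f} after checking {L ** N} candidates")
```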
Submitted 26 September, 2019;
originally announced September 2019.
-
Measuring the Algorithmic Convergence of Randomized Ensembles: The Regression Setting
Authors:
Miles E. Lopes,
Suofei Wu,
Thomas C. M. Lee
Abstract:
When randomized ensemble methods such as bagging and random forests are implemented, a basic question arises: Is the ensemble large enough? In particular, the practitioner desires a rigorous guarantee that a given ensemble will perform nearly as well as an ideal infinite ensemble (trained on the same data). The purpose of the current paper is to develop a bootstrap method for solving this problem in the context of regression --- which complements our companion paper in the context of classification (Lopes 2019). In contrast to the classification setting, the current paper shows that theoretical guarantees for the proposed bootstrap can be established under much weaker assumptions. In addition, we illustrate the flexibility of the method by showing how it can be adapted to measure algorithmic convergence for variable selection. Lastly, we provide numerical results demonstrating that the method works well in a range of situations.
Submitted 3 August, 2019;
originally announced August 2019.
-
Error Estimation for Randomized Least-Squares Algorithms via the Bootstrap
Authors:
Miles E. Lopes,
Shusen Wang,
Michael W. Mahoney
Abstract:
Over the course of the past decade, a variety of randomized algorithms have been proposed for computing approximate least-squares (LS) solutions in large-scale settings. A longstanding practical issue is that, for any given input, the user rarely knows the actual error of an approximate solution (relative to the exact solution). Likewise, it is difficult for the user to know precisely how much computation is needed to achieve the desired error tolerance. Consequently, the user often appeals to worst-case error bounds that tend to offer only qualitative guidance. As a more practical alternative, we propose a bootstrap method to compute a posteriori error estimates for randomized LS algorithms. These estimates permit the user to numerically assess the error of a given solution, and to predict how much work is needed to improve a "preliminary" solution. In addition, we provide theoretical consistency results for the method, which are the first such results in this context (to the best of our knowledge). From a practical standpoint, the method also has considerable flexibility, insofar as it can be applied to several popular sketching algorithms, as well as a variety of error metrics. Moreover, the extra step of error estimation does not add much cost to an underlying sketching algorithm. Finally, we demonstrate the effectiveness of the method with empirical results.
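The idea of a posteriori error estimation for sketched LS can be illustrated with a stdlib sketch. It rests on several assumptions. The sketching algorithm is sketch-and-solve with rows sampled uniformly with replacement, one of several possible sketches. The error metric is the Euclidean distance to the exact LS solution. The bootstrap resamples the sketched rows and re-solves. The function names are hypothetical, and the small normal-equations solver is adequate only for tiny, well-conditioned problems.

```python
import math
import random


def solve_ls(rows, ys):
    """Least squares via the normal equations (adequate only for small, well-conditioned d)."""
    d = len(rows[0])
    # Augmented normal equations [A^T A | A^T y], one row per unknown.
    M = [
        [sum(r[i] * r[j] for r in rows) for j in range(d)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(d)
    ]
    for c in range(d):  # forward elimination with partial pivoting
        piv = max(range(c, d), key=lambda i: abs(M[i][c]))
        M[c], M[piv] = M[piv], M[c]
        for i in range(c + 1, d):
            f = M[i][c] / M[c][c]
            for k in range(c, d + 1):
                M[i][k] -= f * M[c][k]
    x = [0.0] * d
    for i in reversed(range(d)):  # back substitution
        x[i] = (M[i][d] - sum(M[i][k] * x[k] for k in range(i + 1, d))) / M[i][i]
    return x


def sketch_and_solve(A, b, m, rng):
    """Solve LS on m rows sampled uniformly with replacement. With uniform sampling,
    the rescaling factor is the same for every row and does not change the minimizer."""
    idx = rng.choices(range(len(A)), k=m)
    rows, ys = [A[i] for i in idx], [b[i] for i in idx]
    return rows, ys, solve_ls(rows, ys)


def bootstrap_ls_error(rows, ys, x_sk, n_boot=200, alpha=0.05, rng=None):
    """Resample the sketched rows and re-solve to estimate a quantile of ||x_sketch - x_exact||_2."""
    rng = rng or random.Random()
    m = len(rows)
    errs = []
    for _ in range(n_boot):
        j = rng.choices(range(m), k=m)
        errs.append(math.dist(solve_ls([rows[i] for i in j], [ys[i] for i in j]), x_sk))
    errs.sort()
    return errs[math.ceil((1 - alpha) * n_boot) - 1]


rng = random.Random(0)
n, d, m = 2000, 3, 200
x0 = [1.0, -2.0, 0.5]
A = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
b = [sum(a * w for a, w in zip(row, x0)) + rng.gauss(0.0, 0.5) for row in A]
x_ls = solve_ls(A, b)  # exact solution, computed only for comparison
rows, ys, x_sk = sketch_and_solve(A, b, m, rng)
err_q95 = bootstrap_ls_error(rows, ys, x_sk, rng=rng)
print(f"actual error={math.dist(x_sk, x_ls):.4f}  bootstrap 95% estimate={err_q95:.4f}")
```

The bootstrap loop works only with the m sketched rows, so its cost does not depend on n.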
Submitted 6 September, 2018; v1 submitted 21 March, 2018;
originally announced March 2018.
-
A Bootstrap Method for Error Estimation in Randomized Matrix Multiplication
Authors:
Miles E. Lopes,
Shusen Wang,
Michael W. Mahoney
Abstract:
In recent years, randomized methods for numerical linear algebra have received growing interest as a general approach to large-scale problems. Typically, the essential ingredient of these methods is some form of randomized dimension reduction, which accelerates computations, but also creates random approximation error. In this way, the dimension reduction step encodes a tradeoff between cost and accuracy. However, the exact numerical relationship between cost and accuracy is typically unknown, and consequently, it may be difficult for the user to precisely know (1) how accurate a given solution is, or (2) how much computation is needed to achieve a given level of accuracy. In the current paper, we study randomized matrix multiplication (sketching) as a prototype setting for addressing these general problems. As a solution, we develop a bootstrap method for directly estimating the accuracy as a function of the reduced dimension (as opposed to deriving worst-case bounds on the accuracy in terms of the reduced dimension). From a computational standpoint, the proposed method does not substantially increase the cost of standard sketching methods, and this is made possible by an "extrapolation" technique. In addition, we provide both theoretical and empirical results to demonstrate the effectiveness of the proposed method.
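Here is a small stdlib sketch of bootstrap error estimation for sketched matrix multiplication. It rests on several assumptions. A^T B is approximated by sampling t rows uniformly with replacement. The error metric is the entrywise maximum. The bootstrap resamples the t sampled rows. The "extrapolation" step is modeled as 1/sqrt(t) scaling of the error quantile from an initial sketch size t0. The abstract does not spell out these details, so treat the function names and the scaling rule as illustrative.

```python
import math
import random


def sketched_product(A, B, idx):
    """Unbiased estimate of A^T B = sum_i a_i b_i^T from the rows in idx
    (rows sampled uniformly with replacement, each scaled by n / t)."""
    n, t = len(A), len(idx)
    C = [[0.0] * len(B[0]) for _ in range(len(A[0]))]
    for i in idx:
        for r, a in enumerate(A[i]):
            for c, bval in enumerate(B[i]):
                C[r][c] += (n / t) * a * bval
    return C


def max_abs_diff(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


def bootstrap_error(A, B, idx, n_boot=200, alpha=0.05, rng=None):
    """Resample the sampled row indices to estimate the (1 - alpha) quantile of
    the max-norm error of the sketched product."""
    rng = rng or random.Random()
    C_hat = sketched_product(A, B, idx)
    errs = sorted(
        max_abs_diff(sketched_product(A, B, rng.choices(idx, k=len(idx))), C_hat)
        for _ in range(n_boot)
    )
    return errs[math.ceil((1 - alpha) * n_boot) - 1]


def extrapolate(q0, t0, t):
    """Assumed 1/sqrt(t) scaling: predict the error quantile at sketch size t from t0."""
    return q0 * math.sqrt(t0 / t)


rng = random.Random(0)
n = 1000
A = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n)]
B = [[row[0] + rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for row in A]
exact = [[sum(A[i][r] * B[i][c] for i in range(n)) for c in range(2)] for r in range(3)]
t0 = 100
idx = rng.choices(range(n), k=t0)
q0 = bootstrap_error(A, B, idx, rng=rng)
print(f"actual error={max_abs_diff(sketched_product(A, B, idx), exact):.1f}  "
      f"bootstrap 95% estimate={q0:.1f}  predicted at t=400: {extrapolate(q0, t0, 400):.1f}")
```

The extrapolation lets the user run the bootstrap once at a small t0 and then predict how large t must be for a target tolerance, without re-running the bootstrap at each candidate size.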
Submitted 3 April, 2019; v1 submitted 6 August, 2017;
originally announced August 2017.
-
Unknown sparsity in compressed sensing: Denoising and inference
Authors:
Miles E. Lopes
Abstract:
The theory of Compressed Sensing (CS) asserts that an unknown signal $x\in\mathbb{R}^p$ can be accurately recovered from an underdetermined set of $n$ linear measurements with $n\ll p$, provided that $x$ is sufficiently sparse. However, in applications, the degree of sparsity $\|x\|_0$ is typically unknown, and the problem of directly estimating $\|x\|_0$ has been a longstanding gap between theory and practice. A closely related issue is that $\|x\|_0$ is a highly idealized measure of sparsity: for real signals whose entries are small but not exactly equal to 0, the value $\|x\|_0=p$ is not a useful description of compressibility. In our previous conference paper [Lop13] that examined these problems, we considered an alternative measure of "soft" sparsity, $\|x\|_1^2/\|x\|_2^2$, and designed a procedure to estimate $\|x\|_1^2/\|x\|_2^2$ that does not rely on sparsity assumptions.
The present work offers a new deconvolution-based method for estimating unknown sparsity, which has wider applicability and sharper theoretical guarantees. In particular, we introduce a family of entropy-based sparsity measures $s_q(x):=\big(\frac{\|x\|_q}{\|x\|_1}\big)^{\frac{q}{1-q}}$ parameterized by $q\in[0,\infty]$. This family interpolates between $\|x\|_0=s_0(x)$ and $\|x\|_1^2/\|x\|_2^2=s_2(x)$ as $q$ ranges over $[0,2]$. For any $q\in (0,2]\setminus\{1\}$, we propose an estimator $\hat{s}_q(x)$ whose relative error converges at the dimension-free rate of $1/\sqrt{n}$, even when $p/n\to\infty$. Our main results also describe the limiting distribution of $\hat{s}_q(x)$, as well as some connections to Basis Pursuit Denoising, the Lasso, deterministic measurement matrices, and inference problems in CS.
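To see how the family behaves, set pi_i = |x_i| / ||x||_1. Then s_q(x) = (sum_i pi_i^q)^(1/(1-q)), which is the exponential of the order-q Rényi entropy of pi. The sketch below evaluates s_q for a known vector x, including the limits q = 0, 1, and infinity. It does not implement the estimator of s_q(x) from measurements, and the function name is hypothetical.

```python
import math


def s_q(x, q):
    """Sparsity measure s_q(x) = (||x||_q / ||x||_1)^(q / (1 - q)) for q in [0, inf].

    With pi_i = |x_i| / ||x||_1 this equals (sum_i pi_i^q)^(1 / (1 - q));
    q = 0, 1 and inf give ||x||_0, exp(Shannon entropy of pi), and ||x||_1 / ||x||_inf.
    """
    l1 = sum(abs(v) for v in x)
    if l1 == 0:
        raise ValueError("s_q is undefined for the zero vector")
    pi = [abs(v) / l1 for v in x if v != 0]
    if q == 1:
        return math.exp(-sum(p * math.log(p) for p in pi))
    if math.isinf(q):
        return 1.0 / max(pi)
    return sum(p ** q for p in pi) ** (1.0 / (1.0 - q))


x = [3.0, 4.0, 0.0, 0.0, 0.0]
for q in (0, 0.5, 1, 2, math.inf):
    print(q, round(s_q(x, q), 4))
```

For this x, s_0 = ||x||_0 = 2 and s_2 = 7^2 / 25 = 1.96, and the values do not increase as q grows. For a vector whose k nonzero entries all have equal magnitude, every s_q equals k.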
Submitted 31 August, 2017; v1 submitted 25 July, 2015;
originally announced July 2015.
-
iReclass - An automatic system for recording classes
Authors:
Edson Lopes,
José Caetano,
António Abreu,
Frederico Grilo
Abstract:
This paper presents the details of a system capable of video-recording a traditional class, meaning a teacher, a blackboard, and a white canvas on which course notes are projected. The system tracks the movements of the lecturer while recording the class on video at the required frame rate (e.g., 25 fps). It also recognizes five arm gestures made by the lecturer. Three of these gestures select which scene is recorded (the lecturer, the blackboard, or the white canvas), and the remaining two start and stop the recorder. The system is composed of a Kinect sensor, a video camera, a microphone, one pan-tilt system, and one pan system, using a total of three stepper motors.
Submitted 31 December, 2014;
originally announced January 2015.
-
Estimating a sharp convergence bound for randomized ensembles
Authors:
Miles E. Lopes
Abstract:
When randomized ensembles such as bagging or random forests are used for binary classification, the prediction error of the ensemble tends to decrease and stabilize as the number of classifiers increases. However, the precise relationship between prediction error and ensemble size is unknown in practice. In the standard case when classifiers are aggregated by majority vote, the present work offers a way to quantify this convergence in terms of "algorithmic variance," i.e. the variance of prediction error due only to the randomized training algorithm. Specifically, we study a theoretical upper bound on this variance, and show that it is sharp --- in the sense that it is attained by a specific family of randomized classifiers. Next, we address the problem of estimating the unknown value of the bound, which leads to a unique twist on the classical problem of non-parametric density estimation. In particular, we develop an estimator for the bound and show that its MSE matches optimal non-parametric rates under certain conditions. (Concurrent with this work, some closely related results have also been considered in Cannings and Samworth (2017) and Lopes (2019).)
Submitted 30 April, 2019; v1 submitted 4 March, 2013;
originally announced March 2013.
-
Estimating Unknown Sparsity in Compressed Sensing
Authors:
Miles E. Lopes
Abstract:
In the theory of compressed sensing (CS), the sparsity $\|x\|_0$ of the unknown signal $x\in\mathbb{R}^p$ is commonly assumed to be a known parameter. However, it is typically unknown in practice. Because many aspects of CS depend on knowing $\|x\|_0$, it is important to estimate this parameter in a data-driven way. A second practical concern is that $\|x\|_0$ is a highly unstable function of $x$. In particular, for real signals with entries not exactly equal to 0, the value $\|x\|_0=p$ is not a useful description of the effective number of coordinates. In this paper, we propose to estimate a stable measure of sparsity $s(x):=\|x\|_1^2/\|x\|_2^2$, which is a sharp lower bound on $\|x\|_0$. Our estimation procedure uses only a small number of linear measurements, does not rely on any sparsity assumptions, and requires very little computation. A confidence interval for $s(x)$ is provided, and its width is shown to have no dependence on the signal dimension $p$. Moreover, this result extends naturally to the matrix recovery setting, where a soft version of matrix rank can be estimated with analogous guarantees. Finally, we show that the use of randomized measurements is essential to estimating $s(x)$. This is accomplished by proving that the minimax risk for estimating $s(x)$ with deterministic measurements is large when $n\ll p$.
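The following minimal sketch illustrates why randomized measurements make s(x) estimable. It assumes noiseless measurements and relies on two standard facts about stable distributions. With i.i.d. standard Cauchy rows, <a, x> is distributed as ||x||_1 times a standard Cauchy variable, and the median of |Cauchy| is 1. With i.i.d. standard Gaussian rows, <a, x> is distributed as ||x||_2 times N(0, 1). This is one simple estimator, not necessarily the paper's. It omits the noise model and the confidence interval, and `measure` and `estimate_soft_sparsity` are hypothetical names.

```python
import math
import random
import statistics


def measure(x, n, kind, rng):
    """n noiseless measurements y_i = <a_i, x> with i.i.d. standard Cauchy or Gaussian a_i."""
    def entry():
        if kind == "cauchy":
            return math.tan(math.pi * (rng.random() - 0.5))  # inverse-CDF Cauchy draw
        return rng.gauss(0.0, 1.0)
    return [sum(entry() * v for v in x) for _ in range(n)]


def estimate_soft_sparsity(x, n, rng):
    """Estimate s(x) = ||x||_1^2 / ||x||_2^2 from 2n random linear measurements.

    Cauchy rows give y ~ ||x||_1 * Cauchy, and median(|Cauchy|) = 1;
    Gaussian rows give y ~ ||x||_2 * N(0, 1), and E[y^2] = ||x||_2^2.
    """
    l1_hat = statistics.median(abs(y) for y in measure(x, n, "cauchy", rng))
    l2sq_hat = statistics.fmean(y * y for y in measure(x, n, "gauss", rng))
    return l1_hat ** 2 / l2sq_hat


rng = random.Random(0)
p = 300
# 30 large entries plus 270 tiny but nonzero ones: ||x||_0 = p, yet s(x) stays close to 30.
x = [rng.uniform(1.0, 2.0) if i < 30 else 1e-3 * rng.gauss(0.0, 1.0) for i in range(p)]
s_true = sum(abs(v) for v in x) ** 2 / sum(v * v for v in x)
s_hat = estimate_soft_sparsity(x, n=2000, rng=rng)
print(f"||x||_0={sum(v != 0 for v in x)}  s(x)={s_true:.2f}  estimate={s_hat:.2f}")
```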
Submitted 25 February, 2013; v1 submitted 18 April, 2012;
originally announced April 2012.
-
On simulating nondeterministic stochastic activity networks
Authors:
Valmir C. Barbosa,
Fernando M. L. Ferreira,
Daniel V. Kling,
Eduardo Lopes,
Fabio Protti,
Eber A. Schmitz
Abstract:
In this work we deal with a mechanism for process simulation called a NonDeterministic Stochastic Activity Network (NDSAN). An NDSAN consists basically of a set of activities along with precedence relations involving these activities, which determine their order of execution. Activity durations are stochastic, given by continuous, nonnegative random variables. The nondeterministic behavior of an NDSAN is based on two additional possibilities: (i) by associating choice probabilities with groups of activities, some branches of execution may not be taken; (ii) by allowing iterated executions of groups of activities according to predetermined probabilities, the number of times an activity must be executed is not determined a priori. These properties lead to a rich variety of activity networks, capable of modeling many real situations in process engineering, project design, and troubleshooting. We describe a recursive simulation algorithm for NDSANs, whose repeated execution produces a close approximation to the probability distribution of the completion time of the entire network. We also report on real-world case studies.
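A plain Monte Carlo version of "repeated execution" is sketched below for an ordinary activity network with one nondeterministic feature: iteration via a repeat probability. It is an iterative simulation in topological order, not the recursive algorithm described in the paper. It also does not model choice probabilities, and the data layout and function names are assumptions. Repeating the run many times yields an empirical distribution of the network's completion time.

```python
import random
import statistics
from graphlib import TopologicalSorter


def simulate_once(activities, order, rng):
    """One run: each activity starts when all its predecessors have finished.

    activities maps name -> (predecessors, duration_sampler, repeat_prob);
    after each execution an activity is repeated with probability repeat_prob,
    so its number of executions is not fixed in advance.
    """
    finish = {}
    for name in order:
        preds, sample, repeat_prob = activities[name]
        start = max((finish[p] for p in preds), default=0.0)
        duration = sample(rng)
        while rng.random() < repeat_prob:
            duration += sample(rng)
        finish[name] = start + duration
    return max(finish.values())


def completion_times(activities, n_runs, seed=0):
    """Monte Carlo sample of the network's completion time."""
    order = list(TopologicalSorter({k: v[0] for k, v in activities.items()}).static_order())
    rng = random.Random(seed)
    return [simulate_once(activities, order, rng) for _ in range(n_runs)]


def exp1(rng):
    """Exp(1) activity duration."""
    return rng.expovariate(1.0)


activities = {
    "A": ((), exp1, 0.0),
    "B": ((), exp1, 0.0),
    "C": (("A", "B"), exp1, 0.5),  # rework loop: repeated with probability 0.5
}
times = completion_times(activities, n_runs=20000)
print(f"mean={statistics.fmean(times):.3f}  90th percentile={statistics.quantiles(times, n=10)[-1]:.3f}")
```

For this toy network, the exact mean completion time is 1.5 + 2 = 3.5. The maximum of two independent Exp(1) durations has mean 1.5, and C runs a geometric number of times with mean 2. The printed mean should therefore be close to 3.5.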
Submitted 28 March, 2007; v1 submitted 28 December, 2006;
originally announced December 2006.