-
Quasicyclic Principal Component Analysis
Authors:
Susanna E. Rumsey,
Stark C. Draper,
Frank R. Kschischang
Abstract:
We present quasicyclic principal component analysis (QPCA), a generalization of principal component analysis (PCA) that determines an optimized basis for a dataset in terms of families of shift-orthogonal principal vectors. This is of particular interest when analyzing cyclostationary data, whose cyclic structure is not exploited by the standard PCA algorithm. We first formulate QPCA as an optimization problem, which we show may be decomposed into a series of PCA problems in the frequency domain. We then formalize our solution as an explicit algorithm and analyze its computational complexity. Finally, we provide some examples of applications of QPCA to cyclostationary signal processing data, including an investigation of carrier pulse recovery, a presentation of methods for estimating an unknown oversampling rate, and a discussion of an appropriate approach for pre-processing data with a non-integer oversampling rate in order to better apply the QPCA algorithm.
Submitted 7 February, 2025;
originally announced February 2025.
-
Graph Community Detection from Coarse Measurements: Recovery Conditions for the Coarsened Weighted Stochastic Block Model
Authors:
Nafiseh Ghoroghchian,
Gautam Dasarathy,
Stark C. Draper
Abstract:
We study the problem of community recovery from coarse measurements of a graph. In contrast to the problem of community recovery of a fully observed graph, one often encounters situations when measurements of a graph are made at low resolution, each measurement integrating across multiple graph nodes. Such low-resolution measurements effectively induce a coarse graph with its own communities. Our objective is to develop conditions on the graph structure and on the quantity and properties of measurements under which we can recover the community organization in this coarse graph. In this paper, we build on the stochastic block model by mathematically formalizing the coarsening process and characterizing its impact on the community members and connections. Through this novel setup and modeling, we characterize an error bound for community recovery. The error bound yields simple, closed-form asymptotic conditions to achieve the perfect recovery of the coarse graph communities.
Submitted 25 February, 2021;
originally announced February 2021.
-
Anytime Minibatch with Delayed Gradients
Authors:
Haider Al-Lawati,
Stark C. Draper
Abstract:
Distributed optimization is widely deployed in practice to solve a broad range of problems. In a typical asynchronous scheme, workers calculate gradients with respect to out-of-date optimization parameters while the master uses stale (i.e., delayed) gradients to update the parameters. While using stale gradients can slow the convergence, asynchronous methods speed up the overall optimization with respect to wall clock time by allowing more frequent updates and reducing idling times. In this paper, we present a variable per-epoch minibatch scheme called Anytime Minibatch with Delayed Gradients (AMB-DG). In AMB-DG, workers compute gradients in epochs of a fixed time while the master uses stale gradients to update the optimization parameters. We analyze AMB-DG in terms of its regret bound and convergence rate. We prove that for convex smooth objective functions, AMB-DG achieves the optimal regret bound and convergence rate. We compare the performance of AMB-DG with that of Anytime Minibatch (AMB), which is similar to AMB-DG but does not use stale gradients. In AMB, workers stay idle after each gradient transmission to the master until they receive the updated parameters from the master, while in AMB-DG, workers never idle. We also extend AMB-DG to the fully distributed setting. We compare AMB-DG with AMB when the communication delay is long and observe that AMB-DG converges faster than AMB in wall clock time. We also compare the performance of AMB-DG with the state-of-the-art fixed minibatch approach that uses delayed gradients. We run our experiments on a real distributed system and observe that AMB-DG converges more than two times faster.
Submitted 15 December, 2020;
originally announced December 2020.
-
Anytime MiniBatch: Exploiting Stragglers in Online Distributed Optimization
Authors:
Nuwan Ferdinand,
Haider Al-Lawati,
Stark C. Draper,
Matthew Nokleby
Abstract:
Distributed optimization is vital in solving large-scale machine learning problems. A widely shared feature of distributed optimization techniques is the requirement that all nodes complete their assigned tasks in each computational epoch before the system can proceed to the next epoch. In such settings, slow nodes, called stragglers, can greatly slow progress. To mitigate the impact of stragglers, we propose an online distributed optimization method called Anytime Minibatch. In this approach, all nodes are given a fixed time to compute the gradients of as many data samples as possible. The result is a variable per-node minibatch size. Workers then get a fixed communication time to average their minibatch gradients via several rounds of consensus, which are then used to update primal variables via dual averaging. Anytime Minibatch prevents stragglers from holding up the system without wasting the work that stragglers can complete. We present a convergence analysis and analyze the wall time performance. Our numerical results show that our approach is up to 1.5 times faster in Amazon EC2 and up to five times faster when there is greater variability in compute node performance.
Submitted 10 June, 2020;
originally announced June 2020.
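The fixed-compute-time idea in the Anytime Minibatch abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `anytime_minibatch_round`, the scalar gradients, and the per-sample compute times are all hypothetical, and the consensus rounds and dual-averaging update from the abstract are replaced here by one exact count-weighted average.

```python
def anytime_minibatch_round(per_sample_times, grads, time_budget):
    """Simulate one compute epoch of a fixed duration (illustrative only).

    per_sample_times[i]: assumed seconds worker i needs per sample gradient.
    grads[i]: scalar sample gradients available to worker i.
    Returns (count-weighted average gradient, per-worker minibatch sizes).
    """
    total = 0.0
    sizes = []
    for t, g in zip(per_sample_times, grads):
        # Each worker processes as many samples as fit in the time budget,
        # so slow workers (stragglers) contribute smaller minibatches.
        b = min(len(g), int(time_budget // t))
        sizes.append(b)
        total += sum(g[:b])
    n = sum(sizes)
    # Exact averaging stands in for the consensus rounds (an assumption).
    avg = total / n if n else 0.0
    return avg, sizes


# Three workers of different speeds share a 1-second compute epoch.
avg, sizes = anytime_minibatch_round(
    per_sample_times=[0.25, 0.5, 1.0],
    grads=[[1.0] * 10, [2.0] * 10, [4.0] * 10],
    time_budget=1.0,
)
```

In this toy run the slowest worker still contributes one sample rather than stalling the epoch, which is the behaviour the abstract describes.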
-
Hardware-Based Linear Program Decoding with the Alternating Direction Method of Multipliers
Authors:
Mitchell Wasson,
Mario Milicevic,
Stark C. Draper,
Glenn Gulak
Abstract:
We present a hardware-based implementation of Linear Program (LP) decoding for binary linear codes. LP decoding frames error-correction as an optimization problem. In contrast, variants of Belief Propagation (BP) decoding frame error-correction as a problem of graphical inference. LP decoding has several advantages over BP-based methods, including convergence guarantees and better error-rate performance in high-reliability channels. The latter makes LP decoding attractive for optical transport and storage applications. However, LP decoding, when implemented with general solvers, does not scale to large blocklengths and is not suitable for a parallelized implementation in hardware. It has recently been shown that the Alternating Direction Method of Multipliers (ADMM) can be applied to decompose the LP decoding problem. The result is a message-passing algorithm with a structure very similar to BP. We present new intuition for this decoding algorithm as well as for its major computational primitive: projection onto the parity polytope. Furthermore, we present results for a fixed-point Verilog implementation of ADMM-LP decoding. This implementation targets a Field-Programmable Gate Array (FPGA) platform to evaluate error-rate performance and estimate resource usage. We show that Frame Error Rate (FER) performance well within 0.5 dB of double-precision implementations is possible with 10-bit messages. Finally, we outline a number of research opportunities that should be explored en route to the realization of an Application Specific Integrated Circuit (ASIC) implementation capable of gigabit-per-second throughput.
Submitted 18 November, 2016;
originally announced November 2016.
-
Queuing Theoretic Analysis of Power-performance Tradeoff in Power-efficient Computing
Authors:
Yanpei Liu,
Stark C. Draper,
Nam Sung Kim
Abstract:
In this paper we study the power-performance relationship of power-efficient computing from a queuing theoretic perspective. We investigate the interplay of several system operations including processing speed, system on/off decisions, and server farm size. We identify that there are oftentimes "sweet spots" in power-efficient operations: there exist optimal combinations of processing speed and system settings that maximize power efficiency. For the single server case, a widely deployed threshold mechanism is studied. We show that there exist optimal processing speed and threshold value pairs that minimize the power consumption. This holds for the threshold mechanism with job batching. For the multi-server case, it is shown that there exist best processing speed and server farm size combinations.
Submitted 6 March, 2013;
originally announced March 2013.
-
Decomposition Methods for Large Scale LP Decoding
Authors:
Siddharth Barman,
Xishuo Liu,
Stark C. Draper,
Benjamin Recht
Abstract:
When binary linear error-correcting codes are used over symmetric channels, a relaxed version of the maximum likelihood decoding problem can be stated as a linear program (LP). This LP decoder can be used to decode error-correcting codes at bit-error-rates comparable to state-of-the-art belief propagation (BP) decoders, but with significantly stronger theoretical guarantees. However, LP decoding, when implemented with standard LP solvers, does not easily scale to the block lengths of modern error-correcting codes. In this paper we draw on decomposition methods from optimization theory, specifically the Alternating Directions Method of Multipliers (ADMM), to develop efficient distributed algorithms for LP decoding.
The key enabling technical result is a "two-slice" characterization of the geometry of the parity polytope, which is the convex hull of all codewords of a single parity-check code. This new characterization simplifies the representation of points in the polytope. Using this simplification, we develop an efficient algorithm for Euclidean norm projection onto the parity polytope. This projection is required by ADMM and allows us to use LP decoding, with all its theoretical guarantees, to decode large-scale error-correcting codes efficiently.
We present numerical results for LDPC codes of lengths more than 1000. The waterfall region of LP decoding is seen to initiate at a slightly higher signal-to-noise ratio than for sum-product BP; however, an error floor is not observed for LP decoding, which is not the case for BP. Our implementation of LP decoding using ADMM executes as fast as our baseline sum-product BP decoder, is fully parallelizable, and can be seen to implement a type of message-passing with a particularly simple schedule.
Submitted 23 September, 2013; v1 submitted 2 April, 2012;
originally announced April 2012.
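To make the parity polytope mentioned in the abstract above concrete, the following sketch tests whether a point lies in it. It does not implement the paper's "two-slice" characterization or its projection algorithm. Instead it uses the standard odd-subset inequality description of the convex hull of even-weight binary vectors, checked by brute force, so it is only practical for small dimensions. The function name `in_parity_polytope` and the tolerance parameter are illustrative assumptions.

```python
from itertools import combinations


def in_parity_polytope(x, tol=1e-9):
    """Brute-force membership test for the parity polytope (small d only).

    A point x in [0, 1]^d lies in the convex hull of even-weight binary
    vectors iff, for every odd-sized index set S,
        sum_{i in S} x_i - sum_{i not in S} x_i <= |S| - 1.
    """
    d = len(x)
    # Box constraints: every coordinate must lie in [0, 1].
    if any(v < -tol or v > 1 + tol for v in x):
        return False
    total = sum(x)
    for k in range(1, d + 1, 2):  # odd subset sizes only
        for S in combinations(range(d), k):
            in_S = sum(x[i] for i in S)
            if in_S - (total - in_S) > k - 1 + tol:
                return False
    return True


# (1, 1, 0) has even weight, so it is a vertex of the polytope;
# (1, 0, 0) has odd weight and is cut off by the inequality for S = {0}.
even_vertex_ok = in_parity_polytope([1.0, 1.0, 0.0])
odd_vertex_ok = in_parity_polytope([1.0, 0.0, 0.0])
```

The check enumerates exponentially many subsets, which is the kind of cost that motivates more compact characterizations such as the one the abstract describes.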