-
Wrapping nonspherical vesicles at bio-membranes
Authors:
Ajit Kumar Sahu,
Rajkumar Malik,
Jiarul Midya
Abstract:
The wrapping of particles and vesicles by lipid bilayer membranes is a fundamental process in cellular transport and targeted drug delivery. Here, we investigate the wrapping behavior of nonspherical vesicles, such as ellipsoidal, prolate, oblate, and stomatocytes, by systematically varying the bending rigidity of the vesicle membrane and the tension of the planar membrane. Using the Helfrich Hamiltonian, triangulated membrane models, and energy minimization techniques, we predict multiple stable wrapping states and identify the conditions for their coexistence. Our results demonstrate that softer vesicles bind more easily to planar membranes; however, achieving complete wrapping requires significantly higher adhesion strengths compared to rigid particles. As membrane tension increases, deep-wrapped states disappear at a triple point where shallow-wrapped, deep-wrapped, and complete-wrapped states coexist. The coordinates of the triple point are highly sensitive to the vesicle shape and stiffness. For stomatocytes, increasing stiffness shifts the triple point to higher adhesion strengths and membrane tensions, while for oblates it shifts to lower values, influenced by shape changes during wrapping. Oblate shapes are preferred in shallow-wrapped states and stomatocytes in deep-wrapped states. In contrast to hard particles, where optimal adhesion strength for complete wrapping occurs at tensionless membranes, complete wrapping of soft vesicles requires finite membrane tension for optimal adhesion strength. These findings provide new insights into the interplay between vesicle deformability, shape, and membrane properties, advancing our understanding of endocytosis and the design of advanced biomimetic delivery systems.
Submitted 15 February, 2025;
originally announced February 2025.
-
Hierarchical Preference Optimization: Learning to achieve goals via feasible subgoals prediction
Authors:
Utsav Singh,
Souradip Chakraborty,
Wesley A. Suttle,
Brian M. Sadler,
Anit Kumar Sahu,
Mubarak Shah,
Vinay P. Namboodiri,
Amrit Singh Bedi
Abstract:
This work introduces Hierarchical Preference Optimization (HPO), a novel approach to hierarchical reinforcement learning (HRL) that addresses non-stationarity and infeasible subgoal generation issues when solving complex robotic control tasks. HPO leverages maximum entropy reinforcement learning combined with token-level Direct Preference Optimization (DPO), eliminating the need for pre-trained reference policies that are typically unavailable in challenging robotic scenarios. Mathematically, we formulate HRL as a bi-level optimization problem and transform it into a primitive-regularized DPO formulation, ensuring feasible subgoal generation and avoiding degenerate solutions. Extensive experiments on challenging robotic navigation and manipulation tasks demonstrate impressive performance of HPO, where it shows an improvement of up to 35% over the baselines. Furthermore, ablation studies validate our design choices, and quantitative analyses confirm the ability of HPO to mitigate non-stationarity and infeasible subgoal generation issues in HRL.
Submitted 1 November, 2024;
originally announced November 2024.
-
Get more for less: Principled Data Selection for Warming Up Fine-Tuning in LLMs
Authors:
Feiyang Kang,
Hoang Anh Just,
Yifan Sun,
Himanshu Jahagirdar,
Yuanzhi Zhang,
Rongxing Du,
Anit Kumar Sahu,
Ruoxi Jia
Abstract:
This work focuses on leveraging and selecting from vast, unlabeled, open data to pre-fine-tune a pre-trained language model. The goal is to minimize the need for costly domain-specific data for subsequent fine-tuning while achieving desired performance levels. While many data selection algorithms have been designed for small-scale applications, rendering them unsuitable for our context, some emerging methods do cater to language data scales. However, they often prioritize data that aligns with the target distribution. While this strategy may be effective when training a model from scratch, it can yield limited results when the model has already been pre-trained on a different distribution. Differing from prior work, our key idea is to select data that nudges the pre-training distribution closer to the target distribution. We show the optimality of this approach for fine-tuning tasks under certain conditions. We demonstrate the efficacy of our methodology across a diverse array of tasks (NLU, NLG, zero-shot) with models up to 2.7B, showing that it consistently surpasses other selection methods. Moreover, our proposed method is significantly faster than existing techniques, scaling to millions of samples within a single GPU hour. Our code is open-sourced (Code repository: https://anonymous.4open.science/r/DV4LLM-D761/ ). While fine-tuning offers significant potential for enhancing performance across diverse tasks, its associated costs often limit its widespread adoption; with this work, we hope to lay the groundwork for cost-effective fine-tuning, making its benefits more accessible.
Submitted 4 May, 2024;
originally announced May 2024.
-
CFD analysis of the influence of solvent viscosity ratio on the creeping flow of viscoelastic fluid over a channel-confined circular cylinder
Authors:
Pratyush Kumar Mohanty,
Akhilesh Kumar Sahu,
Ram Prakash Bharti
Abstract:
In this study, the role of solvent viscosity ratio ($β$) on the creeping flow characteristics of Oldroyd-B fluid over a channel-confined circular cylinder has been explored numerically. The hydrodynamic model equations have been solved by RheoTool, an open-source toolbox based on OpenFOAM, employing the finite volume method for extensive ranges of Deborah number ($De = 0.025-1.5$) and solvent viscosity ratio ($β= 0.1-0.9$) for the fixed wall blockage ($B = 0.5$). The present investigation has undergone extensive validation, with available literature under specific limited conditions, before obtaining detailed results for the relevant flow phenomena such as streamline, pressure and stress contour profiles, pressure coefficient ($C_p$), wall shear stress ($τ_w$), normal stress ($τ_{xx}$), first normal stress difference ($N_{1}$), and drag coefficient ($C_D$). The flow profiles have exhibited a distinctive behavior characterized by a loss of symmetry in the presence of pronounced viscoelastic and polymeric effects. The results for low $De$ align closely with those for Newtonian fluids, and the drag coefficient ($C_D$) remains relatively constant regardless of $β$, as the viscoelastic influence is somewhat subdued. As $De$ increases, the influence of viscoelasticity becomes more pronounced, while a decrease in $β$ leads to an escalation in polymeric effects; an increase in the $C_D$ value is observed as $β$ increases. Within this parameter range, the prevailing force governing the flow is the pressure drag force.
Submitted 9 March, 2024;
originally announced March 2024.
-
Towards Realistic Mechanisms That Incentivize Federated Participation and Contribution
Authors:
Marco Bornstein,
Amrit Singh Bedi,
Anit Kumar Sahu,
Furqan Khan,
Furong Huang
Abstract:
Edge device participation in federated learning (FL) is typically studied through the lens of device-server communication (e.g., device dropout) and assumes an undying desire from edge devices to participate in FL. As a result, current FL frameworks are flawed when implemented in realistic settings, with many encountering the free-rider dilemma. In a step to push FL towards realistic settings, we propose RealFM: the first federated mechanism that (1) realistically models device utility, (2) incentivizes data contribution and device participation, (3) provably removes the free-rider dilemma, and (4) relaxes assumptions on data homogeneity and data sharing. Compared to previous FL mechanisms, RealFM allows for a non-linear relationship between model accuracy and utility, which improves the utility gained by the server and participating devices. On real-world data, RealFM improves device and server utility, as well as data contribution, by over 3 and 4 orders of magnitude respectively compared to baselines.
Submitted 22 May, 2024; v1 submitted 20 October, 2023;
originally announced October 2023.
-
Federated Representation Learning for Automatic Speech Recognition
Authors:
Guruprasad V Ramesh,
Gopinath Chennupati,
Milind Rao,
Anit Kumar Sahu,
Ariya Rastrow,
Jasha Droppo
Abstract:
Federated Learning (FL) is a privacy-preserving paradigm, allowing edge devices to learn collaboratively without sharing data. Edge devices like Alexa and Siri are prospective sources of unlabeled audio data that can be tapped to learn robust audio representations. In this work, we bring Self-supervised Learning (SSL) and FL together to learn representations for Automatic Speech Recognition respecting data privacy constraints. We use the speaker and chapter information in the unlabeled speech dataset, Libri-Light, to simulate non-IID speaker-siloed data distributions and pre-train an LSTM encoder with the Contrastive Predictive Coding framework with FedSGD. We show that the pre-trained ASR encoder in FL performs as well as a centrally pre-trained model and produces an improvement of 12-15% (WER) compared to no pre-training. We further adapt the federated pre-trained models to a new language, French, and show a 20% (WER) improvement over no pre-training.
Submitted 7 August, 2023; v1 submitted 3 August, 2023;
originally announced August 2023.
-
Performance Scaling via Optimal Transport: Enabling Data Selection from Partially Revealed Sources
Authors:
Feiyang Kang,
Hoang Anh Just,
Anit Kumar Sahu,
Ruoxi Jia
Abstract:
Traditionally, data selection has been studied in settings where all samples from prospective sources are fully revealed to a machine learning developer. However, in practical data exchange scenarios, data providers often reveal only a limited subset of samples before an acquisition decision is made. Recently, there have been efforts to fit scaling laws that predict model performance at any size and data source composition using the limited available samples. However, these scaling functions are black-box, computationally expensive to fit, highly susceptible to overfitting, and/or difficult to optimize for data selection. This paper proposes a framework called <projektor>, which predicts model performance and supports data selection decisions based on partial samples of prospective data sources. Our approach distinguishes itself from existing work by introducing a novel *two-stage* performance inference process. In the first stage, we leverage the Optimal Transport distance to predict the model's performance for any data mixture ratio within the range of disclosed data sizes. In the second stage, we extrapolate the performance to larger undisclosed data sizes based on a novel parameter-free mapping technique inspired by neural scaling laws. We further derive an efficient gradient-based method to select data sources based on the projected model performance. Evaluation over a diverse range of applications demonstrates that <projektor> significantly improves on existing performance scaling approaches in terms of both the accuracy of performance inference and the computation costs associated with constructing the performance predictor. Also, <projektor> outperforms a range of other off-the-shelf solutions by a wide margin in data selection effectiveness.
Submitted 5 July, 2023;
originally announced July 2023.
-
Federated Self-Learning with Weak Supervision for Speech Recognition
Authors:
Milind Rao,
Gopinath Chennupati,
Gautam Tiwari,
Anit Kumar Sahu,
Anirudh Raju,
Ariya Rastrow,
Jasha Droppo
Abstract:
Automatic speech recognition (ASR) models with low-footprint are increasingly being deployed on edge devices for conversational agents, which enhances privacy. We study the problem of federated continual incremental learning for recurrent neural network-transducer (RNN-T) ASR models in the privacy-enhancing scheme of learning on-device, without access to ground truth human transcripts or machine transcriptions from a stronger ASR model. In particular, we study the performance of a self-learning based scheme, with a paired teacher model updated through an exponential moving average of ASR models. Further, we propose using possibly noisy weak-supervision signals such as feedback scores and natural language understanding semantics determined from user behavior across multiple turns in a session of interactions with the conversational agent. These signals are leveraged in a multi-task policy-gradient training approach to improve the performance of self-learning for ASR. Finally, we show how catastrophic forgetting can be mitigated by combining on-device learning with a memory-replay approach using selected historical datasets. These innovations allow for 10% relative improvement in WER on new use cases with minimal degradation on other test sets in the absence of strong-supervision signals such as ground-truth transcriptions.
Submitted 21 June, 2023;
originally announced June 2023.
-
Learning When to Trust Which Teacher for Weakly Supervised ASR
Authors:
Aakriti Agrawal,
Milind Rao,
Anit Kumar Sahu,
Gopinath Chennupati,
Andreas Stolcke
Abstract:
Automatic speech recognition (ASR) training can utilize multiple experts as teacher models, each trained on a specific domain or accent. Teacher models may be opaque in nature since their architecture may not be known or their training cadence is different from that of the student ASR model. Still, the student models are updated incrementally using the pseudo-labels generated independently by the expert teachers. In this paper, we exploit supervision from multiple domain experts in training student ASR models. This training strategy is especially useful in scenarios where few or no human transcriptions are available. To that end, we propose a Smart-Weighter mechanism that selects an appropriate expert based on the input audio, and then trains the student model in an unsupervised setting. We show the efficacy of our approach using LibriSpeech and LibriLight benchmarks and find an improvement of 4 to 25% over baselines that uniformly weight all the experts, use a single expert model, or combine experts using ROVER.
Submitted 21 June, 2023;
originally announced June 2023.
-
Influence of Gold-Selenium Precursor Ratio on Synthesis and Structural Stability of α- and β-AuSe
Authors:
Aditya Kumar Sahu,
Satyabrata Raj
Abstract:
Gold selenide (AuSe) is a multilayer compound yet to be thoroughly studied. The colloidal synthesis and characterization of gold selenide nanoparticles are described, emphasizing the effect of different gold-to-selenium precursor ratios and temperatures on the crystal structure and form. The structural characterization is done using an X-ray diffraction pattern. The coexistence of the α- and β-AuSe phases is observed in all synthesized samples. The morphologies of the mainly α-AuSe sample are nanobelts, whereas the primarily β-AuSe phase sample has a nanoplate-like structure, according to the TEM and SEM data. All of the samples had Raman vibrational modes with mixed phases. The effect of high pressure on as-prepared AuSe samples has been studied in this work. The introduction of external pressure and temperature allows both phases to transition. Pressure lowers the existence of other phase modes, and the corresponding dominating sample modes are entirely significant in our sample. The phase transition pressure was observed using Raman scattering. Our findings show that 2D AuSe has a lot of promise for multifunctional applications, encouraging more research on these systems.
Submitted 4 April, 2023;
originally announced April 2023.
-
Effect of confinement on flow around a rotating elliptic cylinder in laminar flow regime
Authors:
Prateek Gupta,
Sibasish Panda,
Akhilesh Kumar Sahu,
Deepak Kumar
Abstract:
The flow phenomena around a rotating elliptic cylinder in a channel are studied numerically. The confinement parameter $β$ is varied as $1/k$, where $k = 2, 4, 6,$ and $8$, to demonstrate the vortex-shedding patterns around the cylinder in the downstream wake. The non-dimensional rotation rate $α$ takes the values 0.5, 1, and 2. Additionally, the Reynolds number ($Re$) based on the cylinder diameter is taken to be 50, 100, and 150. A parametric study is performed to explain the changes in drag coefficient ($C_D$), lift coefficient ($C_L$), and moment coefficient ($C_M$) with variations of $β$, $α$, and $Re$. The fast Fourier transform (FFT) of the time-periodic lift signals is presented to understand the shedding frequency characteristics, and the $C_M$ values are analyzed for cases of autorotation. Despite the introduction of significant confinement and cylinder rotation, complete suppression of vortex shedding is not observed for the considered parameter space. Autorotation is observed and becomes prominent with a decrease in the non-dimensional rotation rate and an increase in confinement and Reynolds number.
Submitted 12 December, 2022; v1 submitted 18 August, 2022;
originally announced August 2022.
-
ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale
Authors:
Gopinath Chennupati,
Milind Rao,
Gurpreet Chadha,
Aaron Eakin,
Anirudh Raju,
Gautam Tiwari,
Anit Kumar Sahu,
Ariya Rastrow,
Jasha Droppo,
Andy Oberlin,
Buddha Nandanoor,
Prahalad Venkataramanan,
Zheng Wu,
Pankaj Sitpure
Abstract:
Incremental learning is one paradigm to enable model building and updating at scale with streaming data. For end-to-end automatic speech recognition (ASR) tasks, the absence of human annotated labels along with the need for privacy preserving policies for model building makes it a daunting challenge. Motivated by these challenges, in this paper we use a cloud based framework for production systems to demonstrate insights from privacy preserving incremental learning for automatic speech recognition (ILASR). By privacy preserving, we mean the use of ephemeral data that are not human annotated. This system is a step forward for production-level ASR models for incremental/continual learning that offers a near real-time test-bed for experimentation in the cloud for end-to-end ASR, while adhering to privacy-preserving policies. We show that the proposed system can improve the production models significantly (3%) over a new time period of six months even in the absence of human annotated labels with varying levels of weak supervision and large batch sizes in incremental learning. This improvement is 20% over test sets with new words and phrases in the new time period. We demonstrate the effectiveness of model building in a privacy-preserving incremental fashion for ASR while further exploring the utility of having an effective teacher model and the use of large batch sizes.
Submitted 22 July, 2022; v1 submitted 19 July, 2022;
originally announced July 2022.
-
FedBC: Calibrating Global and Local Models via Federated Learning Beyond Consensus
Authors:
Amrit Singh Bedi,
Chen Fan,
Alec Koppel,
Anit Kumar Sahu,
Brian M. Sadler,
Furong Huang,
Dinesh Manocha
Abstract:
In this work, we quantitatively calibrate the performance of global and local models in federated learning through a multi-criterion optimization-based framework, which we cast as a constrained program. The objective of a device is its local objective, which it seeks to minimize while satisfying nonlinear constraints that quantify the proximity between the local and the global model. By considering the Lagrangian relaxation of this problem, we develop a novel primal-dual method called Federated Learning Beyond Consensus (FedBC). Theoretically, we establish that FedBC converges to a first-order stationary point at rates that match the state of the art, up to an additional error term that depends on a tolerance parameter introduced to scalarize the multi-criterion formulation. Finally, we demonstrate that FedBC balances the global and local model test accuracy metrics across a suite of datasets (Synthetic, MNIST, CIFAR-10, Shakespeare), achieving competitive performance with the state of the art.
Submitted 1 February, 2023; v1 submitted 21 June, 2022;
originally announced June 2022.
-
Self-Aware Personalized Federated Learning
Authors:
Huili Chen,
Jie Ding,
Eric Tramel,
Shuang Wu,
Anit Kumar Sahu,
Salman Avestimehr,
Tao Zhang
Abstract:
In the context of personalized federated learning (FL), the critical challenge is to balance local model improvement and global model tuning when the personal and global objectives may not be exactly aligned. Inspired by Bayesian hierarchical models, we develop a self-aware personalized FL method where each client can automatically balance the training of its local personal model and the global model that implicitly contributes to other clients' training. Such a balance is derived from the inter-client and intra-client uncertainty quantification. A larger inter-client variation implies more personalization is needed. Correspondingly, our method uses uncertainty-driven local training steps and aggregation rule instead of conventional local fine-tuning and sample size-based aggregation. With experimental studies on synthetic data, Amazon Alexa audio data, and public datasets such as MNIST, FEMNIST, CIFAR10, and Sent140, we show that our proposed method can achieve significantly improved personalization performance compared with the existing counterparts.
Submitted 17 April, 2022;
originally announced April 2022.
-
Nonlinear gradient mappings and stochastic optimization: A general framework with applications to heavy-tail noise
Authors:
Dusan Jakovetic,
Dragana Bajovic,
Anit Kumar Sahu,
Soummya Kar,
Nemanja Milosevic,
Dusan Stamenkovic
Abstract:
We introduce a general framework for nonlinear stochastic gradient descent (SGD) for the scenarios when gradient noise exhibits heavy tails. The proposed framework subsumes several popular nonlinearity choices, like clipped, normalized, signed or quantized gradient, but we also consider novel nonlinearity choices. We establish for the considered class of methods strong convergence guarantees assuming a strongly convex cost function with Lipschitz continuous gradients under very general assumptions on the gradient noise. Most notably, we show that, for a nonlinearity with bounded outputs and for the gradient noise that may not have finite moments of order greater than one, the nonlinear SGD's mean squared error (MSE), or equivalently, the expected cost function's optimality gap, converges to zero at rate $O(1/t^ζ)$, $ζ\in (0,1)$. In contrast, for the same noise setting, the linear SGD generates a sequence with unbounded variances. Furthermore, for the nonlinearities that can be decoupled component-wise, e.g., sign gradient or component-wise clipping, we show that the nonlinear SGD asymptotically (locally) achieves a $O(1/t)$ rate in the weak convergence sense and explicitly quantify the corresponding asymptotic variance. Experiments show that, while our framework is more general than existing studies of SGD under heavy-tail noise, several easy-to-implement nonlinearities from our framework are competitive with state-of-the-art alternatives on real data sets with heavy-tail noise.
Submitted 6 April, 2022;
originally announced April 2022.
-
Federated Learning Challenges and Opportunities: An Outlook
Authors:
Jie Ding,
Eric Tramel,
Anit Kumar Sahu,
Shuang Wu,
Salman Avestimehr,
Tao Zhang
Abstract:
Federated learning (FL) has been developed as a promising framework to leverage the resources of edge devices, enhance customers' privacy, comply with regulations, and reduce development costs. Although many methods and applications have been developed for FL, several critical challenges for practical FL systems remain unaddressed. This paper provides an outlook on FL development, categorized into five emerging directions of FL, namely algorithm foundation, personalization, hardware and security constraints, lifelong learning, and nonstandard data. Our unique perspectives are backed by practical observations from large-scale federated systems for edge devices.
Submitted 1 February, 2022;
originally announced February 2022.
-
Partial Model Averaging in Federated Learning: Performance Guarantees and Benefits
Authors:
Sunwoo Lee,
Anit Kumar Sahu,
Chaoyang He,
Salman Avestimehr
Abstract:
Local Stochastic Gradient Descent (SGD) with periodic model averaging (FedAvg) is a foundational algorithm in Federated Learning. The algorithm independently runs SGD on multiple workers and periodically averages the model across all the workers. When local SGD runs with many workers, however, the periodic averaging causes a significant model discrepancy across the workers, making the global loss converge slowly. While recent advanced optimization methods tackle the issue with a focus on non-IID settings, the model discrepancy issue still exists due to the underlying periodic model averaging. We propose a partial model averaging framework that mitigates the model discrepancy issue in Federated Learning. The partial averaging encourages the local models to stay close to each other in parameter space, and it enables more effective minimization of the global loss. Given a fixed number of iterations and a large number of workers (128), the partial averaging achieves up to 2.2% higher validation accuracy than the periodic full averaging.
Submitted 11 January, 2022;
originally announced January 2022.
-
Effect of Plasmonic Coupling in Different Assembly of Gold Nanorods Studied by FDTD
Authors:
Aditya K. Sahu,
Satyabrata Raj
Abstract:
The influence of the orientation of gold nanorods in different assemblies has been investigated using the Finite Difference Time Domain (FDTD) simulation method. To understand the relative orientation, we vary the size and angle in dimer geometries. Significant effects of plasmon coupling emerged in longitudinal resonances having end-to-end configurations of gold nanorods. The effect of orientational plasmon coupling in dimers gives rise to both bonding and anti-bonding plasmon modes. Effects of various geometries like primary monomer, dimer, trimer, and tetramer structures have been explored and compared with their higher nanorod ensembles. The asymmetric spectral response in a 4 × 4 gold nanorod array indicates a Fano-like resonance. The variation of gap distance in ordered arrays allowed modulation of the Fano resonance mode. The plasmon modes' resonance wavelength and field enhancement have been tuned by varying the gap distance, angular orientation, size irregularity between the nanorods, and nanorod numbers in an array. The integrated nanostructures studied here are significant not only for fundamental research but also for applications in plasmon-based devices.
Submitted 2 November, 2021; v1 submitted 20 June, 2021;
originally announced June 2021.
-
Understanding the Coupling Mechanism of Gold Nanostructures by Finite-Difference Time-Domain Method
Authors:
Aditya K. Sahu,
Satyabrata Raj
Abstract:
Gold nanoparticle assemblies show a strong plasmonic response due to the combined effects of the individual nanoparticles' plasmon modes. Increasing the number of nanoparticles in structured assemblies leads to significant shifts in the optical and physical properties. We use Finite-Difference Time-Domain (FDTD) simulations to analyze the electromagnetic response of structurally ordered gold nanorods in monomer and dimer configurations. The plasmonic coupling between nanorods in monomer or dimer configurations provides a unique technique for tuning the spectrum intensity, spatial distribution, and polarisation of local electric fields within and surrounding nanostructures. Our study shows an exponential coupling behavior when two gold nanorods are assembled in end-to-end and side-by-side dimer configurations with a small separation distance. The maximum electric field in the gaps between adjacent nanorods in the end-to-end dimer configuration shows a larger enhancement factor than that of the individual gold nanorod. Our FDTD simulations of end-to-end dimers at small separation distances, up to ~40 nm, explain well the experimentally observed growth dynamics of gold nanorods.
Submitted 14 August, 2021; v1 submitted 11 June, 2021;
originally announced June 2021.
-
Understanding blue shift of the longitudinal surface plasmon resonance during growth of gold nanorods
Authors:
Aditya K Sahu,
Anwesh Das,
Anirudha Ghosh,
Satyabrata Raj
Abstract:
We have investigated in detail the growth dynamics of gold nanorods with various aspect ratios in different surrounding environments. Surprisingly, a blue shift in the temporal evolution of colloidal gold nanorods in an aqueous medium has been observed during the growth of nanorods by UV-visible absorption spectroscopy. The longitudinal surface plasmon resonance peak evolves as soon as the nanorods start to grow from spheres, and the system undergoes a blue shift in the absorption spectra. Although a red shift is expected as a natural phenomenon during the growth process of all nanosystems, our blue shift observation is regarded as a consequence of competition between the parameters of the growth solution and the actual growth of nanorods. The growth of nanorods contributes to the red shift, which is hidden under the dominating contribution of the growth solution responsible for the observed massive blue shift.
Submitted 11 June, 2021;
originally announced June 2021.
-
You Only Query Once: Effective Black Box Adversarial Attacks with Minimal Repeated Queries
Authors:
Devin Willmott,
Anit Kumar Sahu,
Fatemeh Sheikholeslami,
Filipe Condessa,
Zico Kolter
Abstract:
Researchers have repeatedly shown that it is possible to craft adversarial attacks on deep classifiers (small perturbations that significantly change the class label), even in the "black-box" setting where one only has query access to the classifier. However, all prior work in the black-box setting attacks the classifier by repeatedly querying the same image with minor modifications, usually thousands of times or more, making it easy for defenders to detect an ensuing attack. In this work, we instead show that it is possible to craft (universal) adversarial perturbations in the black-box setting by querying a sequence of different images only once. This attack avoids detection based on a high number of similar queries and produces a perturbation that causes misclassification when applied to any input to the classifier. In experiments, we show that attacks that adhere to this restriction can produce untargeted adversarial perturbations that fool the vast majority of MNIST and CIFAR-10 classifier inputs, as well as in excess of $60$-$70\%$ of inputs on ImageNet classifiers. In the targeted setting, we exhibit targeted black-box universal attacks on ImageNet classifiers with success rates above $20\%$ when only allowed one query per image, and $66\%$ when allowed two queries per image.
Submitted 29 January, 2021;
originally announced February 2021.
-
Gaussian MRF Covariance Modeling for Efficient Black-Box Adversarial Attacks
Authors:
Anit Kumar Sahu,
Satya Narayan Shukla,
J. Zico Kolter
Abstract:
We study the problem of generating adversarial examples in a black-box setting, where we only have access to a zeroth order oracle, providing us with loss function evaluations. Although this setting has been investigated in previous work, most past approaches using zeroth order optimization implicitly assume that the gradients of the loss function with respect to the input images are unstructured. In this work, we show that in fact substantial correlations exist within these gradients, and we propose to capture these correlations via a Gaussian Markov random field (GMRF). Given the intractability of the explicit covariance structure of the MRF, we show that the covariance structure can be efficiently represented using the Fast Fourier Transform (FFT), along with low-rank updates to perform exact posterior estimation under this model. We use this modeling technique to find fast one-step adversarial attacks, akin to a black-box version of the Fast Gradient Sign Method (FGSM), and show that the method uses fewer queries and achieves higher attack success rates than the current state of the art. We also highlight the general applicability of this gradient modeling setup.
Submitted 8 October, 2020;
originally announced October 2020.
-
Simple and Efficient Hard Label Black-box Adversarial Attacks in Low Query Budget Regimes
Authors:
Satya Narayan Shukla,
Anit Kumar Sahu,
Devin Willmott,
J. Zico Kolter
Abstract:
We focus on the problem of black-box adversarial attacks, where the aim is to generate adversarial examples for deep learning models based solely on the output label (hard label) returned for a queried data input. We propose a simple and efficient Bayesian Optimization (BO) based approach for developing black-box adversarial attacks. Issues with BO's performance in high dimensions are avoided by searching for adversarial examples in a structured low-dimensional subspace. We demonstrate the efficacy of our proposed attack method by evaluating both $\ell_\infty$ and $\ell_2$ norm constrained untargeted and targeted hard label black-box attacks on three standard datasets: MNIST, CIFAR-10 and ImageNet. Our proposed approach consistently achieves 2x to 10x higher attack success rate while requiring 10x to 20x fewer queries compared to the current state-of-the-art black-box adversarial attacks.
Submitted 11 June, 2021; v1 submitted 13 July, 2020;
originally announced July 2020.
-
Data-driven Thermal Model Inference with ARMAX, in Smart Environments, based on Normalized Mutual Information
Authors:
Zhanhong Jiang,
Jonathan Francis,
Anit Kumar Sahu,
Sirajum Munir,
Charles Shelton,
Anthony Rowe,
Mario Bergés
Abstract:
Understanding the models that characterize the thermal dynamics in a smart building is important for the comfort of its occupants and for its energy optimization. A significant amount of research has attempted to utilize thermodynamics (physical) models for smart building control, but these approaches remain challenging due to the stochastic nature of the intermittent environmental disturbances. This paper presents a novel data-driven approach for indoor thermal model inference, which combines an Autoregressive Moving Average with eXogenous inputs model (ARMAX) with a Normalized Mutual Information scheme (NMI). Based on this information-theoretic method, NMI, causal dependencies between the indoor temperature and exogenous inputs are explicitly obtained as a guideline for the ARMAX model to find the dominating inputs. For validation, we use three datasets based on building energy systems, against which we compare our method to an autoregressive model with exogenous inputs (ARX), a regularized ARMAX model, and state-space models.
Submitted 10 June, 2020;
originally announced June 2020.
-
FedDANE: A Federated Newton-Type Method
Authors:
Tian Li,
Anit Kumar Sahu,
Manzil Zaheer,
Maziar Sanjabi,
Ameet Talwalkar,
Virginia Smith
Abstract:
Federated learning aims to jointly learn statistical models over massively distributed remote devices. In this work, we propose FedDANE, an optimization method that we adapt from DANE, a method for classical distributed optimization, to handle the practical constraints of federated learning. We provide convergence guarantees for this method when learning over both convex and non-convex functions. Despite encouraging theoretical results, we find that the method has underwhelming performance empirically. In particular, through empirical simulations on both synthetic and real-world datasets, FedDANE consistently underperforms baselines of FedAvg and FedProx in realistic federated settings. We identify low device participation and statistical device heterogeneity as two underlying causes of this underwhelming performance, and conclude by suggesting several directions of future work.
Submitted 7 January, 2020;
originally announced January 2020.
-
Black-box Adversarial Attacks with Bayesian Optimization
Authors:
Satya Narayan Shukla,
Anit Kumar Sahu,
Devin Willmott,
J. Zico Kolter
Abstract:
We focus on the problem of black-box adversarial attacks, where the aim is to generate adversarial examples using information limited to loss function evaluations of input-output pairs. We use Bayesian optimization (BO) to specifically cater to scenarios involving low query budgets to develop query-efficient adversarial attacks. We alleviate the issues surrounding BO with regard to optimizing high-dimensional deep learning models by effective dimension upsampling techniques. Our proposed approach achieves performance comparable to state-of-the-art black-box adversarial attacks, albeit with a much lower average query count. In particular, in low query budget regimes, our proposed method reduces the query count by up to $80\%$ with respect to state-of-the-art methods.
Submitted 30 September, 2019;
originally announced September 2019.
-
Noisy Batch Active Learning with Deterministic Annealing
Authors:
Gaurav Gupta,
Anit Kumar Sahu,
Wan-Yi Lin
Abstract:
We study the problem of training machine learning models incrementally with batches of samples annotated by noisy oracles. We select each batch of samples that are important and also diverse via clustering and importance sampling. More importantly, we incorporate model uncertainty into the sampling probability to compensate for poor estimation of the importance scores when the training data is too small to build a meaningful model. Experiments on benchmark image classification datasets (MNIST, SVHN, CIFAR10, and EMNIST) show improvement over existing active learning strategies. We introduce an extra denoising layer to deep networks to make active learning robust to label noise and show significant improvements.
Submitted 28 October, 2020; v1 submitted 26 September, 2019;
originally announced September 2019.
-
Federated Learning: Challenges, Methods, and Future Directions
Authors:
Tian Li,
Anit Kumar Sahu,
Ameet Talwalkar,
Virginia Smith
Abstract:
Federated learning involves training statistical models over remote devices or siloed data centers, such as mobile phones or hospitals, while keeping data localized. Training in heterogeneous and potentially massive networks introduces novel challenges that require a fundamental departure from standard approaches for large-scale machine learning, distributed optimization, and privacy-preserving data analysis. In this article, we discuss the unique characteristics and challenges of federated learning, provide a broad overview of current approaches, and outline several directions of future work that are relevant to a wide range of research communities.
Submitted 21 August, 2019;
originally announced August 2019.
-
MATCHA: Speeding Up Decentralized SGD via Matching Decomposition Sampling
Authors:
Jianyu Wang,
Anit Kumar Sahu,
Zhouyi Yang,
Gauri Joshi,
Soummya Kar
Abstract:
This paper studies the error-runtime trade-off typically encountered in decentralized training based on stochastic gradient descent (SGD) over a given network. While a denser (sparser) network topology results in faster (slower) error convergence in terms of iterations, it incurs more (less) communication time/delay per iteration. In this paper, we propose MATCHA, an algorithm that can achieve a win-win in this error-runtime trade-off for an arbitrary network topology. The main idea of MATCHA is to parallelize inter-node communication by decomposing the topology into matchings. To preserve fast error convergence speed, it identifies and communicates more frequently over critical links, and saves communication time by using other links less frequently. Experiments on a suite of datasets and deep neural networks validate the theoretical analyses and demonstrate that MATCHA takes up to $5\times$ less time than vanilla decentralized SGD to reach the same training loss.
Submitted 18 November, 2019; v1 submitted 22 May, 2019;
originally announced May 2019.
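The idea of decomposing a topology into matchings, as mentioned in the abstract above, can be illustrated with a small sketch. This is only a greedy heuristic chosen for brevity, not necessarily the decomposition MATCHA uses; the function name `greedy_matching_decomposition` and the 4-node ring are hypothetical. Within each matching no node appears twice, so all links in one matching could communicate in parallel.

```python
def greedy_matching_decomposition(edges):
    """Split an edge list into matchings (sets of edges sharing no node).

    Greedy heuristic for illustration only; it does not guarantee the
    minimum number of matchings.
    """
    remaining = list(edges)
    matchings = []
    while remaining:
        used_nodes = set()
        matching = []
        leftover = []
        for u, v in remaining:
            if u in used_nodes or v in used_nodes:
                # One endpoint is already busy in this matching; defer the edge.
                leftover.append((u, v))
            else:
                matching.append((u, v))
                used_nodes.update((u, v))
        matchings.append(matching)
        remaining = leftover
    return matchings


# Hypothetical 4-node ring topology.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
matchings = greedy_matching_decomposition(ring)
print(matchings)
```

On this ring the sketch yields two matchings, so every link can be activated in two communication rounds instead of four sequential exchanges.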
-
Distributed stochastic optimization with gradient tracking over strongly-connected networks
Authors:
Ran Xin,
Anit Kumar Sahu,
Usman A. Khan,
Soummya Kar
Abstract:
In this paper, we study distributed stochastic optimization to minimize a sum of smooth and strongly-convex local cost functions over a network of agents, communicating over a strongly-connected graph. Assuming that each agent has access to a stochastic first-order oracle ($\mathcal{SFO}$), we propose a novel distributed method, called $\mathcal{S}$-$\mathcal{AB}$, where each agent uses an auxiliary variable to asymptotically track the gradient of the global cost in expectation. The $\mathcal{S}$-$\mathcal{AB}$ algorithm employs row- and column-stochastic weights simultaneously to ensure both consensus and optimality. Since doubly-stochastic weights are not used, $\mathcal{S}$-$\mathcal{AB}$ is applicable to arbitrary strongly-connected graphs. We show that under a sufficiently small constant step-size, $\mathcal{S}$-$\mathcal{AB}$ converges linearly (in expected mean-square sense) to a neighborhood of the global minimizer. We present numerical simulations based on real-world data sets to illustrate the theoretical results.
Submitted 9 April, 2019; v1 submitted 18 March, 2019;
originally announced March 2019.
-
Federated Optimization in Heterogeneous Networks
Authors:
Tian Li,
Anit Kumar Sahu,
Manzil Zaheer,
Maziar Sanjabi,
Ameet Talwalkar,
Virginia Smith
Abstract:
Federated Learning is a distributed learning paradigm with two key challenges that differentiate it from traditional distributed optimization: (1) significant variability in terms of the systems characteristics on each device in the network (systems heterogeneity), and (2) non-identically distributed data across the network (statistical heterogeneity). In this work, we introduce a framework, FedProx, to tackle heterogeneity in federated networks. FedProx can be viewed as a generalization and re-parameterization of FedAvg, the current state-of-the-art method for federated learning. While this re-parameterization makes only minor modifications to the method itself, these modifications have important ramifications both in theory and in practice. Theoretically, we provide convergence guarantees for our framework when learning over data from non-identical distributions (statistical heterogeneity), and while adhering to device-level systems constraints by allowing each participating device to perform a variable amount of work (systems heterogeneity). Practically, we demonstrate that FedProx allows for more robust convergence than FedAvg across a suite of realistic federated datasets. In particular, in highly heterogeneous settings, FedProx demonstrates significantly more stable and accurate convergence behavior relative to FedAvg, improving absolute test accuracy by 22% on average.
Submitted 21 April, 2020; v1 submitted 14 December, 2018;
originally announced December 2018.
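The "minor modification" to FedAvg mentioned in the abstract above is, in the published FedProx method, a proximal term $(\mu/2)\|w - w^t\|^2$ added to each device's local objective, which keeps local updates close to the current global model $w^t$. A minimal sketch of one device's local solve follows, assuming plain gradient descent and a toy quadratic local loss; the helper names, step size, and data are hypothetical and not taken from the paper.

```python
def fedprox_local_update(w_global, grad_fn, mu, lr, steps):
    """Gradient descent on F(w) + (mu / 2) * ||w - w_global||^2.

    With mu = 0 this reduces to a plain FedAvg-style local update.
    """
    w = list(w_global)
    for _ in range(steps):
        g = grad_fn(w)
        # The proximal term contributes mu * (w - w_global) to the gradient.
        w = [wi - lr * (gi + mu * (wi - wg))
             for wi, gi, wg in zip(w, g, w_global)]
    return w


def quadratic_grad(target):
    """Gradient of the toy local loss F(w) = 0.5 * ||w - target||^2."""
    return lambda w: [wi - ti for wi, ti in zip(w, target)]


w_global = [0.0, 0.0]
local_optimum = [4.0, -2.0]  # hypothetical device whose data pulls far from w_global
w_fedavg = fedprox_local_update(w_global, quadratic_grad(local_optimum),
                                mu=0.0, lr=0.1, steps=200)
w_fedprox = fedprox_local_update(w_global, quadratic_grad(local_optimum),
                                 mu=1.0, lr=0.1, steps=200)
print(w_fedavg, w_fedprox)
```

In this toy setting the update without the proximal term drifts all the way to the device's own optimum, while `mu=1.0` stops halfway between the global model and the local optimum, which is the kind of limited drift the proximal term is meant to provide under statistical heterogeneity.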
-
Managing App Install Ad Campaigns in RTB: A Q-Learning Approach
Authors:
Anit Kumar Sahu,
Shaunak Mishra,
Narayan Bhamidipati
Abstract:
Real time bidding (RTB) enables demand side platforms (bidders) to scale ad campaigns across multiple publishers affiliated with an RTB ad exchange. While driving multiple campaigns for mobile app install ads via RTB, the bidder typically has to: (i) maintain each campaign's efficiency (i.e., meet the advertiser's target cost-per-install), (ii) be sensitive to the advertiser's budget, and (iii) make a profit after payouts to the ad exchange. In this process, there is a sense of delayed rewards for the bidder's actions; the exchange charges the bidder right after the ad is shown, but the bidder learns about resultant installs only after a considerable delay. This makes it challenging for the bidder to decide beforehand the bid (and corresponding cost charged to the advertiser) for each ad display opportunity. To jointly handle the objectives mentioned above, we propose a state space based policy which decides the exchange bid and advertiser cost for each opportunity. The state space captures the current efficiency, budget utilization and profit. The policy based on this state space is trained on past decisions and outcomes via a novel Q-learning algorithm which accounts for the delay in install notifications. In our experiments based on data from app install campaigns managed by Yahoo's Gemini advertising platform, the Q-learning based policy led to a significant increase in the profit and number of efficient campaigns.
Submitted 11 November, 2018;
originally announced November 2018.
-
Towards Gradient Free and Projection Free Stochastic Optimization
Authors:
Anit Kumar Sahu,
Manzil Zaheer,
Soummya Kar
Abstract:
This paper focuses on the problem of constrained stochastic optimization. A zeroth order Frank-Wolfe algorithm is proposed, which in addition to the projection-free nature of the vanilla Frank-Wolfe algorithm makes it gradient free. Under convexity and smoothness assumptions, we show that the proposed algorithm converges to the optimal objective function value at a rate $O\left(1/T^{1/3}\right)$, where $T$ denotes the iteration count. In particular, the primal sub-optimality gap is shown to have a dimension dependence of $O\left(d^{1/3}\right)$, which is the best known dimension dependence among all zeroth order optimization algorithms with one directional derivative per iteration. For non-convex functions, we obtain the Frank-Wolfe gap to be $O\left(d^{1/3}T^{-1/4}\right)$. Experiments on black-box optimization setups demonstrate the efficacy of the proposed algorithm.
Submitted 18 February, 2019; v1 submitted 7 October, 2018;
originally announced October 2018.
-
Communication-Efficient Distributed Strongly Convex Stochastic Optimization: Non-Asymptotic Rates
Authors:
Anit Kumar Sahu,
Dusan Jakovetic,
Dragana Bajovic,
Soummya Kar
Abstract:
We examine fundamental tradeoffs in iterative distributed zeroth and first order stochastic optimization in multi-agent networks in terms of communication cost (number of per-node transmissions) and computational cost, measured by the number of per-node noisy function (respectively, gradient) evaluations with zeroth order (respectively, first order) methods. Specifically, we develop novel distributed stochastic optimization methods for zeroth and first order strongly convex optimization by utilizing a probabilistic inter-agent communication protocol that increasingly sparsifies communications among agents as time progresses. Under standard assumptions on the cost functions and the noise statistics, we establish with the proposed method the $O(1/(C_{\mathrm{comm}})^{4/3-\zeta})$ and $O(1/(C_{\mathrm{comm}})^{8/9-\zeta})$ mean square error convergence rates, for the first and zeroth order optimization, respectively, where $C_{\mathrm{comm}}$ is the expected number of network communications and $\zeta>0$ is arbitrarily small. The methods are shown to achieve order-optimal convergence rates in terms of computational cost $C_{\mathrm{comp}}$, $O(1/C_{\mathrm{comp}})$ (first order optimization) and $O(1/(C_{\mathrm{comp}})^{2/3})$ (zeroth order optimization), while achieving the order-optimal convergence rates in terms of iterations. Experiments on real-life datasets illustrate the efficacy of the proposed algorithms.
Submitted 9 September, 2018;
originally announced September 2018.
-
Distributed Zeroth Order Optimization Over Random Networks: A Kiefer-Wolfowitz Stochastic Approximation Approach
Authors:
Anit Kumar Sahu,
Dusan Jakovetic,
Dragana Bajovic,
Soummya Kar
Abstract:
We study a standard distributed optimization framework where $N$ networked nodes collaboratively minimize the sum of their local convex costs. The main body of existing work considers the described problem when the underlying network is either static or deterministically varying, and the distributed optimization algorithm is of first or second order, i.e., it involves the local costs' gradients and possibly the local Hessians. In this paper, we consider the currently understudied but highly relevant scenarios when: 1) only noisy estimates of function values are available (neither gradients nor Hessians can be evaluated); and 2) the underlying network is randomly varying (according to an independent, identically distributed process). For the described random networks-zeroth order optimization setting, we develop a distributed stochastic approximation method of the Kiefer-Wolfowitz type. Furthermore, under standard smoothness and strong convexity assumptions on the local costs, we establish the $O(1/k^{1/2})$ mean square convergence rate for the method, a rate that matches that of the method's centralized counterpart under equivalent conditions.
Submitted 21 March, 2018;
originally announced March 2018.
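For readers unfamiliar with Kiefer-Wolfowitz stochastic approximation, the sketch below shows the classical centralized scheme that the abstract above builds on: the gradient is replaced by a central finite difference of noisy function values, with a decaying step size $a_k$ and a decaying difference width $c_k$. This is not the distributed method of the paper; the sequences $a_k = a/(k+1)$ and $c_k = c/(k+1)^{1/4}$, the toy quadratic cost, and all names are assumptions chosen for illustration.

```python
import random


def kiefer_wolfowitz(noisy_f, x0, iters, a=0.5, c=1.0, seed=0):
    """Minimize a function using only noisy function evaluations."""
    rng = random.Random(seed)
    x = list(x0)
    for k in range(iters):
        a_k = a / (k + 1)            # step size (assumed schedule)
        c_k = c / (k + 1) ** 0.25    # finite-difference width (assumed schedule)
        grad_estimate = []
        for i in range(len(x)):
            x_plus = list(x)
            x_minus = list(x)
            x_plus[i] += c_k
            x_minus[i] -= c_k
            # Central difference of two noisy evaluations along coordinate i.
            diff = noisy_f(x_plus, rng) - noisy_f(x_minus, rng)
            grad_estimate.append(diff / (2.0 * c_k))
        x = [xi - a_k * gi for xi, gi in zip(x, grad_estimate)]
    return x


def noisy_quadratic(x, rng, target=(1.0, -2.0), sigma=0.1):
    """Toy strongly convex cost observed with additive Gaussian noise."""
    value = sum((xi - ti) ** 2 for xi, ti in zip(x, target))
    return value + rng.gauss(0.0, sigma)


x_hat = kiefer_wolfowitz(noisy_quadratic, x0=[0.0, 0.0], iters=2000)
print(x_hat)
```

The estimate should settle near the minimizer `(1.0, -2.0)` of the toy cost even though no gradient is ever evaluated; the width `c_k` shrinks more slowly than the step size so that the noise amplified by the division by `2 * c_k` is still averaged out.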
-
Convergence rates for distributed stochastic optimization over random networks
Authors:
Dusan Jakovetic,
Dragana Bajovic,
Anit Kumar Sahu,
Soummya Kar
Abstract:
We establish the $O(1/k)$ convergence rate for distributed stochastic gradient methods that operate over strongly convex costs and random networks. The considered class of methods is standard: each node performs a weighted average of its own and its neighbors' solution estimates (consensus), and takes a negative step with respect to a noisy version of its local function's gradient (innovation). The underlying communication network is modeled through a sequence of temporally independent identically distributed (i.i.d.) Laplacian matrices connected on average, while the local gradient noises are also i.i.d. in time, have finite second moment, and possibly unbounded support. We show that, after a careful setting of the consensus and innovations potentials (weights), the distributed stochastic gradient method achieves an (order-optimal) $O(1/k)$ convergence rate in the mean square distance from the solution. This is the first order-optimal convergence rate result on distributed strongly convex stochastic optimization when the network is random and/or the gradient noises have unbounded support. Simulation examples confirm the theoretical findings.
Submitted 21 March, 2018;
originally announced March 2018.
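The consensus plus innovation update described in this abstract can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' implementation: the scalar quadratic local costs, the ring topology with independently failing links, and the weight schedules `a / (k + 1)` and `b / (k + 1) ** 0.5` are all hypothetical choices made for the example.

```python
import random


def consensus_innovations_sgd(targets, num_iters=20000, link_prob=0.5,
                              a=1.0, b=0.25, noise_std=1.0, seed=0):
    """Distributed SGD sketch; node i holds f_i(x) = 0.5 * (x - targets[i])**2.

    The minimizer of the sum of the local costs is the mean of `targets`.
    """
    rng = random.Random(seed)
    n = len(targets)
    x = [0.0] * n
    ring_edges = [(i, (i + 1) % n) for i in range(n)]
    for k in range(num_iters):
        alpha = a / (k + 1)          # innovation (gradient) weight
        beta = b / (k + 1) ** 0.5    # consensus weight, decaying more slowly
        # i.i.d. random network: each ring link is up independently.
        active = [edge for edge in ring_edges if rng.random() < link_prob]
        disagreement = [0.0] * n
        for i, j in active:
            disagreement[i] += x[i] - x[j]
            disagreement[j] += x[j] - x[i]
        # Consensus step plus a negative step along a noisy local gradient.
        x = [x[i] - beta * disagreement[i]
             - alpha * ((x[i] - targets[i]) + rng.gauss(0.0, noise_std))
             for i in range(n)]
    return x
```

With `targets = [1.0, 2.0, 3.0, 4.0, 5.0]`, every node's estimate should end up near the global minimizer 3.0 even though each node only sees its own cost and whichever neighbors happen to be connected at each step.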
-
$\mathcal{CIRFE}$: A Distributed Random Fields Estimator
Authors:
Anit Kumar Sahu,
Dusan Jakovetic,
Soummya Kar
Abstract:
This paper presents a communication-efficient distributed algorithm, $\mathcal{CIRFE}$, of the \emph{consensus}+\emph{innovations} type, to estimate a high-dimensional parameter in a multi-agent network, in which each agent is interested in reconstructing only a few components of the parameter. This problem arises, for example, when monitoring the high-dimensional distributed state of a large-scale infrastructure with a network of limited-capability sensors, where each sensor is tasked with estimating some local components of the state. At each observation sampling epoch, each agent updates its local estimate of the parameter components in its interest set by simultaneously processing the latest locally sensed information~(\emph{innovations}) and the parameter estimates from agents~(\emph{consensus}) in its communication neighborhood given by a time-varying, possibly sparse graph. Under minimal conditions on the inter-agent communication network and the sensing models, almost sure convergence of the estimate sequence at each agent to the components of the true parameter in its interest set is established. Furthermore, the paper establishes the performance of $\mathcal{CIRFE}$ in terms of asymptotic covariance of the estimate sequences and specifically characterizes the dependence of the component-wise asymptotic covariance on the number of agents tasked with estimating each component. Finally, simulation experiments demonstrate the efficacy of $\mathcal{CIRFE}$.
Submitted 11 June, 2018; v1 submitted 13 February, 2018;
originally announced February 2018.
-
Communication Optimality Trade-offs For Distributed Estimation
Authors:
Anit Kumar Sahu,
Dusan Jakovetic,
Soummya Kar
Abstract:
This paper proposes $\mathbf{C}$ommunication efficient $\mathbf{RE}$cursive $\mathbf{D}$istributed estimati$\mathbf{O}$n algorithm, $\mathcal{CREDO}$, for networked multi-worker setups without a central master node. $\mathcal{CREDO}$ is designed for scenarios in which the worker nodes aim to collaboratively estimate a vector parameter of interest using distributed online time-series data at the individual worker nodes. The individual worker nodes iteratively update their estimate of the parameter by assimilating latest locally sensed information and estimates from neighboring worker nodes exchanged over a (possibly sparse) time-varying communication graph. The underlying inter-worker communication protocol is adaptive, making communications increasingly (probabilistically) sparse as time progresses. Under minimal conditions on the inter-worker information exchange network and the sensing models, almost sure convergence of the estimate sequences at the worker nodes to the true parameter is established. Further, the paper characterizes the performance of $\mathcal{CREDO}$ in terms of asymptotic covariance of the estimate sequences and specifically establishes the achievability of optimal asymptotic covariance. The analysis reveals an interesting interplay between the algorithm's communication cost~$\mathcal{C}_{t}$ (over $t$ time-steps) and the asymptotic covariance. Most notably, it is shown that $\mathcal{CREDO}$ may be designed to achieve a $\Theta\left(\mathcal{C}_{t}^{-2+\zeta}\right)$ decay of the mean square error~($\zeta>0$, arbitrarily small) at each worker node, which significantly improves over the existing $\Theta\left(\mathcal{C}_{t}^{-1}\right)$ rates. Simulation examples on both synthetic and real data sets demonstrate $\mathcal{CREDO}$'s communication efficiency.
Submitted 11 January, 2018;
originally announced January 2018.
-
Automatic Pill Reminder for Easy Supervision
Authors:
A. Jabeena,
Animesh Kumar Sahu,
Rohit Roy,
N. Sardar Basha
Abstract:
In this paper we present a working model of an automatic pill reminder and dispenser setup that can alleviate irregularities in taking the prescribed dosage of medicines at the time dictated by the medical practitioner. It shifts from approaches predominantly dependent on human memory to automation with negligible supervision, thus relieving people of an error-prone task and reducing the risk of giving the wrong medicine at the wrong time in the wrong amount.
Submitted 17 November, 2017;
originally announced November 2017.
-
Distributed Constrained Recursive Nonlinear Least-Squares Estimation: Algorithms and Asymptotics
Authors:
Anit Kumar Sahu,
Soummya Kar,
Jose' M. F. Moura,
H. Vincent Poor
Abstract:
This paper focuses on the problem of recursive nonlinear least squares parameter estimation in multi-agent networks, in which the individual agents observe sequentially over time an independent and identically distributed (i.i.d.) time-series consisting of a nonlinear function of the true but unknown parameter corrupted by noise. A distributed recursive estimator of the \emph{consensus} + \emph{innovations} type, namely $\mathcal{CIWNLS}$, is proposed, in which the agents update their parameter estimates at each observation sampling epoch in a collaborative way by simultaneously processing the latest locally sensed information~(\emph{innovations}) and the parameter estimates from other agents~(\emph{consensus}) in the local neighborhood conforming to a pre-specified inter-agent communication topology. Under rather weak conditions on the connectivity of the inter-agent communication and a \emph{global observability} criterion, it is shown that at every network agent, the proposed algorithm leads to consistent parameter estimates. Furthermore, under standard smoothness assumptions on the local observation functions, the distributed estimator is shown to yield order-optimal convergence rates, i.e., as far as the order of pathwise convergence is concerned, the local parameter estimates at each agent are as good as the optimal centralized nonlinear least squares estimator which would require access to all the observations across all the agents at all times. In order to benchmark the performance of the proposed distributed $\mathcal{CIWNLS}$ estimator with that of the centralized nonlinear least squares estimator, the asymptotic normality of the estimate sequence is established and the asymptotic covariance of the distributed estimator is evaluated. Finally, simulation results are presented which illustrate and verify the analytical findings.
Submitted 19 October, 2016; v1 submitted 31 January, 2016;
originally announced February 2016.
-
Recursive Distributed Detection for Composite Hypothesis Testing: Nonlinear Observation Models in Additive Gaussian Noise
Authors:
Anit Kumar Sahu,
Soummya Kar
Abstract:
This paper studies recursive composite hypothesis testing in a network of sparsely connected agents. The network objective is to test a simple null hypothesis against a composite alternative concerning the state of the field, modeled as a vector of (continuous) unknown parameters determining the parametric family of probability measures induced on the agents' observation spaces under the hypotheses. Specifically, under the alternative hypothesis, each agent sequentially observes an independent and identically distributed time-series consisting of a (nonlinear) function of the true but unknown parameter corrupted by Gaussian noise, whereas, under the null, they obtain noise only. Two distributed recursive generalized likelihood ratio test type algorithms of the \emph{consensus+innovations} form are proposed, namely $\mathcal{CIGLRT-L}$ and $\mathcal{CIGLRT-NL}$, in which the agents estimate the underlying parameter and in parallel also update their test decision statistics by simultaneously processing the latest local sensed information and information obtained from neighboring agents. For $\mathcal{CIGLRT-NL}$, for a broad class of nonlinear observation models and under a global observability condition, algorithm parameters which ensure asymptotically decaying probabilities of errors~(probability of miss and probability of false detection) are characterized. For $\mathcal{CIGLRT-L}$, a linear observation model is considered and upper bounds on large deviations decay exponent for the error probabilities are obtained.
Submitted 21 February, 2017; v1 submitted 18 January, 2016;
originally announced January 2016.
-
Distributed Sequential Detection for Gaussian Shift-in-Mean Hypothesis Testing
Authors:
Anit Kumar Sahu,
Soummya Kar
Abstract:
This paper studies the problem of sequential Gaussian shift-in-mean hypothesis testing in a distributed multi-agent network. A sequential probability ratio test (SPRT) type algorithm in a distributed framework of the \emph{consensus}+\emph{innovations} form is proposed, in which the agents update their decision statistics by simultaneously processing latest observations (innovations) sensed sequentially over time and information obtained from neighboring agents (consensus). For each pre-specified set of type I and type II error probabilities, local decision parameters are derived which ensure that the algorithm achieves the desired error performance and terminates in finite time almost surely (a.s.) at each network agent. Large deviation exponents for the tail probabilities of the agent stopping time distributions are obtained and it is shown that asymptotically (in the number of agents or in the high signal-to-noise-ratio regime) these exponents associated with the distributed algorithm approach that of the optimal centralized detector. The expected stopping time for the proposed algorithm at each network agent is evaluated and is benchmarked with respect to the optimal centralized algorithm. The efficiency of the proposed algorithm in the sense of the expected stopping times is characterized in terms of network connectivity. Finally, simulation studies are presented which illustrate and verify the analytical findings.
Submitted 31 August, 2015; v1 submitted 27 November, 2014;
originally announced November 2014.
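As background for the sequential test named in this abstract, the classical single-observer SPRT for a Gaussian shift in mean accumulates a log-likelihood ratio until it crosses one of two thresholds. The sketch below is the centralized textbook version with Wald's approximate thresholds, not the distributed consensus+innovations algorithm of the paper; the function name and default error levels are illustrative assumptions.

```python
import math


def gaussian_sprt(samples, mu, sigma=1.0, alpha=0.01, beta=0.01):
    """Test H0: N(0, sigma^2) against H1: N(mu, sigma^2) sequentially.

    alpha: target type I error probability; beta: target type II error.
    Returns (decision, number_of_samples_used).
    """
    upper = math.log((1.0 - beta) / alpha)   # cross upward: accept H1
    lower = math.log(beta / (1.0 - alpha))   # cross downward: accept H0
    llr = 0.0
    n = 0
    for n, y in enumerate(samples, start=1):
        # Log-likelihood ratio increment for a Gaussian mean shift.
        llr += (mu / sigma ** 2) * (y - mu / 2.0)
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", n
```

The stopping time is random in general: the test stops as soon as the accumulated evidence is strong enough in either direction, and reports "undecided" if the data run out first.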
-
Fast and Accurate Frequency Estimation Using Sliding DFT
Authors:
Anit Kumar Sahu,
Mrityunjoy Chakraborty
Abstract:
Frequency estimation of a complex exponential is a problem relevant to a large number of fields. In this paper, a computationally efficient and accurate frequency estimator is presented using the guaranteed-stable sliding DFT, which provides stability as well as computational efficiency. The estimator approaches Jacobsen's estimator and Candan's estimator for large N, with an extra multiplicative correction term for the stabilization of the sliding DFT. Simulation results show that the proposed estimator performs better than Jacobsen's estimator and Candan's estimator.
Submitted 20 February, 2012;
originally announced February 2012.
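For readers unfamiliar with the sliding DFT mentioned in this abstract, the basic recurrence updates one DFT bin per incoming sample instead of recomputing the whole transform. The sketch below shows only the textbook recurrence; it is neither the guaranteed-stable variant nor the proposed frequency estimator, and the function names are illustrative assumptions.

```python
import cmath
from collections import deque


def sliding_dft_bin(samples, N, k):
    """Track bin k of the N-point DFT of the most recent N samples.

    The window starts filled with zeros, so early outputs correspond to a
    zero-padded history. Returns the bin value after each new sample.
    """
    twiddle = cmath.exp(2j * cmath.pi * k / N)
    window = deque([0.0] * N, maxlen=N)
    X = 0j
    out = []
    for x in samples:
        oldest = window[0]       # sample that leaves the window
        window.append(x)         # maxlen drops the oldest automatically
        # Remove the oldest sample, add the newest, rotate by one bin step.
        X = twiddle * (X - oldest + x)
        out.append(X)
    return out


def direct_dft_bin(block, k):
    """Reference: bin k of the DFT of `block`, computed from the definition."""
    N = len(block)
    return sum(x * cmath.exp(-2j * cmath.pi * k * m / N)
               for m, x in enumerate(block))
```

Each update costs one complex multiplication and two additions, which is the source of the computational efficiency. The plain recurrence can accumulate rounding error over long runs, which is the issue stabilized variants address.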