-
A Pseudo-random Number Generator for Multi-Sequence Generation with Programmable Statistics
Authors:
Jianan Wu,
Ahmet Yusuf Salim,
Eslam Elmitwalli,
Selçuk Köse,
Zeljko Ignjatovic
Abstract:
Pseudo-random number generators (PRNGs) are essential in a wide range of applications, from cryptography to statistical simulations and optimization algorithms. While uniform randomness is crucial for security-critical areas like cryptography, many domains, such as simulated annealing and CMOS-based Ising Machines, benefit from controlled or non-uniform randomness to enhance solution exploration a…
▽ More
Pseudo-random number generators (PRNGs) are essential in a wide range of applications, from cryptography to statistical simulations and optimization algorithms. While uniform randomness is crucial for security-critical areas like cryptography, many domains, such as simulated annealing and CMOS-based Ising Machines, benefit from controlled or non-uniform randomness to enhance solution exploration and optimize performance. This paper presents a hardware PRNG that can simultaneously generate multiple uncorrelated sequences with programmable statistics tailored to specific application needs. Designed in 65nm process, the PRNG occupies an area of approximately 0.0013mm^2 and has an energy consumption of 0.57pJ/bit. Simulations confirm the PRNG's effectiveness in modulating the statistical distribution while demonstrating high-quality randomness properties.
△ Less
Submitted 30 December, 2024;
originally announced January 2025.
-
Phi-4 Technical Report
Authors:
Marah Abdin,
Jyoti Aneja,
Harkirat Behl,
Sébastien Bubeck,
Ronen Eldan,
Suriya Gunasekar,
Michael Harrison,
Russell J. Hewett,
Mojan Javaheripi,
Piero Kauffmann,
James R. Lee,
Yin Tat Lee,
Yuanzhi Li,
Weishung Liu,
Caio C. T. Mendes,
Anh Nguyen,
Eric Price,
Gustavo de Rosa,
Olli Saarikivi,
Adil Salim,
Shital Shah,
Xin Wang,
Rachel Ward,
Yue Wu,
Dingli Yu
, et al. (2 additional authors not shown)
Abstract:
We present phi-4, a 14-billion parameter language model developed with a training recipe that is centrally focused on data quality. Unlike most language models, where pre-training is based primarily on organic data sources such as web content or code, phi-4 strategically incorporates synthetic data throughout the training process. While previous models in the Phi family largely distill the capabil…
▽ More
We present phi-4, a 14-billion parameter language model developed with a training recipe that is centrally focused on data quality. Unlike most language models, where pre-training is based primarily on organic data sources such as web content or code, phi-4 strategically incorporates synthetic data throughout the training process. While previous models in the Phi family largely distill the capabilities of a teacher model (specifically GPT-4), phi-4 substantially surpasses its teacher model on STEM-focused QA capabilities, giving evidence that our data-generation and post-training techniques go beyond distillation. Despite minimal changes to the phi-3 architecture, phi-4 achieves strong performance relative to its size -- especially on reasoning-focused benchmarks -- due to improved data, training curriculum, and innovations in the post-training scheme.
△ Less
Submitted 11 December, 2024;
originally announced December 2024.
-
SKI-SAT: A CMOS-compatible Hardware for Solving SAT Problems
Authors:
Ahmet Yusuf Salim,
Bart Selman,
Henry Kautz,
Zeljko Ignjatovic,
Selçuk Köse
Abstract:
Nature-inspired computation is receiving increasing attention. Various Ising machine implementations have recently been proven to be effective in solving numerous combinatorial optimization problems including maximum cut, low density parity check (LDPC) decoding, and Boolean satisfiability (SAT) problems. In this paper, a novel method is presented to solve SAT or MAX-SAT problems with a CMOS circu…
▽ More
Nature-inspired computation is receiving increasing attention. Various Ising machine implementations have recently been proven to be effective in solving numerous combinatorial optimization problems including maximum cut, low density parity check (LDPC) decoding, and Boolean satisfiability (SAT) problems. In this paper, a novel method is presented to solve SAT or MAX-SAT problems with a CMOS circuit implementation. The technique solves a SAT problem by mapping the SAT variables onto quantized capacitor voltages generated by an array of nodes that interact through a network of coupling units. The nodal interaction is achieved through coupling currents produced by the coupling units, which charge or discharge capacitor voltages, implementing a gradient descent along the SAT problem's cost function to minimize the number of unsatisfied clauses. The system also incorporates a unique low-complexity perturbation scheme to avoid settling in local minima, greatly enhancing the performance of the system. The simulation results demonstrate that the proposed SKI-SAT is a high-performance and low-energy alternative that surpasses existing solvers by significant margins, achieving more than 10 times faster solution and 300 times less power.
△ Less
Submitted 1 November, 2024;
originally announced November 2024.
-
Long-time asymptotics of noisy SVGD outside the population limit
Authors:
Victor Priser,
Pascal Bianchi,
Adil Salim
Abstract:
Stein Variational Gradient Descent (SVGD) is a widely used sampling algorithm that has been successfully applied in several areas of Machine Learning. SVGD operates by iteratively moving a set of interacting particles (which represent the samples) to approximate the target distribution. Despite recent studies on the complexity of SVGD and its variants, their long-time asymptotic behavior (i.e., a…
▽ More
Stein Variational Gradient Descent (SVGD) is a widely used sampling algorithm that has been successfully applied in several areas of Machine Learning. SVGD operates by iteratively moving a set of interacting particles (which represent the samples) to approximate the target distribution. Despite recent studies on the complexity of SVGD and its variants, their long-time asymptotic behavior (i.e., after numerous iterations ) is still not understood in the finite number of particles regime. We study the long-time asymptotic behavior of a noisy variant of SVGD. First, we establish that the limit set of noisy SVGD for large is well-defined. We then characterize this limit set, showing that it approaches the target distribution as increases. In particular, noisy SVGD provably avoids the variance collapse observed for SVGD. Our approach involves demonstrating that the trajectories of noisy SVGD closely resemble those described by a McKean-Vlasov process.
△ Less
Submitted 21 June, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
-
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Authors:
Marah Abdin,
Jyoti Aneja,
Hany Awadalla,
Ahmed Awadallah,
Ammar Ahmad Awan,
Nguyen Bach,
Amit Bahree,
Arash Bakhtiari,
Jianmin Bao,
Harkirat Behl,
Alon Benhaim,
Misha Bilenko,
Johan Bjorck,
Sébastien Bubeck,
Martin Cai,
Qin Cai,
Vishrav Chaudhary,
Dong Chen,
Dongdong Chen,
Weizhu Chen,
Yen-Chun Chen,
Yi-Ling Chen,
Hao Cheng,
Parul Chopra,
Xiyang Dai
, et al. (104 additional authors not shown)
Abstract:
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. Our training dataset is a scaled-up version…
▽ More
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. Our training dataset is a scaled-up version of the one used for phi-2, composed of heavily filtered publicly available web data and synthetic data. The model is also further aligned for robustness, safety, and chat format. We also provide parameter-scaling results with a 7B, 14B models trained for 4.8T tokens, called phi-3-small, phi-3-medium, both significantly more capable than phi-3-mini (e.g., respectively 75%, 78% on MMLU, and 8.7, 8.9 on MT-bench). To enhance multilingual, multimodal, and long-context capabilities, we introduce three models in the phi-3.5 series: phi-3.5-mini, phi-3.5-MoE, and phi-3.5-Vision. The phi-3.5-MoE, a 16 x 3.8B MoE model with 6.6 billion active parameters, achieves superior performance in language reasoning, math, and code tasks compared to other open-source models of similar scale, such as Llama 3.1 and the Mixtral series, and on par with Gemini-1.5-Flash and GPT-4o-mini. Meanwhile, phi-3.5-Vision, a 4.2 billion parameter model derived from phi-3.5-mini, excels in reasoning tasks and is adept at handling both single-image and text prompts, as well as multi-image and text prompts.
△ Less
Submitted 30 August, 2024; v1 submitted 22 April, 2024;
originally announced April 2024.
-
A PSO Based Method to Generate Actionable Counterfactuals for High Dimensional Data
Authors:
Shashank Shekhar,
Asif Salim,
Adesh Bansode,
Vivaswan Jinturkar,
Anirudha Nayak
Abstract:
Counterfactual explanations (CFE) are methods that explain a machine learning model by giving an alternate class prediction of a data point with some minimal changes in its features. It helps the users to identify their data attributes that caused an undesirable prediction like a loan or credit card rejection. We describe an efficient and an actionable counterfactual (CF) generation method based o…
▽ More
Counterfactual explanations (CFE) are methods that explain a machine learning model by giving an alternate class prediction of a data point with some minimal changes in its features. It helps the users to identify their data attributes that caused an undesirable prediction like a loan or credit card rejection. We describe an efficient and an actionable counterfactual (CF) generation method based on particle swarm optimization (PSO). We propose a simple objective function for the optimization of the instance-centric CF generation problem. The PSO brings in a lot of flexibility in terms of carrying out multi-objective optimization in large dimensions, capability for multiple CF generation, and setting box constraints or immutability of data attributes. An algorithm is proposed that incorporates these features and it enables greater control over the proximity and sparsity properties over the generated CFs. The proposed algorithm is evaluated with a set of action-ability metrics in real-world datasets, and the results were superior compared to that of the state-of-the-arts.
△ Less
Submitted 30 November, 2023; v1 submitted 30 September, 2023;
originally announced November 2023.
-
Gaussian random field approximation via Stein's method with applications to wide random neural networks
Authors:
Krishnakumar Balasubramanian,
Larry Goldstein,
Nathan Ross,
Adil Salim
Abstract:
We derive upper bounds on the Wasserstein distance ($W_1$), with respect to $\sup$-norm, between any continuous $\mathbb{R}^d$ valued random field indexed by the $n$-sphere and the Gaussian, based on Stein's method. We develop a novel Gaussian smoothing technique that allows us to transfer a bound in a smoother metric to the $W_1$ distance. The smoothing is based on covariance functions constructe…
▽ More
We derive upper bounds on the Wasserstein distance ($W_1$), with respect to $\sup$-norm, between any continuous $\mathbb{R}^d$ valued random field indexed by the $n$-sphere and the Gaussian, based on Stein's method. We develop a novel Gaussian smoothing technique that allows us to transfer a bound in a smoother metric to the $W_1$ distance. The smoothing is based on covariance functions constructed using powers of Laplacian operators, designed so that the associated Gaussian process has a tractable Cameron-Martin or Reproducing Kernel Hilbert Space. This feature enables us to move beyond one dimensional interval-based index sets that were previously considered in the literature. Specializing our general result, we obtain the first bounds on the Gaussian random field approximation of wide random neural networks of any depth and Lipschitz activation functions at the random field level. Our bounds are explicitly expressed in terms of the widths of the network and moments of the random weights. We also obtain tighter bounds when the activation function has three bounded derivatives.
△ Less
Submitted 30 April, 2024; v1 submitted 28 June, 2023;
originally announced June 2023.
-
Textbooks Are All You Need
Authors:
Suriya Gunasekar,
Yi Zhang,
Jyoti Aneja,
Caio César Teodoro Mendes,
Allie Del Giorno,
Sivakanth Gopi,
Mojan Javaheripi,
Piero Kauffmann,
Gustavo de Rosa,
Olli Saarikivi,
Adil Salim,
Shital Shah,
Harkirat Singh Behl,
Xin Wang,
Sébastien Bubeck,
Ronen Eldan,
Adam Tauman Kalai,
Yin Tat Lee,
Yuanzhi Li
Abstract:
We introduce phi-1, a new large language model for code, with significantly smaller size than competing models: phi-1 is a Transformer-based model with 1.3B parameters, trained for 4 days on 8 A100s, using a selection of ``textbook quality" data from the web (6B tokens) and synthetically generated textbooks and exercises with GPT-3.5 (1B tokens). Despite this small scale, phi-1 attains pass@1 accu…
▽ More
We introduce phi-1, a new large language model for code, with significantly smaller size than competing models: phi-1 is a Transformer-based model with 1.3B parameters, trained for 4 days on 8 A100s, using a selection of ``textbook quality" data from the web (6B tokens) and synthetically generated textbooks and exercises with GPT-3.5 (1B tokens). Despite this small scale, phi-1 attains pass@1 accuracy 50.6% on HumanEval and 55.5% on MBPP. It also displays surprising emergent properties compared to phi-1-base, our model before our finetuning stage on a dataset of coding exercises, and phi-1-small, a smaller model with 350M parameters trained with the same pipeline as phi-1 that still achieves 45% on HumanEval.
△ Less
Submitted 2 October, 2023; v1 submitted 20 June, 2023;
originally announced June 2023.
-
The probability flow ODE is provably fast
Authors:
Sitan Chen,
Sinho Chewi,
Holden Lee,
Yuanzhi Li,
Jianfeng Lu,
Adil Salim
Abstract:
We provide the first polynomial-time convergence guarantees for the probability flow ODE implementation (together with a corrector step) of score-based generative modeling. Our analysis is carried out in the wake of recent results obtaining such guarantees for the SDE-based implementation (i.e., denoising diffusion probabilistic modeling or DDPM), but requires the development of novel techniques f…
▽ More
We provide the first polynomial-time convergence guarantees for the probability flow ODE implementation (together with a corrector step) of score-based generative modeling. Our analysis is carried out in the wake of recent results obtaining such guarantees for the SDE-based implementation (i.e., denoising diffusion probabilistic modeling or DDPM), but requires the development of novel techniques for studying deterministic dynamics without contractivity. Through the use of a specially chosen corrector step based on the underdamped Langevin diffusion, we obtain better dimension dependence than prior works on DDPM ($O(\sqrt{d})$ vs. $O(d)$, assuming smoothness of the data distribution), highlighting potential advantages of the ODE framework.
△ Less
Submitted 19 May, 2023;
originally announced May 2023.
-
Forward-backward Gaussian variational inference via JKO in the Bures-Wasserstein Space
Authors:
Michael Diao,
Krishnakumar Balasubramanian,
Sinho Chewi,
Adil Salim
Abstract:
Variational inference (VI) seeks to approximate a target distribution $π$ by an element of a tractable family of distributions. Of key interest in statistics and machine learning is Gaussian VI, which approximates $π$ by minimizing the Kullback-Leibler (KL) divergence to $π$ over the space of Gaussians. In this work, we develop the (Stochastic) Forward-Backward Gaussian Variational Inference (FB-G…
▽ More
Variational inference (VI) seeks to approximate a target distribution $π$ by an element of a tractable family of distributions. Of key interest in statistics and machine learning is Gaussian VI, which approximates $π$ by minimizing the Kullback-Leibler (KL) divergence to $π$ over the space of Gaussians. In this work, we develop the (Stochastic) Forward-Backward Gaussian Variational Inference (FB-GVI) algorithm to solve Gaussian VI. Our approach exploits the composite structure of the KL divergence, which can be written as the sum of a smooth term (the potential) and a non-smooth term (the entropy) over the Bures-Wasserstein (BW) space of Gaussians endowed with the Wasserstein distance. For our proposed algorithm, we obtain state-of-the-art convergence guarantees when $π$ is log-smooth and log-concave, as well as the first convergence guarantees to first-order stationary solutions when $π$ is only log-smooth.
△ Less
Submitted 10 April, 2023;
originally announced April 2023.
-
Understanding how the use of AI decision support tools affect critical thinking and over-reliance on technology by drug dispensers in Tanzania
Authors:
Ally Salim Jr,
Megan Allen,
Kelvin Mariki,
Kevin James Masoy,
Jafary Liana
Abstract:
The use of AI in healthcare is designed to improve care delivery and augment the decisions of providers to enhance patient outcomes. When deployed in clinical settings, the interaction between providers and AI is a critical component for measuring and understanding the effectiveness of these digital tools on broader health outcomes. Even in cases where AI algorithms have high diagnostic accuracy,…
▽ More
The use of AI in healthcare is designed to improve care delivery and augment the decisions of providers to enhance patient outcomes. When deployed in clinical settings, the interaction between providers and AI is a critical component for measuring and understanding the effectiveness of these digital tools on broader health outcomes. Even in cases where AI algorithms have high diagnostic accuracy, healthcare providers often still rely on their experience and sometimes gut feeling to make a final decision. Other times, providers rely unquestioningly on the outputs of the AI models, which leads to a concern about over-reliance on the technology. The purpose of this research was to understand how reliant drug shop dispensers were on AI-powered technologies when determining a differential diagnosis for a presented clinical case vignette. We explored how the drug dispensers responded to technology that is framed as always correct in an attempt to measure whether they begin to rely on it without any critical thought of their own. We found that dispensers relied on the decision made by the AI 25 percent of the time, even when the AI provided no explanation for its decision.
△ Less
Submitted 22 February, 2023; v1 submitted 19 February, 2023;
originally announced February 2023.
-
Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions
Authors:
Sitan Chen,
Sinho Chewi,
Jerry Li,
Yuanzhi Li,
Adil Salim,
Anru R. Zhang
Abstract:
We provide theoretical convergence guarantees for score-based generative models (SGMs) such as denoising diffusion probabilistic models (DDPMs), which constitute the backbone of large-scale real-world generative models such as DALL$\cdot$E 2. Our main result is that, assuming accurate score estimates, such SGMs can efficiently sample from essentially any realistic data distribution. In contrast to…
▽ More
We provide theoretical convergence guarantees for score-based generative models (SGMs) such as denoising diffusion probabilistic models (DDPMs), which constitute the backbone of large-scale real-world generative models such as DALL$\cdot$E 2. Our main result is that, assuming accurate score estimates, such SGMs can efficiently sample from essentially any realistic data distribution. In contrast to prior works, our results (1) hold for an $L^2$-accurate score estimate (rather than $L^\infty$-accurate); (2) do not require restrictive functional inequality conditions that preclude substantial non-log-concavity; (3) scale polynomially in all relevant problem parameters; and (4) match state-of-the-art complexity guarantees for discretization of the Langevin diffusion, provided that the score error is sufficiently small. We view this as strong theoretical justification for the empirical success of SGMs. We also examine SGMs based on the critically damped Langevin diffusion (CLD). Contrary to conventional wisdom, we provide evidence that the use of the CLD does not reduce the complexity of SGMs.
△ Less
Submitted 15 April, 2023; v1 submitted 22 September, 2022;
originally announced September 2022.
-
Federated Learning with a Sampling Algorithm under Isoperimetry
Authors:
Lukang Sun,
Adil Salim,
Peter Richtárik
Abstract:
Federated learning uses a set of techniques to efficiently distribute the training of a machine learning algorithm across several devices, who own the training data. These techniques critically rely on reducing the communication cost -- the main bottleneck -- between the devices and a central server. Federated learning algorithms usually take an optimization approach: they are algorithms for minim…
▽ More
Federated learning uses a set of techniques to efficiently distribute the training of a machine learning algorithm across several devices, who own the training data. These techniques critically rely on reducing the communication cost -- the main bottleneck -- between the devices and a central server. Federated learning algorithms usually take an optimization approach: they are algorithms for minimizing the training loss subject to communication (and other) constraints. In this work, we instead take a Bayesian approach for the training task, and propose a communication-efficient variant of the Langevin algorithm to sample a posteriori. The latter approach is more robust and provides more knowledge of the \textit{a posteriori} distribution than its optimization counterpart. We analyze our algorithm without assuming that the target distribution is strongly log-concave. Instead, we assume the weaker log Sobolev inequality, which allows for nonconvexity.
△ Less
Submitted 7 June, 2022; v1 submitted 2 June, 2022;
originally announced June 2022.
-
An Ensemble Model for Face Liveness Detection
Authors:
Shashank Shekhar,
Avinash Patel,
Mrinal Haloi,
Asif Salim
Abstract:
In this paper, we present a passive method to detect face presentation attack a.k.a face liveness detection using an ensemble deep learning technique. Face liveness detection is one of the key steps involved in user identity verification of customers during the online onboarding/transaction processes. During identity verification, an unauthenticated user tries to bypass the verification system by…
▽ More
In this paper, we present a passive method to detect face presentation attack a.k.a face liveness detection using an ensemble deep learning technique. Face liveness detection is one of the key steps involved in user identity verification of customers during the online onboarding/transaction processes. During identity verification, an unauthenticated user tries to bypass the verification system by several means, for example, they can capture a user photo from social media and do an imposter attack using printouts of users faces or using a digital photo from a mobile device and even create a more sophisticated attack like video replay attack. We have tried to understand the different methods of attack and created an in-house large-scale dataset covering all the kinds of attacks to train a robust deep learning model. We propose an ensemble method where multiple features of the face and background regions are learned to predict whether the user is a bonafide or an attacker.
△ Less
Submitted 19 January, 2022;
originally announced January 2022.
-
A Comparative study of Hyper-Parameter Optimization Tools
Authors:
Shashank Shekhar,
Adesh Bansode,
Asif Salim
Abstract:
Most of the machine learning models have associated hyper-parameters along with their parameters. While the algorithm gives the solution for parameters, its utility for model performance is highly dependent on the choice of hyperparameters. For a robust performance of a model, it is necessary to find out the right hyper-parameter combination. Hyper-parameter optimization (HPO) is a systematic proc…
▽ More
Most of the machine learning models have associated hyper-parameters along with their parameters. While the algorithm gives the solution for parameters, its utility for model performance is highly dependent on the choice of hyperparameters. For a robust performance of a model, it is necessary to find out the right hyper-parameter combination. Hyper-parameter optimization (HPO) is a systematic process that helps in finding the right values for them. The conventional methods for this purpose are grid search and random search and both methods create issues in industrial-scale applications. Hence a set of strategies have been recently proposed based on Bayesian optimization and evolutionary algorithm principles that help in runtime issues in a production environment and robust performance. In this paper, we compare the performance of four python libraries, namely Optuna, Hyper-opt, Optunity, and sequential model-based algorithm configuration (SMAC) that has been proposed for hyper-parameter optimization. The performance of these tools is tested using two benchmarks. The first one is to solve a combined algorithm selection and hyper-parameter optimization (CASH) problem The second one is the NeurIPS black-box optimization challenge in which a multilayer perception (MLP) architecture has to be chosen from a set of related architecture constraints and hyper-parameters. The benchmarking is done with six real-world datasets. From the experiments, we found that Optuna has better performance for CASH problem and HyperOpt for MLP problem.
△ Less
Submitted 17 January, 2022;
originally announced January 2022.
-
A Convergence Theory for SVGD in the Population Limit under Talagrand's Inequality T1
Authors:
Adil Salim,
Lukang Sun,
Peter Richtárik
Abstract:
Stein Variational Gradient Descent (SVGD) is an algorithm for sampling from a target density which is known up to a multiplicative constant. Although SVGD is a popular algorithm in practice, its theoretical study is limited to a few recent works. We study the convergence of SVGD in the population limit, (i.e., with an infinite number of particles) to sample from a non-logconcave target distributio…
▽ More
Stein Variational Gradient Descent (SVGD) is an algorithm for sampling from a target density which is known up to a multiplicative constant. Although SVGD is a popular algorithm in practice, its theoretical study is limited to a few recent works. We study the convergence of SVGD in the population limit, (i.e., with an infinite number of particles) to sample from a non-logconcave target distribution satisfying Talagrand's inequality T1. We first establish the convergence of the algorithm. Then, we establish a dimension-dependent complexity bound in terms of the Kernelized Stein Discrepancy (KSD). Unlike existing works, we do not assume that the KSD is bounded along the trajectory of the algorithm. Our approach relies on interpreting SVGD as a gradient descent over a space of probability measures.
△ Less
Submitted 16 June, 2022; v1 submitted 6 June, 2021;
originally announced June 2021.
-
An efficient scheme based on graph centrality to select nodes for training for effective learning
Authors:
CR Sandeep,
Asif Salim,
R Sethunadh,
S Sumitra
Abstract:
The process of selecting points for training a machine learning model is often a challenging task. Many times, we will have a lot of data, but for training, we require the labels and labeling is often costly. So we need to select the points for training in an efficient manner so that the model trained on the points selected will be better than the ones trained on any other training set. We propose…
▽ More
The process of selecting points for training a machine learning model is often a challenging task. Many times, we will have a lot of data, but for training, we require the labels and labeling is often costly. So we need to select the points for training in an efficient manner so that the model trained on the points selected will be better than the ones trained on any other training set. We propose a novel method to select the nodes in graph datasets using the concept of graph centrality. Two methods are proposed - one using a smart selection strategy, where the model is required to be trained only once and another using active learning method. We have tested this idea on three popular graph datasets - Cora, Citeseer and Pubmed- and the results are found to be encouraging.
△ Less
Submitted 19 May, 2021; v1 submitted 29 April, 2021;
originally announced April 2021.
-
Neighborhood Preserving Kernels for Attributed Graphs
Authors:
Asif Salim,
Shiju. S. S,
Sumitra. S
Abstract:
We describe the design of a reproducing kernel suitable for attributed graphs, in which the similarity between the two graphs is defined based on the neighborhood information of the graph nodes with the aid of a product graph formulation. We represent the proposed kernel as the weighted sum of two other kernels of which one is an R-convolution kernel that processes the attribute information of the…
▽ More
We describe the design of a reproducing kernel suitable for attributed graphs, in which the similarity between the two graphs is defined based on the neighborhood information of the graph nodes with the aid of a product graph formulation. We represent the proposed kernel as the weighted sum of two other kernels of which one is an R-convolution kernel that processes the attribute information of the graph and the other is an optimal assignment kernel that processes label information. They are formulated in such a way that the edges processed as part of the kernel computation have the same neighborhood properties and hence the kernel proposed makes a well-defined correspondence between regions processed in graphs. These concepts are also extended to the case of the shortest paths. We identified the state-of-the-art kernels that can be mapped to such a neighborhood preserving framework. We found that the kernel value of the argument graphs in each iteration of the Weisfeiler-Lehman color refinement algorithm can be obtained recursively from the product graph formulated in our method. By incorporating the proposed kernel on support vector machines we analyzed the real-world data sets and it has shown superior performance in comparison with that of the other state-of-the-art graph kernels.
△ Less
Submitted 13 October, 2020;
originally announced October 2020.
-
Framework for Designing Filters of Spectral Graph Convolutional Neural Networks in the Context of Regularization Theory
Authors:
Asif Salim,
Sumitra S
Abstract:
Graph convolutional neural networks (GCNNs) have been widely used in graph learning. It has been observed that the smoothness functional on graphs can be defined in terms of the graph Laplacian. This fact points out in the direction of using Laplacian in deriving regularization operators on graphs and its consequent use with spectral GCNN filter designs. In this work, we explore the regularization…
▽ More
Graph convolutional neural networks (GCNNs) have been widely used in graph learning. It has been observed that the smoothness functional on graphs can be defined in terms of the graph Laplacian. This fact points out in the direction of using Laplacian in deriving regularization operators on graphs and its consequent use with spectral GCNN filter designs. In this work, we explore the regularization properties of graph Laplacian and proposed a generalized framework for regularized filter designs in spectral GCNNs. We found that the filters used in many state-of-the-art GCNNs can be derived as a special case of the framework we developed. We designed new filters that are associated with well-defined regularization behavior and tested their performance on semi-supervised node classification tasks. Their performance was found to be superior to that of the other state-of-the-art techniques.
△ Less
Submitted 29 September, 2020;
originally announced September 2020.
-
A Non-Asymptotic Analysis for Stein Variational Gradient Descent
Authors:
Anna Korba,
Adil Salim,
Michael Arbel,
Giulia Luise,
Arthur Gretton
Abstract:
We study the Stein Variational Gradient Descent (SVGD) algorithm, which optimises a set of particles to approximate a target probability distribution $π\propto e^{-V}$ on $\mathbb{R}^d$. In the population limit, SVGD performs gradient descent in the space of probability distributions on the KL divergence with respect to $π$, where the gradient is smoothed through a kernel integral operator. In thi…
▽ More
We study the Stein Variational Gradient Descent (SVGD) algorithm, which optimises a set of particles to approximate a target probability distribution $π\propto e^{-V}$ on $\mathbb{R}^d$. In the population limit, SVGD performs gradient descent in the space of probability distributions on the KL divergence with respect to $π$, where the gradient is smoothed through a kernel integral operator. In this paper, we provide a novel finite time analysis for the SVGD algorithm. We provide a descent lemma establishing that the algorithm decreases the objective at each iteration, and rates of convergence for the average Stein Fisher divergence (also referred to as Kernel Stein Discrepancy). We also provide a convergence result of the finite particle system corresponding to the practical implementation of SVGD to its population version.
△ Less
Submitted 3 January, 2021; v1 submitted 17 June, 2020;
originally announced June 2020.
-
Primal Dual Interpretation of the Proximal Stochastic Gradient Langevin Algorithm
Authors:
Adil Salim,
Peter Richtárik
Abstract:
We consider the task of sampling with respect to a log concave probability distribution. The potential of the target distribution is assumed to be composite, \textit{i.e.}, written as the sum of a smooth convex term, and a nonsmooth convex term possibly taking infinite values. The target distribution can be seen as a minimizer of the Kullback-Leibler divergence defined on the Wasserstein space (\t…
▽ More
We consider the task of sampling with respect to a log concave probability distribution. The potential of the target distribution is assumed to be composite, \textit{i.e.}, written as the sum of a smooth convex term, and a nonsmooth convex term possibly taking infinite values. The target distribution can be seen as a minimizer of the Kullback-Leibler divergence defined on the Wasserstein space (\textit{i.e.}, the space of probability measures). In the first part of this paper, we establish a strong duality result for this minimization problem. In the second part of this paper, we use the duality gap arising from the first part to study the complexity of the Proximal Stochastic Gradient Langevin Algorithm (PSGLA), which can be seen as a generalization of the Projected Langevin Algorithm. Our approach relies on viewing PSGLA as a primal dual algorithm and covers many cases where the target distribution is not fully supported. In particular, we show that if the potential is strongly convex, the complexity of PSGLA is $O(1/\varepsilon^2)$ in terms of the 2-Wasserstein distance. In contrast, the complexity of the Projected Langevin Algorithm is $O(1/\varepsilon^{12})$ in terms of total variation when the potential is convex.
△ Less
Submitted 22 February, 2021; v1 submitted 16 June, 2020;
originally announced June 2020.
-
Adaptive Digital PID Control of a Quadcopter with Unknown Dynamics
Authors:
Ankit Goel,
Abdulazeez Mohammed Salim,
Ahmad Ansari,
Sai Ravela,
Dennis Bernstein
Abstract:
This paper develops an adaptive autopilot for quadcopters with unknown dynamics. To do this, the PX4 autopilot architecture is modified so that the feedback and feedforward controllers are replaced by adaptive control laws based on retrospective cost adaptive control (RCAC). The present paper provides a numerical investigation of the performance of the adaptive autopilot on a quadcopter with unkno…
▽ More
This paper develops an adaptive autopilot for quadcopters with unknown dynamics. To do this, the PX4 autopilot architecture is modified so that the feedback and feedforward controllers are replaced by adaptive control laws based on retrospective cost adaptive control (RCAC). The present paper provides a numerical investigation of the performance of the adaptive autopilot on a quadcopter with unknown dynamics. In order to reflect the absence of prior modeling information, all of the adaptive digital controllers are initialized at zero gains. In addition, moment-of-inertia of the quadcopter is varied to test the robustness of the adaptive autopilot. In all test cases, the vehicle is commanded to follow a given trajectory, and the resulting performance is examined.
△ Less
Submitted 30 May, 2020;
originally announced June 2020.
-
Dualize, Split, Randomize: Toward Fast Nonsmooth Optimization Algorithms
Authors:
Adil Salim,
Laurent Condat,
Konstantin Mishchenko,
Peter Richtárik
Abstract:
We consider minimizing the sum of three convex functions, where the first one F is smooth, the second one is nonsmooth and proximable and the third one is the composition of a nonsmooth proximable function with a linear operator L. This template problem has many applications, for instance, in image processing and machine learning. First, we propose a new primal-dual algorithm, which we call PDDY,…
▽ More
We consider minimizing the sum of three convex functions, where the first one F is smooth, the second one is nonsmooth and proximable and the third one is the composition of a nonsmooth proximable function with a linear operator L. This template problem has many applications, for instance, in image processing and machine learning. First, we propose a new primal-dual algorithm, which we call PDDY, for this problem. It is constructed by applying Davis-Yin splitting to a monotone inclusion in a primal-dual product space, where the operators are monotone under a specific metric depending on L. We show that three existing algorithms (the two forms of the Condat-Vu algorithm and the PD3O algorithm) have the same structure, so that PDDY is the fourth missing link in this self-consistent class of primal-dual algorithms. This representation eases the convergence analysis: it allows us to derive sublinear convergence rates in general, and linear convergence results in presence of strong convexity. Moreover, within our broad and flexible analysis framework, we propose new stochastic generalizations of the algorithms, in which a variance-reduced random estimate of the gradient of F is used, instead of the true gradient. Furthermore, we obtain, as a special case of PDDY, a linearly converging algorithm for the minimization of a strongly convex function F under a linear constraint; we discuss its important application to decentralized optimization.
△ Less
Submitted 26 July, 2022; v1 submitted 3 April, 2020;
originally announced April 2020.
-
Distributed Fixed Point Methods with Compressed Iterates
Authors:
Sélim Chraibi,
Ahmed Khaled,
Dmitry Kovalev,
Peter Richtárik,
Adil Salim,
Martin Takáč
Abstract:
We propose basic and natural assumptions under which iterative optimization methods with compressed iterates can be analyzed. This problem is motivated by the practice of federated learning, where a large model stored in the cloud is compressed before it is sent to a mobile device, which then proceeds with training based on local data. We develop standard and variance reduced methods, and establis…
▽ More
We propose basic and natural assumptions under which iterative optimization methods with compressed iterates can be analyzed. This problem is motivated by the practice of federated learning, where a large model stored in the cloud is compressed before it is sent to a mobile device, which then proceeds with training based on local data. We develop standard and variance reduced methods, and establish communication complexity bounds. Our algorithms are the first distributed methods with compressed iterates, and the first fixed point methods with compressed iterates.
△ Less
Submitted 20 December, 2019;
originally announced December 2019.
-
Balsam: Automated Scheduling and Execution of Dynamic, Data-Intensive HPC Workflows
Authors:
Michael A. Salim,
Thomas D. Uram,
J. Taylor Childers,
Prasanna Balaprakash,
Venkatram Vishwanath,
Michael E. Papka
Abstract:
We introduce the Balsam service to manage high-throughput task scheduling and execution on supercomputing systems. Balsam allows users to populate a task database with a variety of tasks ranging from simple independent tasks to dynamic multi-task workflows. With abstractions for the local resource scheduler and MPI environment, Balsam dynamically packages tasks into ensemble jobs and manages their…
▽ More
We introduce the Balsam service to manage high-throughput task scheduling and execution on supercomputing systems. Balsam allows users to populate a task database with a variety of tasks ranging from simple independent tasks to dynamic multi-task workflows. With abstractions for the local resource scheduler and MPI environment, Balsam dynamically packages tasks into ensemble jobs and manages their scheduling lifecycle. The ensembles execute in a pilot "launcher" which (i) ensures concurrent, load-balanced execution of arbitrary serial and parallel programs with heterogeneous processor requirements, (ii) requires no modification of user applications, (iii) is tolerant of task-level faults and provides several options for error recovery, (iv) stores provenance data (e.g task history, error logs) in the database, (v) supports dynamic workflows, in which tasks are created or killed at runtime. Here, we present the design and Python implementation of the Balsam service and launcher. The efficacy of this system is illustrated using two case studies: hyperparameter optimization of deep neural networks, and high-throughput single-point quantum chemistry calculations. We find that the unique combination of flexible job-packing and automated scheduling with dynamic (pilot-managed) execution facilitates excellent resource utilization. The scripting overheads typically needed to manage resources and launch workflows on supercomputers are substantially reduced, accelerating workflow development and execution.
△ Less
Submitted 18 September, 2019;
originally announced September 2019.
-
Maximum Mean Discrepancy Gradient Flow
Authors:
Michael Arbel,
Anna Korba,
Adil Salim,
Arthur Gretton
Abstract:
We construct a Wasserstein gradient flow of the maximum mean discrepancy (MMD) and study its convergence properties.
The MMD is an integral probability metric defined for a reproducing kernel Hilbert space (RKHS), and serves as a metric on probability measures for a sufficiently rich RKHS. We obtain conditions for convergence of the gradient flow towards a global optimum, that can be related to…
▽ More
We construct a Wasserstein gradient flow of the maximum mean discrepancy (MMD) and study its convergence properties.
The MMD is an integral probability metric defined for a reproducing kernel Hilbert space (RKHS), and serves as a metric on probability measures for a sufficiently rich RKHS. We obtain conditions for convergence of the gradient flow towards a global optimum, that can be related to particle transport when optimizing neural networks.
We also propose a way to regularize this MMD flow, based on an injection of noise in the gradient. This algorithmic fix comes with theoretical and empirical evidence. The practical implementation of the flow is straightforward, since both the MMD and its gradient have simple closed-form expressions, which can be easily estimated with samples.
△ Less
Submitted 3 December, 2019; v1 submitted 10 June, 2019;
originally announced June 2019.
-
Stochastic Proximal Langevin Algorithm: Potential Splitting and Nonasymptotic Rates
Authors:
Adil Salim,
Dmitry Kovalev,
Peter Richtárik
Abstract:
We propose a new algorithm---Stochastic Proximal Langevin Algorithm (SPLA)---for sampling from a log concave distribution. Our method is a generalization of the Langevin algorithm to potentials expressed as the sum of one stochastic smooth term and multiple stochastic nonsmooth terms. In each iteration, our splitting technique only requires access to a stochastic gradient of the smooth term and a…
▽ More
We propose a new algorithm---Stochastic Proximal Langevin Algorithm (SPLA)---for sampling from a log concave distribution. Our method is a generalization of the Langevin algorithm to potentials expressed as the sum of one stochastic smooth term and multiple stochastic nonsmooth terms. In each iteration, our splitting technique only requires access to a stochastic gradient of the smooth term and a stochastic proximal operator for each of the nonsmooth terms. We establish nonasymptotic sublinear and linear convergence rates under convexity and strong convexity of the smooth term, respectively, expressed in terms of the KL divergence and Wasserstein distance. We illustrate the efficiency of our sampling technique through numerical simulations on a Bayesian learning task.
△ Less
Submitted 16 June, 2020; v1 submitted 28 May, 2019;
originally announced May 2019.
-
Synthetic Patient Generation: A Deep Learning Approach Using Variational Autoencoders
Authors:
Ally Salim Jr
Abstract:
Artificial Intelligence in healthcare is a new and exciting frontier and the possibilities are endless. With deep learning approaches beating human performances in many areas, the logical next step is to attempt their application in the health space. For these and other Machine Learning approaches to produce good results and have their potential realized, the need for, and importance of, large amo…
▽ More
Artificial Intelligence in healthcare is a new and exciting frontier and the possibilities are endless. With deep learning approaches beating human performances in many areas, the logical next step is to attempt their application in the health space. For these and other Machine Learning approaches to produce good results and have their potential realized, the need for, and importance of, large amounts of accurate data is second to none. This is a challenge faced by many industries and more so in the healthcare space. We present an approach of using Variational Autoencoders (VAE's) as an approach to generating more data for training deeper networks, as well as uncovering underlying patterns in diagnoses and the patients suffering from them. By training a VAE, on available data, it was able to learn the latent distribution of the patient features given the diagnosis. It is then possible, after training, to sample from the learnt latent distribution to generate new accurate patient records given the patient diagnosis.
△ Less
Submitted 20 August, 2018;
originally announced August 2018.
-
Snake: a Stochastic Proximal Gradient Algorithm for Regularized Problems over Large Graphs
Authors:
Adil Salim,
Pascal Bianchi,
Walid Hachem
Abstract:
A regularized optimization problem over a large unstructured graph is studied, where the regularization term is tied to the graph geometry. Typical regularization examples include the total variation and the Laplacian regularizations over the graph. When applying the proximal gradient algorithm to solve this problem, there exist quite affordable methods to implement the proximity operator (backwar…
▽ More
A regularized optimization problem over a large unstructured graph is studied, where the regularization term is tied to the graph geometry. Typical regularization examples include the total variation and the Laplacian regularizations over the graph. When applying the proximal gradient algorithm to solve this problem, there exist quite affordable methods to implement the proximity operator (backward step) in the special case where the graph is a simple path without loops. In this paper, an algorithm, referred to as "Snake", is proposed to solve such regularized problems over general graphs, by taking benefit of these fast methods. The algorithm consists in properly selecting random simple paths in the graph and performing the proximal gradient algorithm over these simple paths. This algorithm is an instance of a new general stochastic proximal gradient algorithm, whose convergence is proven. Applications to trend filtering and graph inpainting are provided among others. Numerical experiments are conducted over large graphs.
△ Less
Submitted 19 December, 2017;
originally announced December 2017.
-
Differential Modulation for Asynchronous Two-Way-Relay Systems over Frequency-Selective Fading Channels
Authors:
Ahmad Salim,
Tolga M. Duman
Abstract:
In this paper, we propose two schemes for asynchronous multi-relay two-way relay (MR-TWR) systems in which neither the users nor the relays know the channel state information (CSI). In an MR-TWR system, two users exchange their messages with the help of $N_R$ relays. Most of the existing works on MR-TWR systems based on differential modulation assume perfect symbol-level synchronization between al…
▽ More
In this paper, we propose two schemes for asynchronous multi-relay two-way relay (MR-TWR) systems in which neither the users nor the relays know the channel state information (CSI). In an MR-TWR system, two users exchange their messages with the help of $N_R$ relays. Most of the existing works on MR-TWR systems based on differential modulation assume perfect symbol-level synchronization between all communicating nodes. However, this assumption is not valid in many practical systems, which makes the design of differentially modulated schemes more challenging. Therefore, we design differential modulation schemes that can tolerate timing misalignment under frequency-selective fading. We investigate the performance of the proposed schemes in terms of either probability of bit error or pairwise error probability. Through numerical examples, we show that the proposed schemes outperform existing competing solutions in the literature, especially for high signal-to-noise ratio (SNR) values.
△ Less
Submitted 23 October, 2016;
originally announced October 2016.
-
The Empirical Commit Frequency Distribution of Open Source Projects
Authors:
Carsten Kolassa,
Dirk Riehle,
Michel A. Salim
Abstract:
A fundamental unit of work in programming is the code contribution ("commit") that a developer makes to the code base of the project in work. An author's commit frequency describes how often that author commits. Knowing the distribution of all commit frequencies is a fundamental part of understanding software development processes. This paper presents a detailed quantitative analysis of commit fre…
▽ More
A fundamental unit of work in programming is the code contribution ("commit") that a developer makes to the code base of the project in work. An author's commit frequency describes how often that author commits. Knowing the distribution of all commit frequencies is a fundamental part of understanding software development processes. This paper presents a detailed quantitative analysis of commit frequencies in open-source software development. The analysis is based on a large sample of open source projects, and presents the overall distribution of commit frequencies. We analyze the data to show the differences between authors and projects by project size; we also includes a comparison of successful and non successful projects and we derive an activity indicator from these analyses. By measuring a fundamental dimension of programming we help improve software development tools and our understanding of software development. We also validate some fundamental assumptions about software development.
△ Less
Submitted 21 August, 2014;
originally announced August 2014.
-
A Model of the Commit Size Distribution of Open Source
Authors:
Carsten Kolassa,
Dirk Riehle,
Michel A. Salim
Abstract:
A fundamental unit of work in programming is the code contribution ("commit") that a developer makes to the code base of the project in work. We use statistical methods to derive a model of the probabilistic distribution of commit sizes in open source projects and we show that the model is applicable to different project sizes. We use both graphical as well as statistical methods to validate the g…
▽ More
A fundamental unit of work in programming is the code contribution ("commit") that a developer makes to the code base of the project in work. We use statistical methods to derive a model of the probabilistic distribution of commit sizes in open source projects and we show that the model is applicable to different project sizes. We use both graphical as well as statistical methods to validate the goodness of fit of our model. By measuring and modeling a fundamental dimension of programming we help improve software development tools and our understanding of software development.
△ Less
Submitted 21 August, 2014;
originally announced August 2014.
-
Developer Belief vs. Reality: The Case of the Commit Size Distribution
Authors:
Dirk Riehle,
Carsten Kolassa,
Michel A. Salim
Abstract:
The design of software development tools follows from what the developers of such tools believe is true about software development. A key aspect of such beliefs is the size of code contributions (commits) to a software project. In this paper, we show that what tool developers think is true about the size of code contributions is different by more than an order of magnitude from reality. We present…
▽ More
The design of software development tools follows from what the developers of such tools believe is true about software development. A key aspect of such beliefs is the size of code contributions (commits) to a software project. In this paper, we show that what tool developers think is true about the size of code contributions is different by more than an order of magnitude from reality. We present this reality, called the commit size distribution, for a large sample of open source and selected closed source projects. We suggest that these new empirical insights will help improve software development tools by aligning underlying design assumptions closer with reality.
△ Less
Submitted 20 August, 2014;
originally announced August 2014.
-
Addressing Security Challenges in Cloud Computing
Authors:
Abu Salim,
Rajesh Kumar Tiwari,
Sachin Tripathi
Abstract:
Cloud computing is a new computing paradigm which allows sharing of resources on remote server such as hardware, network, storage using internet and provides the way through which application, computing power, computing infrastructure can be delivered to the user as a service. Cloud computing unique attribute promise cost effective Information Technology Solution (IT Solution) to the user. All com…
▽ More
Cloud computing is a new computing paradigm which allows sharing of resources on remote server such as hardware, network, storage using internet and provides the way through which application, computing power, computing infrastructure can be delivered to the user as a service. Cloud computing unique attribute promise cost effective Information Technology Solution (IT Solution) to the user. All computing needs are provided by the Cloud Service Provider (CSP) and they can be increased or decreased dynamically as required by the user. As data and Application are located at the server and may be beyond geographical boundary, this leads a number of concern from the user prospective. The objective of this paper is to explore the key issues of cloud computing which is delaying its adoption.
△ Less
Submitted 31 July, 2013;
originally announced July 2013.