-
Breaking through the classical Shannon entropy limit: A new frontier through logical semantics
Authors:
Luis A. Lastras,
Barry M. Trager,
Jonathan Lenchner,
Wojciech Szpankowski,
Chai Wah Wu,
Mark S. Squillante,
Alexander Gray
Abstract:
Information theory has provided foundations for the theories of several application areas critical for modern society, including communications, computer storage, and AI. A key aspect of Shannon's 1948 theory is a sharp lower bound on the number of bits needed to encode and communicate a string of symbols. When he introduced the theory, Shannon famously excluded any notion of semantics behind the symbols being communicated. This semantics-free notion went on to have massive impact on communication and computing technologies, even as multiple proposals for reintroducing semantics in a theory of information were being made, notably one where Carnap and Bar-Hillel used logic and reasoning to capture semantics. In this paper we present, for the first time, a Shannon-style analysis of a communication system equipped with a deductive reasoning capability, implemented using logical inference. We use some of the most important techniques developed in information theory to demonstrate significant and sometimes surprising gains in communication efficiency availed to us through such capability, demonstrated also through practical codes. We thus argue that proposals for a semantic information theory should include the power of deductive reasoning to magnify the value of transmitted bits as we strive to fully unlock the inherent potential of semantics.
Submitted 31 December, 2024;
originally announced January 2025.
-
A General Control-Theoretic Approach for Reinforcement Learning: Theory and Algorithms
Authors:
Weiqin Chen,
Mark S. Squillante,
Chai Wah Wu,
Santiago Paternain
Abstract:
We devise a control-theoretic reinforcement learning approach to support direct learning of the optimal policy. We establish various theoretical properties of our approach, such as convergence and optimality of our analog of the Bellman operator and Q-learning, a new control-policy-variable gradient theorem, and a specific gradient ascent algorithm based on this theorem within the context of a specific control-theoretic framework. We empirically evaluate the performance of our control theoretic approach on several classical reinforcement learning tasks, demonstrating significant improvements in solution quality, sample complexity, and running time of our approach over state-of-the-art methods.
Submitted 27 November, 2024; v1 submitted 20 June, 2024;
originally announced June 2024.
-
Multi-Function Multi-Way Analog Technology for Sustainable Machine Intelligence Computation
Authors:
Vassilis Kalantzis,
Mark S. Squillante,
Shashanka Ubaru,
Tayfun Gokmen,
Chai Wah Wu,
Anshul Gupta,
Haim Avron,
Tomasz Nowicki,
Malte Rasch,
Murat Onen,
Vanessa Lopez Marrero,
Effendi Leobandung,
Yasuteru Kohda,
Wilfried Haensch,
Lior Horesh
Abstract:
Numerical computation is essential to many areas of artificial intelligence (AI), whose computing demands continue to grow dramatically, yet their continued scaling is jeopardized by the slowdown in Moore's law. Multi-function multi-way analog (MFMWA) technology, a computing architecture comprising arrays of memristors supporting in-memory computation of matrix operations, can offer tremendous improvements in computation and energy, but at the expense of inherent unpredictability and noise. We devise novel randomized algorithms tailored to MFMWA architectures that mitigate the detrimental impact of imperfect analog computations while realizing their potential benefits across various areas of AI, such as applications in computer vision. Through analysis, measurements from analog devices, and simulations of larger systems, we demonstrate orders of magnitude reduction in both computation and energy with accuracy similar to digital computers.
Submitted 24 January, 2024;
originally announced January 2024.
-
Obtaining Explainable Classification Models using Distributionally Robust Optimization
Authors:
Sanjeeb Dash,
Soumyadip Ghosh,
Joao Goncalves,
Mark S. Squillante
Abstract:
Model explainability is crucial for human users to be able to interpret how a proposed classifier assigns labels to data based on its feature values. We study generalized linear models constructed using sets of feature value rules, which can capture nonlinear dependencies and interactions. An inherent trade-off exists between rule set sparsity and its prediction accuracy. It is computationally expensive to find the right choice of sparsity -- e.g., via cross-validation -- with existing methods. We propose a new formulation to learn an ensemble of rule sets that simultaneously addresses these competing factors. Good generalization is ensured while keeping computational costs low by utilizing distributionally robust optimization. The formulation utilizes column generation to efficiently search the space of rule sets and constructs a sparse ensemble of rule sets, in contrast with techniques like random forests or boosting and their variants. We present theoretical results that motivate and justify the use of our distributionally robust formulation. Extensive numerical experiments establish that our method improves over competing methods -- on a large set of publicly available binary classification problem instances -- with respect to one or more of the following metrics: generalization quality, computational cost, and explainability.
Submitted 3 November, 2023;
originally announced November 2023.
-
Generalization Performance of Transfer Learning: Overparameterized and Underparameterized Regimes
Authors:
Peizhong Ju,
Sen Lin,
Mark S. Squillante,
Yingbin Liang,
Ness B. Shroff
Abstract:
Transfer learning is a useful technique for achieving improved performance and reducing training costs by leveraging the knowledge gained from source tasks and applying it to target tasks. Assessing the effectiveness of transfer learning relies on understanding the similarity between the ground truth of the source and target tasks. In real-world applications, tasks often exhibit partial similarity, where certain aspects are similar while others are different or irrelevant. To investigate the impact of partial similarity on transfer learning performance, we focus on a linear regression model with two distinct sets of features: a common part shared across tasks and a task-specific part. Our study explores various types of transfer learning, encompassing two options for parameter transfer. By establishing a theoretical characterization on the error of the learned model, we compare these transfer learning options, particularly examining how generalization performance changes with the number of features/parameters in both underparameterized and overparameterized regimes. Furthermore, we provide practical guidelines for determining the number of features in the common and task-specific parts for improved generalization performance. For example, when the total number of features in the source task's learning model is fixed, we show that it is more advantageous to allocate a greater number of redundant features to the task-specific part rather than the common part. Moreover, in specific scenarios, particularly those characterized by high noise levels and small true parameters, sacrificing certain true features in the common part in favor of employing more redundant features in the task-specific part can yield notable benefits.
Submitted 8 June, 2023; v1 submitted 7 June, 2023;
originally announced June 2023.
-
Topological data analysis on noisy quantum computers
Authors:
Ismail Yunus Akhalwaya,
Shashanka Ubaru,
Kenneth L. Clarkson,
Mark S. Squillante,
Vishnu Jejjala,
Yang-Hui He,
Kugendran Naidoo,
Vasileios Kalantzis,
Lior Horesh
Abstract:
Topological data analysis (TDA) is a powerful technique for extracting complex and valuable shape-related summaries of high-dimensional data. However, the computational demands of classical algorithms for computing TDA are exorbitant, and quickly become impractical for high-order characteristics. Quantum computers offer the potential of achieving significant speedup for certain computational problems. Indeed, TDA has been purported to be one such problem, yet quantum computing algorithms proposed for the problem, such as the original Quantum TDA (QTDA) formulation by Lloyd, Garnerone and Zanardi, require fault-tolerance qualifications that are currently unavailable. In this study, we present NISQ-TDA, a fully implemented end-to-end quantum machine learning algorithm that needs only a short circuit depth, is applicable to high-dimensional classical data, and has provable asymptotic speedup for certain classes of problems. The algorithm neither suffers from the data-loading problem nor needs to store the input data on the quantum computer explicitly. The algorithm was successfully executed on quantum computing devices, as well as on noisy quantum simulators, applied to small datasets. Preliminary empirical results suggest that the algorithm is robust to noise.
Submitted 19 March, 2024; v1 submitted 19 September, 2022;
originally announced September 2022.
-
A Class of Geometric Structures in Transfer Learning: Minimax Bounds and Optimality
Authors:
Xuhui Zhang,
Jose Blanchet,
Soumyadip Ghosh,
Mark S. Squillante
Abstract:
We study the problem of transfer learning, observing that previous efforts to understand its information-theoretic limits do not fully exploit the geometric structure of the source and target domains. In contrast, our study first illustrates the benefits of incorporating a natural geometric structure within a linear regression model, which corresponds to the generalized eigenvalue problem formed by the Gram matrices of both domains. We next establish a finite-sample minimax lower bound, propose a refined model interpolation estimator that enjoys a matching upper bound, and then extend our framework to multiple source domains and generalized linear models. Surprisingly, as long as information is available on the distance between the source and target parameters, negative-transfer does not occur. Simulation studies show that our proposed interpolation estimator outperforms state-of-the-art transfer learning methods in both moderate- and high-dimensional settings.
Submitted 23 February, 2022;
originally announced February 2022.
-
Quantum Topological Data Analysis with Linear Depth and Exponential Speedup
Authors:
Shashanka Ubaru,
Ismail Yunus Akhalwaya,
Mark S. Squillante,
Kenneth L. Clarkson,
Lior Horesh
Abstract:
Quantum computing offers the potential of exponential speedups for certain classical computations. Over the last decade, many quantum machine learning (QML) algorithms have been proposed as candidates for such exponential improvements. However, two issues unravel the hope of exponential speedup for some of these QML algorithms: the data-loading problem and, more recently, the stunning dequantization results of Tang et al. A third issue, namely the fault-tolerance requirements of most QML algorithms, has further hindered their practical realization. The quantum topological data analysis (QTDA) algorithm of Lloyd, Garnerone and Zanardi was one of the first QML algorithms that convincingly offered an expected exponential speedup. From the outset, it did not suffer from the data-loading problem. A recent result has also shown that the generalized problem solved by this algorithm is likely classically intractable, and would therefore be immune to any dequantization efforts. However, the QTDA algorithm of Lloyd et al. has a time complexity of $O(n^4/(ε^2 δ))$ (where $n$ is the number of data points, $ε$ is the error tolerance, and $δ$ is the smallest nonzero eigenvalue of the restricted Laplacian) and requires fault-tolerant quantum computing, which has not yet been achieved. In this paper, we completely overhaul the QTDA algorithm to achieve an improved exponential speedup and depth complexity of $O(n\log(1/(δε)))$. Our approach includes three key innovations: (a) an efficient realization of the combinatorial Laplacian as a sum of Pauli operators; (b) a quantum rejection sampling approach to restrict the superposition to the simplices in the complex; and (c) a stochastic rank estimation method to estimate the Betti numbers. We present a theoretical error analysis, and the circuit and computational time and depth complexities for Betti number estimation.
Submitted 5 August, 2021;
originally announced August 2021.
-
Solving sparse linear systems with approximate inverse preconditioners on analog devices
Authors:
Vasileios Kalantzis,
Anshul Gupta,
Lior Horesh,
Tomasz Nowicki,
Mark S. Squillante,
Chai Wah Wu
Abstract:
Sparse linear system solvers are computationally expensive kernels that lie at the heart of numerous applications. This paper proposes a flexible preconditioning framework to substantially reduce the time and energy requirements of this task by utilizing a hybrid architecture that combines conventional digital microprocessors with analog crossbar array accelerators. Our analysis and experiments with a simulator for analog hardware demonstrate that an order of magnitude speedup is readily attainable without much impact on convergence, despite the noise in analog computations.
Submitted 14 July, 2021;
originally announced July 2021.
-
A General Markov Decision Process Framework for Directly Learning Optimal Control Policies
Authors:
Yingdong Lu,
Mark S. Squillante,
Chai Wah Wu
Abstract:
We consider a new form of reinforcement learning (RL) that is based on opportunities to directly learn the optimal control policy and a general Markov decision process (MDP) framework devised to support these opportunities. Derivations of general classes of our control-based RL methods are presented, together with forms of exploration and exploitation in learning and applying the optimal control policy over time. Our general MDP framework extends the classical Bellman operator and optimality criteria by generalizing the definition and scope of a policy for any given state. We establish the convergence and optimality of our control-based methods, both in general and within various control paradigms (e.g., piecewise linear control policies), through this general MDP framework, including convergence of $Q$-learning within the context of our MDP framework. Our empirical results demonstrate and quantify the significant benefits of our approach.
Submitted 31 March, 2021; v1 submitted 28 May, 2019;
originally announced May 2019.
-
Optimising capacity allocation in networks of stochastic loss systems: A functional-form approach
Authors:
Brendan Patch,
Mark S. Squillante,
Peter M. Van de Ven
Abstract:
Motivated by a wide variety of applications, this paper introduces a general class of networks of stochastic loss systems in which congestion renders lost revenue due to customers or jobs being permanently removed from the system. We seek to balance the trade-off between mitigating congestion by increasing service capacity and maintaining low costs for the service capacity provided. Given the lack of analytical results and the computational burden of simulation-based methods, we propose a hybrid functional-form approach for finding the optimal resource allocation in general networks of stochastic loss systems that combines the speed of an analytical approach with the accuracy of simulation-based optimisation. The key insight is a core iterative algorithm that replaces the computationally expensive gradient estimation in simulation optimisation with a closed-form analytical approximation that is calibrated using a simple simulation run. Extensive computational experiments on complex networks show that our approach renders near-optimal solutions with objective function values that are comparable to those obtained using stochastic approximation, surrogate optimisation and Bayesian optimisation methods while requiring significantly less computational effort.
Submitted 11 May, 2022; v1 submitted 10 April, 2019;
originally announced April 2019.
-
PROVEN: Certifying Robustness of Neural Networks with a Probabilistic Approach
Authors:
Tsui-Wei Weng,
Pin-Yu Chen,
Lam M. Nguyen,
Mark S. Squillante,
Ivan Oseledets,
Luca Daniel
Abstract:
With deep neural networks providing state-of-the-art machine learning models for numerous machine learning tasks, quantifying the robustness of these models has become an important area of research. However, most of the research literature merely focuses on the worst-case setting where the input of the neural network is perturbed with noises that are constrained within an $\ell_p$ ball; and several algorithms have been proposed to compute certified lower bounds of minimum adversarial distortion based on such worst-case analysis. In this paper, we address these limitations and extend the approach to a probabilistic setting where the additive noises can follow a given distributional characterization. We propose a novel probabilistic framework PROVEN to PRObabilistically VErify Neural networks with statistical guarantees -- i.e., PROVEN certifies the probability that the classifier's top-1 prediction cannot be altered under any constrained $\ell_p$ norm perturbation to a given input. Importantly, we show that it is possible to derive closed-form probabilistic certificates based on current state-of-the-art neural network robustness verification frameworks. Hence, the probabilistic certificates provided by PROVEN come naturally and with almost no overhead when obtaining the worst-case certified lower bounds from existing methods such as Fast-Lin, CROWN and CNN-Cert. Experiments on small and large MNIST and CIFAR neural network models demonstrate our probabilistic approach can achieve up to around $75\%$ improvement in the robustness certification with at least a $99.99\%$ confidence compared with the worst-case robustness certificate delivered by CROWN.
Submitted 7 January, 2019; v1 submitted 18 December, 2018;
originally announced December 2018.
-
A General Family of Robust Stochastic Operators for Reinforcement Learning
Authors:
Yingdong Lu,
Mark S. Squillante,
Chai Wah Wu
Abstract:
We consider a new family of operators for reinforcement learning with the goal of alleviating the negative effects and becoming more robust to approximation or estimation errors. Various theoretical results are established, which include showing on a sample path basis that our family of operators preserve optimality and increase the action gap. Our empirical results illustrate the strong benefits of our family of operators, significantly outperforming the classical Bellman operator and recently proposed operators.
Submitted 28 May, 2019; v1 submitted 21 May, 2018;
originally announced May 2018.