-
Breaking through the classical Shannon entropy limit: A new frontier through logical semantics
Authors:
Luis A. Lastras,
Barry M. Trager,
Jonathan Lenchner,
Wojciech Szpankowski,
Chai Wah Wu,
Mark S. Squillante,
Alexander Gray
Abstract:
Information theory has provided foundations for the theories of several application areas critical for modern society, including communications, computer storage, and AI. A key aspect of Shannon's 1948 theory is a sharp lower bound on the number of bits needed to encode and communicate a string of symbols. When he introduced the theory, Shannon famously excluded any notion of semantics behind the…
▽ More
Information theory has provided foundations for the theories of several application areas critical for modern society, including communications, computer storage, and AI. A key aspect of Shannon's 1948 theory is a sharp lower bound on the number of bits needed to encode and communicate a string of symbols. When he introduced the theory, Shannon famously excluded any notion of semantics behind the symbols being communicated. This semantics-free notion went on to have massive impact on communication and computing technologies, even as multiple proposals for reintroducing semantics in a theory of information were being made, notably one where Carnap and Bar-Hillel used logic and reasoning to capture semantics. In this paper we present, for the first time, a Shannon-style analysis of a communication system equipped with a deductive reasoning capability, implemented using logical inference. We use some of the most important techniques developed in information theory to demonstrate significant and sometimes surprising gains in communication efficiency availed to us through such capability, demonstrated also through practical codes. We thus argue that proposals for a semantic information theory should include the power of deductive reasoning to magnify the value of transmitted bits as we strive to fully unlock the inherent potential of semantics.
△ Less
Submitted 31 December, 2024;
originally announced January 2025.
-
On the Interplay of Clustering and Evolution in the Emergence of Epidemic Outbreaks
Authors:
Mansi Sood,
Hejin Gu,
Rashad Eletreby,
Swarun Kumar,
Chai Wah Wu,
Osman Yagan
Abstract:
In an increasingly interconnected world, a key scientific challenge is to examine mechanisms that lead to the widespread propagation of contagions, such as misinformation and pathogens, and identify risk factors that can trigger large-scale outbreaks. Underlying both the spread of disease and misinformation epidemics is the evolution of the contagion as it propagates, leading to the emergence of d…
▽ More
In an increasingly interconnected world, a key scientific challenge is to examine mechanisms that lead to the widespread propagation of contagions, such as misinformation and pathogens, and identify risk factors that can trigger large-scale outbreaks. Underlying both the spread of disease and misinformation epidemics is the evolution of the contagion as it propagates, leading to the emergence of different strains, e.g., through genetic mutations in pathogens and alterations in the information content. Recent studies have revealed that models that do not account for heterogeneity in transmission risks associated with different strains of the circulating contagion can lead to inaccurate predictions. However, existing results on multi-strain spreading assume that the network has a vanishingly small clustering coefficient, whereas clustering is widely known to be a fundamental property of real-world social networks. In this work, we investigate spreading processes that entail evolutionary adaptations on random graphs with tunable clustering and arbitrary degree distributions. We derive a mathematical framework to quantify the epidemic characteristics of a contagion that evolves as it spreads, with the structure of the underlying network as given via arbitrary {\em joint} degree distributions of single-edges and triangles. To the best of our knowledge, our work is the first to jointly analyze the impact of clustering and evolution on the emergence of epidemic outbreaks. We supplement our theoretical finding with numerical simulations and case studies, shedding light on the impact of clustering on contagion spread.
△ Less
Submitted 25 September, 2024;
originally announced September 2024.
-
A General Control-Theoretic Approach for Reinforcement Learning: Theory and Algorithms
Authors:
Weiqin Chen,
Mark S. Squillante,
Chai Wah Wu,
Santiago Paternain
Abstract:
We devise a control-theoretic reinforcement learning approach to support direct learning of the optimal policy. We establish various theoretical properties of our approach, such as convergence and optimality of our analog of the Bellman operator and Q-learning, a new control-policy-variable gradient theorem, and a specific gradient ascent algorithm based on this theorem within the context of a spe…
▽ More
We devise a control-theoretic reinforcement learning approach to support direct learning of the optimal policy. We establish various theoretical properties of our approach, such as convergence and optimality of our analog of the Bellman operator and Q-learning, a new control-policy-variable gradient theorem, and a specific gradient ascent algorithm based on this theorem within the context of a specific control-theoretic framework. We empirically evaluate the performance of our control theoretic approach on several classical reinforcement learning tasks, demonstrating significant improvements in solution quality, sample complexity, and running time of our approach over state-of-the-art methods.
△ Less
Submitted 27 November, 2024; v1 submitted 20 June, 2024;
originally announced June 2024.
-
Multi-Function Multi-Way Analog Technology for Sustainable Machine Intelligence Computation
Authors:
Vassilis Kalantzis,
Mark S. Squillante,
Shashanka Ubaru,
Tayfun Gokmen,
Chai Wah Wu,
Anshul Gupta,
Haim Avron,
Tomasz Nowicki,
Malte Rasch,
Murat Onen,
Vanessa Lopez Marrero,
Effendi Leobandung,
Yasuteru Kohda,
Wilfried Haensch,
Lior Horesh
Abstract:
Numerical computation is essential to many areas of artificial intelligence (AI), whose computing demands continue to grow dramatically, yet their continued scaling is jeopardized by the slowdown in Moore's law. Multi-function multi-way analog (MFMWA) technology, a computing architecture comprising arrays of memristors supporting in-memory computation of matrix operations, can offer tremendous imp…
▽ More
Numerical computation is essential to many areas of artificial intelligence (AI), whose computing demands continue to grow dramatically, yet their continued scaling is jeopardized by the slowdown in Moore's law. Multi-function multi-way analog (MFMWA) technology, a computing architecture comprising arrays of memristors supporting in-memory computation of matrix operations, can offer tremendous improvements in computation and energy, but at the expense of inherent unpredictability and noise. We devise novel randomized algorithms tailored to MFMWA architectures that mitigate the detrimental impact of imperfect analog computations while realizing their potential benefits across various areas of AI, such as applications in computer vision. Through analysis, measurements from analog devices, and simulations of larger systems, we demonstrate orders of magnitude reduction in both computation and energy with accuracy similar to digital computers.
△ Less
Submitted 24 January, 2024;
originally announced January 2024.
-
Towards a Unification of Logic and Information Theory
Authors:
Luis A. Lastras,
Barry Trager,
Jonathan Lenchner,
Wojtek Szpankowski,
Chai Wah Wu,
Mark Squillante,
Alex Gray
Abstract:
Today, the vast majority of the world's digital information is represented using the fundamental assumption, introduced by Claude Shannon in 1948, that ``...the semantic aspects of communication are irrelevant to the engineering problem (of the design of communication systems)...''. Consider, nonetheless, the observation that we often combine a message with other information in order to deduce new…
▽ More
Today, the vast majority of the world's digital information is represented using the fundamental assumption, introduced by Claude Shannon in 1948, that ``...the semantic aspects of communication are irrelevant to the engineering problem (of the design of communication systems)...''. Consider, nonetheless, the observation that we often combine a message with other information in order to deduce new facts, thereby expanding the value of such a message. It is noteworthy that to-date, no rigorous theory of communication has been put forth which postulates the existence of deductive capabilities on the receiver's side.
The purpose of this paper is to present such a theory. We formally model such deductive capabilities using logic reasoning, and present a rigorous theory which covers the following generic scenario: Alice and Bob each have knowledge of some logic sentence, and they wish to communicate as efficiently as possible with the shared goal that, following their communication, Bob should be able to deduce a particular logic sentence that Alice knows to be true, but that Bob currently cannot prove. Many variants of this general setup are considered in this article; in all cases we are able to provide sharp upper and lower bounds. Our contribution includes the identification of the most fundamental requirements that we place on a logic and associated logical language for all of our results to apply. Practical algorithms that are in some cases asymptotically optimal are provided, and we illustrate the potential practical value of the design of communication systems that incorporate the assumption of deductive capabilities at the receiver using experimental results that suggest significant possible gains compared to classical systems.
△ Less
Submitted 30 December, 2024; v1 submitted 25 January, 2023;
originally announced January 2023.
-
Spreading Processes with Mutations over Multi-layer Networks
Authors:
Mansi Sood,
Anirudh Sridhar,
Rashad Eletreby,
Chai Wah Wu,
Simon A. Levin,
H. Vincent Poor,
Osman Yagan
Abstract:
A key scientific challenge during the outbreak of novel infectious diseases is to predict how the course of the epidemic changes under different countermeasures that limit interaction in the population. Most epidemiological models do not consider the role of mutations and heterogeneity in the type of contact events. However, pathogens have the capacity to mutate in response to changing environment…
▽ More
A key scientific challenge during the outbreak of novel infectious diseases is to predict how the course of the epidemic changes under different countermeasures that limit interaction in the population. Most epidemiological models do not consider the role of mutations and heterogeneity in the type of contact events. However, pathogens have the capacity to mutate in response to changing environments, especially caused by the increase in population immunity to existing strains and the emergence of new pathogen strains poses a continued threat to public health. Further, in light of differing transmission risks in different congregate settings (e.g., schools and offices), different mitigation strategies may need to be adopted to control the spread of infection. We analyze a multi-layer multi-strain model by simultaneously accounting for i) pathways for mutations in the pathogen leading to the emergence of new pathogen strains, and ii) differing transmission risks in different congregate settings, modeled as network-layers. Assuming complete cross-immunity among strains, namely, recovery from any infection prevents infection with any other (an assumption that will need to be relaxed to deal with COVID-19 or influenza), we derive the key epidemiological parameters for the proposed multi-layer multi-strain framework. We demonstrate that reductions to existing network-based models that discount heterogeneity in either the strain or the network layers can lead to incorrect predictions for the course of the outbreak. In addition, our results highlight that the impact of imposing/lifting mitigation measures concerning different contact network layers (e.g., school closures or work-from-home policies) should be evaluated in connection with their effect on the likelihood of the emergence of new pathogen strains.
△ Less
Submitted 24 January, 2023; v1 submitted 10 October, 2022;
originally announced October 2022.
-
Active Learning of Quantum System Hamiltonians yields Query Advantage
Authors:
Arkopal Dutt,
Edwin Pednault,
Chai Wah Wu,
Sarah Sheldon,
John Smolin,
Lev Bishop,
Isaac L. Chuang
Abstract:
Hamiltonian learning is an important procedure in quantum system identification, calibration, and successful operation of quantum computers. Through queries to the quantum system, this procedure seeks to obtain the parameters of a given Hamiltonian model and description of noise sources. Standard techniques for Hamiltonian learning require careful design of queries and $O(ε^{-2})$ queries in achie…
▽ More
Hamiltonian learning is an important procedure in quantum system identification, calibration, and successful operation of quantum computers. Through queries to the quantum system, this procedure seeks to obtain the parameters of a given Hamiltonian model and description of noise sources. Standard techniques for Hamiltonian learning require careful design of queries and $O(ε^{-2})$ queries in achieving learning error $ε$ due to the standard quantum limit. With the goal of efficiently and accurately estimating the Hamiltonian parameters within learning error $ε$ through minimal queries, we introduce an active learner that is given an initial set of training examples and the ability to interactively query the quantum system to generate new training data. We formally specify and experimentally assess the performance of this Hamiltonian active learning (HAL) algorithm for learning the six parameters of a two-qubit cross-resonance Hamiltonian on four different superconducting IBM Quantum devices. Compared with standard techniques for the same problem and a specified learning error, HAL achieves up to a $99.8\%$ reduction in queries required, and a $99.1\%$ reduction over the comparable non-adaptive learning algorithm. Moreover, with access to prior information on a subset of Hamiltonian parameters and given the ability to select queries with linearly (or exponentially) longer system interaction times during learning, HAL can exceed the standard quantum limit and achieve Heisenberg (or super-Heisenberg) limited convergence rates during learning.
△ Less
Submitted 29 December, 2021;
originally announced December 2021.
-
The Role of Masks in Mitigating Viral Spread on Networks
Authors:
Yurun Tian,
Anirudh Sridhar,
Chai Wah Wu,
Simon A. Levin,
Kathleen M. Carley,
H. Vincent Poor,
Osman Yagan
Abstract:
Masks have remained an important mitigation strategy in the fight against COVID-19 due to their ability to prevent the transmission of respiratory droplets between individuals. In this work, we provide a comprehensive quantitative analysis of the impact of mask-wearing. To this end, we propose a novel agent-based model of viral spread on networks where agents may either wear no mask or wear one of…
▽ More
Masks have remained an important mitigation strategy in the fight against COVID-19 due to their ability to prevent the transmission of respiratory droplets between individuals. In this work, we provide a comprehensive quantitative analysis of the impact of mask-wearing. To this end, we propose a novel agent-based model of viral spread on networks where agents may either wear no mask or wear one of several types of masks with different properties (e.g., cloth or surgical). We derive analytical expressions for three key epidemiological quantities: the probability of emergence, the epidemic threshold, and the expected epidemic size. In particular, we show how the aforementioned quantities depend on the structure of the contact network, viral transmission dynamics, and the distribution of the different types of masks within the population. Through extensive simulations, we then investigate the impact of different allocations of masks within the population and trade-offs between the outward efficiency and inward efficiency of the masks. Interestingly, we find that masks with high outward efficiency and low inward efficiency are most useful for controlling the spread in the early stages of an epidemic, while masks with high inward efficiency but low outward efficiency are most useful in reducing the size of an already large spread. Lastly, we study whether degree-based mask allocation is more effective in reducing the probability of epidemic as well as epidemic size compared to random allocation. The result echoes the previous findings that mitigation strategies should differ based on the stage of the spreading process, focusing on source control before the epidemic emerges and on self-protection after the emergence.
△ Less
Submitted 6 June, 2023; v1 submitted 8 October, 2021;
originally announced October 2021.
-
Solving sparse linear systems with approximate inverse preconditioners on analog devices
Authors:
Vasileios Kalantzis,
Anshul Gupta,
Lior Horesh,
Tomasz Nowicki,
Mark S. Squillante,
Chai Wah Wu
Abstract:
Sparse linear system solvers are computationally expensive kernels that lie at the heart of numerous applications. This paper proposes a flexible preconditioning framework to substantially reduce the time and energy requirements of this task by utilizing a hybrid architecture that combines conventional digital microprocessors with analog crossbar array accelerators. Our analysis and experiments wi…
▽ More
Sparse linear system solvers are computationally expensive kernels that lie at the heart of numerous applications. This paper proposes a flexible preconditioning framework to substantially reduce the time and energy requirements of this task by utilizing a hybrid architecture that combines conventional digital microprocessors with analog crossbar array accelerators. Our analysis and experiments with a simulator for analog hardware demonstrate that an order of magnitude speedup is readily attainable without much impact on convergence, despite the noise in analog computations.
△ Less
Submitted 14 July, 2021;
originally announced July 2021.
-
Dither computing: a hybrid deterministic-stochastic computing framework
Authors:
Chai Wah Wu
Abstract:
Stochastic computing has a long history as an alternative method of performing arithmetic on a computer. While it can be considered an unbiased estimator of real numbers, it has a variance and MSE on the order of $Ω(\frac{1}{N})$. On the other hand, deterministic variants of stochastic computing remove the stochastic aspect, but cannot approximate arbitrary real numbers with arbitrary precision an…
▽ More
Stochastic computing has a long history as an alternative method of performing arithmetic on a computer. While it can be considered an unbiased estimator of real numbers, it has a variance and MSE on the order of $Ω(\frac{1}{N})$. On the other hand, deterministic variants of stochastic computing remove the stochastic aspect, but cannot approximate arbitrary real numbers with arbitrary precision and are biased estimators. However, they have an asymptotically superior MSE on the order of $O(\frac{1}{N^2})$. Recent results in deep learning with stochastic rounding suggest that the bias in the rounding can degrade performance. We proposed an alternative framework, called dither computing, that combines aspects of stochastic computing and its deterministic variants and that can perform computing with similar efficiency, is unbiased, and with a variance and MSE also on the optimal order of $Θ(\frac{1}{N^2})$. We also show that it can be beneficial in stochastic rounding applications as well. We provide implementation details and give experimental results to comparatively show the benefits of the proposed scheme.
△ Less
Submitted 5 May, 2021; v1 submitted 21 February, 2021;
originally announced February 2021.
-
A General Markov Decision Process Framework for Directly Learning Optimal Control Policies
Authors:
Yingdong Lu,
Mark S. Squillante,
Chai Wah Wu
Abstract:
We consider a new form of reinforcement learning (RL) that is based on opportunities to directly learn the optimal control policy and a general Markov decision process (MDP) framework devised to support these opportunities. Derivations of general classes of our control-based RL methods are presented, together with forms of exploration and exploitation in learning and applying the optimal control p…
▽ More
We consider a new form of reinforcement learning (RL) that is based on opportunities to directly learn the optimal control policy and a general Markov decision process (MDP) framework devised to support these opportunities. Derivations of general classes of our control-based RL methods are presented, together with forms of exploration and exploitation in learning and applying the optimal control policy over time. Our general MDP framework extends the classical Bellman operator and optimality criteria by generalizing the definition and scope of a policy for any given state. We establish the convergence and optimality-both in general and within various control paradigms (e.g., piecewise linear control policies)-of our control-based methods through this general MDP framework, including convergence of $Q$-learning within the context of our MDP framework. Our empirical results demonstrate and quantify the significant benefits of our approach.
△ Less
Submitted 31 March, 2021; v1 submitted 28 May, 2019;
originally announced May 2019.
-
TableNet: a multiplier-less implementation of neural networks for inferencing
Authors:
Chai Wah Wu
Abstract:
We consider the use of look-up tables (LUT) to simplify the hardware implementation of a deep learning network for inferencing after weights have been successfully trained. The use of LUT replaces the matrix multiply and add operations with a small number of LUTs and addition operations resulting in a completely multiplier-less implementation. We compare the different tradeoffs of this approach in…
▽ More
We consider the use of look-up tables (LUT) to simplify the hardware implementation of a deep learning network for inferencing after weights have been successfully trained. The use of LUT replaces the matrix multiply and add operations with a small number of LUTs and addition operations resulting in a completely multiplier-less implementation. We compare the different tradeoffs of this approach in terms of accuracy versus LUT size and the number of operations and show that similar performance can be obtained with a comparable memory footprint as a full precision deep neural network, but without the use of any multipliers. We illustrate this with several architectures such as MLP and CNN.
△ Less
Submitted 5 September, 2019; v1 submitted 25 May, 2019;
originally announced May 2019.
-
ProdSumNet: reducing model parameters in deep neural networks via product-of-sums matrix decompositions
Authors:
Chai Wah Wu
Abstract:
We consider a general framework for reducing the number of trainable model parameters in deep learning networks by decomposing linear operators as a product of sums of simpler linear operators. Recently proposed deep learning architectures such as CNN, KFC, Dilated CNN, etc. are all subsumed in this framework and we illustrate other types of neural network architectures within this framework. We s…
▽ More
We consider a general framework for reducing the number of trainable model parameters in deep learning networks by decomposing linear operators as a product of sums of simpler linear operators. Recently proposed deep learning architectures such as CNN, KFC, Dilated CNN, etc. are all subsumed in this framework and we illustrate other types of neural network architectures within this framework. We show that good accuracy on MNIST and Fashion MNIST can be obtained using a relatively small number of trainable parameters. In addition, since implementation of the convolutional layer is resource-heavy, we consider an approach in the transform domain that obviates the need for convolutional layers. One of the advantages of this general framework over prior approaches is that the number of trainable parameters is not fixed and can be varied arbitrarily. In particular, we illustrate the tradeoff of varying the number of trainable variables and the corresponding error rate. As an example, by using this decomposition on a reference CNN architecture for MNIST with over 3x10^6 trainable parameters, we are able to obtain an accuracy of 98.44% using only 3554 trainable parameters.
△ Less
Submitted 23 May, 2019; v1 submitted 6 September, 2018;
originally announced September 2018.
-
A General Family of Robust Stochastic Operators for Reinforcement Learning
Authors:
Yingdong Lu,
Mark S. Squillante,
Chai Wah Wu
Abstract:
We consider a new family of operators for reinforcement learning with the goal of alleviating the negative effects and becoming more robust to approximation or estimation errors. Various theoretical results are established, which include showing on a sample path basis that our family of operators preserve optimality and increase the action gap. Our empirical results illustrate the strong benefits…
▽ More
We consider a new family of operators for reinforcement learning with the goal of alleviating the negative effects and becoming more robust to approximation or estimation errors. Various theoretical results are established, which include showing on a sample path basis that our family of operators preserve optimality and increase the action gap. Our empirical results illustrate the strong benefits of our family of operators, significantly outperforming the classical Bellman operator and recently proposed operators.
△ Less
Submitted 28 May, 2019; v1 submitted 21 May, 2018;
originally announced May 2018.
-
Can machine learning identify interesting mathematics? An exploration using empirically observed laws
Authors:
Chai Wah Wu
Abstract:
We explore the possibility of using machine learning to identify interesting mathematical structures by using certain quantities that serve as fingerprints. In particular, we extract features from integer sequences using two empirical laws: Benford's law and Taylor's law and experiment with various classifiers to identify whether a sequence is, for example, nice, important, multiplicative, easy to…
▽ More
We explore the possibility of using machine learning to identify interesting mathematical structures by using certain quantities that serve as fingerprints. In particular, we extract features from integer sequences using two empirical laws: Benford's law and Taylor's law and experiment with various classifiers to identify whether a sequence is, for example, nice, important, multiplicative, easy to compute or related to primes or palindromes.
△ Less
Submitted 9 September, 2018; v1 submitted 18 May, 2018;
originally announced May 2018.
-
Designing communication systems via iterative improvement: error correction coding with Bayes decoder and codebook optimized for source symbol error
Authors:
Chai Wah Wu
Abstract:
In most error correction coding (ECC) frameworks, the typical error metric is the bit error rate (BER) which measures the number of bit errors. For this metric, the positions of the bits are not relevant to the decoding, and in many noise models, not relevant to the BER either. In many applications this is unsatisfactory as typically all bits are not equal and have different significance. We consi…
▽ More
In most error correction coding (ECC) frameworks, the typical error metric is the bit error rate (BER) which measures the number of bit errors. For this metric, the positions of the bits are not relevant to the decoding, and in many noise models, not relevant to the BER either. In many applications this is unsatisfactory as typically all bits are not equal and have different significance. We consider the problem of bit error correction and mitigation where bits in different positions have different importance. For error correction, we look at ECC from a Bayesian perspective and introduce Bayes estimators with general loss functions to take into account the bit significance. We propose ECC schemes that optimize this error metric. As the problem is highly nonlinear, traditional ECC construction techniques are not applicable. Using exhaustive search is cost prohibitive, and thus we use iterative improvement search techniques to find good codebooks. We optimize both general codebooks and linear codes. We provide numerical experiments to show that they can be superior to classical linear block codes such as Hamming codes and decoding methods such as minimum distance decoding.
For error mitigation, we study the case where ECC is not possible or not desirable, but significance aware encoding of information is still beneficial in reducing the average error. We propose a novel number presentation format suitable for emerging storage media where the noise magnitude is unknown and possibly large and show that it has lower mean error than the traditional number format.
△ Less
Submitted 7 October, 2021; v1 submitted 18 May, 2018;
originally announced May 2018.