-
FLEX: A Backbone for Diffusion-Based Modeling of Spatio-temporal Physical Systems
Authors:
N. Benjamin Erichson,
Vinicius Mikuni,
Dongwei Lyu,
Yang Gao,
Omri Azencot,
Soon Hoe Lim,
Michael W. Mahoney
Abstract:
We introduce FLEX (FLow EXpert), a backbone architecture for generative modeling of spatio-temporal physical systems using diffusion models. FLEX operates in the residual space rather than on raw data, a modeling choice that we motivate theoretically, showing that it reduces the variance of the velocity field in the diffusion model, which helps stabilize training. FLEX integrates a latent Transformer into a U-Net with standard convolutional ResNet layers and incorporates a redesigned skip connection scheme. This hybrid design enables the model to capture both local spatial detail and long-range dependencies in latent space. To improve spatio-temporal conditioning, FLEX uses a task-specific encoder that processes auxiliary inputs such as coarse or past snapshots. Weak conditioning is applied to the shared encoder via skip connections to promote generalization, while strong conditioning is applied to the decoder through both skip and bottleneck features to ensure reconstruction fidelity. FLEX achieves accurate predictions for super-resolution and forecasting tasks using as few as two reverse diffusion steps. It also produces calibrated uncertainty estimates through sampling. Evaluations on high-resolution 2D turbulence data show that FLEX outperforms strong baselines and generalizes to out-of-distribution settings, including unseen Reynolds numbers, physical observables (e.g., fluid flow velocity fields), and boundary conditions.
Submitted 22 May, 2025;
originally announced May 2025.
-
JFlow: Model-Independent Spherical Jeans Analysis using Equivariant Continuous Normalizing Flows
Authors:
Sung Hak Lim,
Kohei Hayashi,
Shun'ichi Horigome,
Shigeki Matsumoto,
Mihoko M. Nojiri
Abstract:
The kinematics of stars in dwarf spheroidal galaxies have been studied to understand the structure of dark matter halos. However, the kinematic information of these stars is often limited to celestial positions and line-of-sight velocities, making full phase space analysis challenging. Conventional methods rely on projected analytic phase space density models with several parameters and infer dark matter halo structures by solving the spherical Jeans equation. In this paper, we introduce an unsupervised machine learning method for solving the spherical Jeans equation in a model-independent way as a first step toward model-independent analysis of dwarf spheroidal galaxies. Using equivariant continuous normalizing flows, we demonstrate that spherically symmetric stellar phase space densities and velocity dispersions can be estimated without model assumptions. As a proof of concept, we apply our method to Gaia challenge datasets for spherical models and measure dark matter mass densities for given velocity anisotropy profiles. Our method can identify halo structures accurately, even with a small number of tracer stars.
Submitted 2 June, 2025; v1 submitted 1 May, 2025;
originally announced May 2025.
-
Integrated Communication and Binary State Detection Under Unequal Error Constraints
Authors:
Daewon Seo,
Sung Hoon Lim
Abstract:
This work considers a problem of integrated sensing and communication (ISAC) in which the goal of sensing is to detect a binary state. Unlike most approaches that minimize the total detection error probability, in our work, we disaggregate the error probability into false alarm and missed detection probabilities and investigate their information-theoretic three-way tradeoff including communication data rate. We consider a broadcast channel that consists of a transmitter, a communication receiver, and a detector where the receiver's and the detector's channels are affected by an unknown binary state. We consider and present results on two different state-dependent models. In the first setting, the state is fixed throughout the entire transmission, for which we fully characterize the optimal three-way tradeoff between the coding rate for communication and the two possibly nonidentical error exponents for sensing in the asymptotic regime. The achievability and converse proofs rely on the analysis of the cumulant-generating function of the log-likelihood ratio. In the second setting, the state changes every symbol in an independently and identically distributed (i.i.d.) manner, for which we characterize the optimal tradeoff region based on the analysis of the receiver operating characteristic (ROC) curves.
Submitted 31 January, 2025;
originally announced January 2025.
-
Multi-modal AI for comprehensive breast cancer prognostication
Authors:
Jan Witowski,
Ken G. Zeng,
Joseph Cappadona,
Jailan Elayoubi,
Khalil Choucair,
Elena Diana Chiru,
Nancy Chan,
Young-Joon Kang,
Frederick Howard,
Irina Ostrovnaya,
Carlos Fernandez-Granda,
Freya Schnabel,
Zoe Steinsnyder,
Ugur Ozerdem,
Kangning Liu,
Waleed Abdulsattar,
Yu Zong,
Lina Daoud,
Rafic Beydoun,
Anas Saad,
Nitya Thakore,
Mohammad Sadic,
Frank Yeung,
Elisa Liu,
Theodore Hill
, et al. (26 additional authors not shown)
Abstract:
Treatment selection in breast cancer is guided by molecular subtypes and clinical characteristics. However, current tools including genomic assays lack the accuracy required for optimal clinical decision-making. We developed a novel artificial intelligence (AI)-based approach that integrates digital pathology images with clinical data, providing a more robust and effective method for predicting the risk of cancer recurrence in breast cancer patients. Specifically, we utilized a vision transformer pan-cancer foundation model trained with self-supervised learning to extract features from digitized H&E-stained slides. These features were integrated with clinical data to form a multi-modal AI test predicting cancer recurrence and death. The test was developed and evaluated using data from a total of 8,161 female breast cancer patients across 15 cohorts originating from seven countries. Of these, 3,502 patients from five cohorts were used exclusively for evaluation, while the remaining patients were used for training. Our test accurately predicted our primary endpoint, disease-free interval, in the five evaluation cohorts (C-index: 0.71 [0.68-0.75], HR: 3.63 [3.02-4.37, p<0.001]). In a direct comparison (n=858), the AI test was more accurate than Oncotype DX, the standard-of-care 21-gene assay, achieving a C-index of 0.67 [0.61-0.74] versus 0.61 [0.49-0.73], respectively. Additionally, the AI test added independent prognostic information to Oncotype DX in a multivariate analysis (HR: 3.11 [1.91-5.09, p<0.001]). The test demonstrated robust accuracy across major molecular breast cancer subtypes, including TNBC (C-index: 0.71 [0.62-0.81], HR: 3.81 [2.35-6.17, p=0.02]), where no diagnostic tools are currently recommended by clinical guidelines. These results suggest that our AI test improves upon the accuracy of existing prognostic tests, while being applicable to a wider range of patients.
Submitted 2 March, 2025; v1 submitted 28 October, 2024;
originally announced October 2024.
-
Elucidating the Design Choice of Probability Paths in Flow Matching for Forecasting
Authors:
Soon Hoe Lim,
Yijin Wang,
Annan Yu,
Emma Hart,
Michael W. Mahoney,
Xiaoye S. Li,
N. Benjamin Erichson
Abstract:
Flow matching has recently emerged as a powerful paradigm for generative modeling and has been extended to probabilistic time series forecasting in latent spaces. However, the impact of the specific choice of probability path model on forecasting performance remains under-explored. In this work, we demonstrate that forecasting spatio-temporal data with flow matching is highly sensitive to the selection of the probability path model. Motivated by this insight, we propose a novel probability path model designed to improve forecasting performance. Our empirical results across various dynamical system benchmarks show that our model achieves faster convergence during training and improved predictive performance compared to existing probability path models. Importantly, our approach is efficient during inference, requiring only a few sampling steps. This makes our proposed model practical for real-world applications and opens new avenues for probabilistic forecasting.
Submitted 18 January, 2025; v1 submitted 4 October, 2024;
originally announced October 2024.
-
Tuning Frequency Bias of State Space Models
Authors:
Annan Yu,
Dongwei Lyu,
Soon Hoe Lim,
Michael W. Mahoney,
N. Benjamin Erichson
Abstract:
State space models (SSMs) leverage linear, time-invariant (LTI) systems to effectively learn sequences with long-range dependencies. By analyzing the transfer functions of LTI systems, we find that SSMs exhibit an implicit bias toward capturing low-frequency components more effectively than high-frequency ones. This behavior aligns with the broader notion of frequency bias in deep learning model training. We show that the initialization of an SSM assigns it an innate frequency bias and that training the model in a conventional way does not alter this bias. Based on our theory, we propose two mechanisms to tune frequency bias: either by scaling the initialization to tune the inborn frequency bias; or by applying a Sobolev-norm-based filter to adjust the sensitivity of the gradients to high-frequency inputs, which allows us to change the frequency bias via training. Using an image-denoising task, we empirically show that we can strengthen, weaken, or even reverse the frequency bias using both mechanisms. By tuning the frequency bias, we can also improve SSMs' performance on learning long-range sequences, averaging an 88.26% accuracy on the Long-Range Arena (LRA) benchmark tasks.
Submitted 2 October, 2024;
originally announced October 2024.
-
Retrieval-Augmented Natural Language Reasoning for Explainable Visual Question Answering
Authors:
Su Hyeon Lim,
Minkuk Kim,
Hyeon Bae Kim,
Seong Tae Kim
Abstract:
The Visual Question Answering with Natural Language Explanation (VQA-NLE) task is challenging due to its high demand for reasoning-based inference. Recent VQA-NLE studies focus on enhancing model networks to amplify the model's reasoning capability, but this approach is resource-consuming and unstable. In this work, we introduce a new VQA-NLE model, ReRe (Retrieval-augmented natural language Reasoning), which leverages retrieval information from memory to aid in generating accurate answers and persuasive explanations without relying on complex networks and extra datasets. ReRe is an encoder-decoder architecture that uses a pre-trained CLIP vision encoder and a pre-trained GPT-2 language model as a decoder. Cross-attention layers are added to GPT-2 for processing retrieval features. ReRe outperforms previous methods in VQA accuracy and explanation score and produces more persuasive and reliable natural language explanations.
Submitted 30 August, 2024;
originally announced August 2024.
-
Analysis of Multi-Source Language Training in Cross-Lingual Transfer
Authors:
Seong Hoon Lim,
Taejun Yun,
Jinhyeon Kim,
Jihun Choi,
Taeuk Kim
Abstract:
The successful adaptation of multilingual language models (LMs) to a specific language-task pair critically depends on the availability of data tailored for that condition. While cross-lingual transfer (XLT) methods have contributed to addressing this data scarcity problem, there is still ongoing debate about the mechanisms behind their effectiveness. In this work, we focus on one of the promising assumptions about the inner workings of XLT, that it encourages multilingual LMs to place greater emphasis on language-agnostic or task-specific features. We test this hypothesis by examining how the patterns of XLT change with a varying number of source languages involved in the process. Our experimental findings show that the use of multiple source languages in XLT, a technique we term Multi-Source Language Training (MSLT), leads to increased mingling of embedding spaces for different languages, supporting the claim that XLT benefits from making use of language-independent information. On the other hand, we discover that using an arbitrary combination of source languages does not always guarantee better performance. We suggest simple heuristics for identifying effective language combinations for MSLT and empirically demonstrate their effectiveness.
Submitted 4 June, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
-
On the Fundamental Tradeoff of Joint Communication and Quickest Change Detection with State-Independent Data Channels
Authors:
Daewon Seo,
Sung Hoon Lim
Abstract:
In this work, we take the initiative in studying the information-theoretic tradeoff between communication and quickest change detection (QCD) under an integrated sensing and communication setting. We formally establish a joint communication and sensing problem for the quickest change detection. We assume a broadcast channel with a transmitter, a communication receiver, and a QCD detector in which only the detection channel is state dependent. For the problem setting, by utilizing constant subblock-composition codes and a modified CuSum detection rule, which we call subblock CuSum (SCS), we provide an inner bound on the information-theoretic tradeoff between communication rate and change point detection delay in the asymptotic regime of vanishing false alarm rate. We further provide a partial converse that matches our inner bound for a certain class of codes. This implies that the SCS detection strategy is asymptotically optimal for our codes as the false alarm rate constraint vanishes. We also present some canonical examples of the tradeoff region for a binary channel, a scalar Gaussian channel, and a MIMO Gaussian channel.
Submitted 9 October, 2024; v1 submitted 23 January, 2024;
originally announced January 2024.
-
X-SNS: Cross-Lingual Transfer Prediction through Sub-Network Similarity
Authors:
Taejun Yun,
Jinhyeon Kim,
Deokyeong Kang,
Seong Hoon Lim,
Jihoon Kim,
Taeuk Kim
Abstract:
Cross-lingual transfer (XLT) is an emergent ability of multilingual language models that preserves their performance on a task to a significant extent when evaluated in languages that were not included in the fine-tuning process. While English, due to its widespread usage, is typically regarded as the primary language for model adaption in various tasks, recent studies have revealed that the efficacy of XLT can be amplified by selecting the most appropriate source languages based on specific conditions. In this work, we propose the utilization of sub-network similarity between two languages as a proxy for predicting the compatibility of the languages in the context of XLT. Our approach is model-oriented, better reflecting the inner workings of foundation models. In addition, it requires only a moderate amount of raw text from candidate languages, distinguishing it from the majority of previous methods that rely on external resources. In experiments, we demonstrate that our method is more effective than baselines across diverse tasks. Specifically, it shows proficiency in ranking candidates for zero-shot XLT, achieving an improvement of 4.6% on average in terms of NDCG@3. We also provide extensive analyses that confirm the utility of sub-networks for XLT prediction.
Submitted 26 October, 2023;
originally announced October 2023.
-
A Context-Aware CEO Problem
Authors:
Daewon Seo,
Sung Hoon Lim,
Yongjune Kim
Abstract:
In many sensor network applications, a fusion center often has additional valuable information, such as context data, which cannot be obtained directly from the sensors. Motivated by this, we study a generalized CEO problem where a CEO has access to context information. The main contribution of this work is twofold. Firstly, we characterize the asymptotically optimal error exponent per rate as the number of sensors and sum rate grow without bound. The proof extends the Berger-Tung coding scheme and the converse argument by Berger et al. (1996) taking into account context information. The resulting expression includes the minimum Chernoff divergence over context information. Secondly, assuming that the sizes of the source and context alphabets are respectively $|\mathcal{X}|$ and $|\mathcal{S}|$, we prove that it is asymptotically optimal to partition all sensors into at most $\binom{|\mathcal{X}|}{2} |\mathcal{S}|$ groups and have the sensors in each group adopt the same encoding scheme. Our problem subsumes the original CEO problem by Berger et al. (1996) as a special case if there is only one letter for context information; in this case, our result tightens its required number of groups from $\binom{|\mathcal{X}|}{2}+2$ to $\binom{|\mathcal{X}|}{2}$. We also numerically demonstrate the effect of context information for a simple Gaussian scenario.
Submitted 3 October, 2023;
originally announced October 2023.
-
Gated Recurrent Neural Networks with Weighted Time-Delay Feedback
Authors:
N. Benjamin Erichson,
Soon Hoe Lim,
Michael W. Mahoney
Abstract:
In this paper, we present a novel approach to modeling long-term dependencies in sequential data by introducing a gated recurrent unit (GRU) with a weighted time-delay feedback mechanism. Our proposed model, named $τ$-GRU, is a discretized version of a continuous-time formulation of a recurrent unit, where the dynamics are governed by delay differential equations (DDEs). We prove the existence and uniqueness of solutions for the continuous-time model and show that the proposed feedback mechanism can significantly improve the modeling of long-term dependencies. Our empirical results indicate that $τ$-GRU outperforms state-of-the-art recurrent units and gated recurrent architectures on a range of tasks, achieving faster convergence and better generalization.
Submitted 19 May, 2025; v1 submitted 30 November, 2022;
originally announced December 2022.
-
Chaotic Regularization and Heavy-Tailed Limits for Deterministic Gradient Descent
Authors:
Soon Hoe Lim,
Yijun Wan,
Umut Şimşekli
Abstract:
Recent studies have shown that gradient descent (GD) can achieve improved generalization when its dynamics exhibit chaotic behavior. However, to obtain the desired effect, the step-size should be chosen sufficiently large, a task which is problem dependent and can be difficult in practice. In this study, we incorporate a chaotic component into GD in a controlled manner, and introduce multiscale perturbed GD (MPGD), a novel optimization framework where the GD recursion is augmented with chaotic perturbations that evolve via an independent dynamical system. We analyze MPGD from three different angles: (i) By building on recent advances in rough paths theory, we show that, under appropriate assumptions, as the step-size decreases, the MPGD recursion converges weakly to a stochastic differential equation (SDE) driven by a heavy-tailed Lévy-stable process. (ii) By making connections to recently developed generalization bounds for heavy-tailed processes, we derive a generalization bound for the limiting SDE and relate the worst-case generalization error over the trajectories of the process to the parameters of MPGD. (iii) We analyze the implicit regularization effect brought by the dynamical regularization and show that, in the weak perturbation regime, MPGD introduces terms that penalize the Hessian of the loss function. Empirical results are provided to demonstrate the advantages of MPGD.
Submitted 22 October, 2022; v1 submitted 23 May, 2022;
originally announced May 2022.
-
Neural-Progressive Hedging: Enforcing Constraints in Reinforcement Learning with Stochastic Programming
Authors:
Supriyo Ghosh,
Laura Wynter,
Shiau Hong Lim,
Duc Thien Nguyen
Abstract:
We propose a framework, called neural-progressive hedging (NP), that leverages stochastic programming during the online phase of executing a reinforcement learning (RL) policy. The goal is to ensure feasibility with respect to constraints and risk-based objectives such as conditional value-at-risk (CVaR) during the execution of the policy, using probabilistic models of the state transitions to guide policy adjustments. The framework is particularly amenable to the class of sequential resource allocation problems since feasibility with respect to typical resource constraints cannot be enforced in a scalable manner. The NP framework provides an alternative that adds modest overhead during the online phase. Experimental results demonstrate the efficacy of the NP framework on two continuous real-world tasks: (i) the portfolio optimization problem with liquidity constraints for financial planning, characterized by non-stationary state distributions; and (ii) the dynamic repositioning problem in bike sharing systems, that embodies the class of supply-demand matching problems. We show that the NP framework produces policies that are better than deep RL and other baseline approaches, adapting to non-stationarity, whilst satisfying structural constraints and accommodating risk measures in the resulting policies. Additional benefits of the NP framework are ease of implementation and better explainability of the policies.
Submitted 27 February, 2022;
originally announced February 2022.
-
Hybrid Neural Coded Modulation: Design and Training Methods
Authors:
Sung Hoon Lim,
Jiyong Han,
Wonjong Noh,
Yujae Song,
Sang-Woon Jeon
Abstract:
We propose a hybrid coded modulation scheme that is composed of inner and outer codes. The outer code can be any standard binary linear code with efficient soft decoding capability (e.g., low-density parity-check (LDPC) codes). The inner code is designed using a deep neural network (DNN) which takes the channel coded bits and outputs modulated symbols. For training the DNN, we propose to use a loss function that is inspired by the generalized mutual information. The resulting constellations are shown to outperform the conventional quadrature amplitude modulation (QAM) based coding scheme for modulation orders 16 and 64 with 5G standard LDPC codes.
Submitted 4 February, 2022;
originally announced February 2022.
-
NoisyMix: Boosting Model Robustness to Common Corruptions
Authors:
N. Benjamin Erichson,
Soon Hoe Lim,
Winnie Xu,
Francisco Utrera,
Ziang Cao,
Michael W. Mahoney
Abstract:
For many real-world applications, obtaining stable and robust statistical performance is more important than simply achieving state-of-the-art predictive test accuracy, and thus robustness of neural networks is an increasingly important topic. Relatedly, data augmentation schemes have been shown to improve robustness with respect to input perturbations and domain shifts. Motivated by this, we introduce NoisyMix, a novel training scheme that promotes stability as well as leverages noisy augmentations in input and feature space to improve both model robustness and in-domain accuracy. NoisyMix produces models that are consistently more robust and that provide well-calibrated estimates of class membership probabilities. We demonstrate the benefits of NoisyMix on a range of benchmark datasets, including ImageNet-C, ImageNet-R, and ImageNet-P. Moreover, we provide theory to understand implicit regularization and robustness of NoisyMix.
Submitted 22 May, 2022; v1 submitted 2 February, 2022;
originally announced February 2022.
-
Order Constraints in Optimal Transport
Authors:
Fabian Lim,
Laura Wynter,
Shiau Hong Lim
Abstract:
Optimal transport is a framework for comparing measures whereby a cost is incurred for transporting one measure to another. Recent works have aimed to improve optimal transport plans through the introduction of various forms of structure. We introduce novel order constraints into the optimal transport formulation to allow for the incorporation of structure. We define an efficient method for obtaining explainable solutions to the new formulation that scales far better than standard approaches. The theoretical properties of the method are provided. We demonstrate experimentally that order constraints improve explainability using the e-SNLI (Stanford Natural Language Inference) dataset that includes human-annotated rationales as well as on several image color transfer examples.
Submitted 28 June, 2022; v1 submitted 14 October, 2021;
originally announced October 2021.
-
Noisy Feature Mixup
Authors:
Soon Hoe Lim,
N. Benjamin Erichson,
Francisco Utrera,
Winnie Xu,
Michael W. Mahoney
Abstract:
We introduce Noisy Feature Mixup (NFM), an inexpensive yet effective method for data augmentation that combines the best of interpolation based training and noise injection schemes. Rather than training with convex combinations of pairs of examples and their labels, we use noise-perturbed convex combinations of pairs of data points in both input and feature space. This method includes mixup and manifold mixup as special cases, but it has additional advantages, including better smoothing of decision boundaries and enabling improved model robustness. We provide theory to understand this as well as the implicit regularization effects of NFM. Our theory is supported by empirical results, demonstrating the advantage of NFM, as compared to mixup and manifold mixup. We show that residual networks and vision transformers trained with NFM have favorable trade-offs between predictive accuracy on clean data and robustness with respect to various types of data perturbation across a range of computer vision benchmark datasets.
Submitted 21 November, 2021; v1 submitted 5 October, 2021;
originally announced October 2021.
-
A Unified Discretization Approach to Compute-Forward: From Discrete to Continuous Inputs
Authors:
Adriano Pastore,
Sung Hoon Lim,
Chen Feng,
Bobak Nazer,
Michael Gastpar
Abstract:
Compute-forward is a coding technique that enables receiver(s) in a network to directly decode one or more linear combinations of the transmitted codewords. Initial efforts focused on Gaussian channels and derived achievable rate regions via nested lattice codes and single-user (lattice) decoding as well as sequential (lattice) decoding. Recently, these results have been generalized to discrete memoryless channels via nested linear codes and joint typicality coding, culminating in a simultaneous-decoding rate region for recovering one or more linear combinations from $K$ users. Using a discretization approach, this paper translates this result into a simultaneous-decoding rate region for a wide class of continuous memoryless channels, including the important special case of Gaussian channels. Additionally, this paper derives a single, unified expression for both discrete and continuous rate regions via an algebraic generalization of Rényi's information dimension.
Submitted 30 September, 2021;
originally announced October 2021.
-
Deep Learning-based Beam Tracking for Millimeter-wave Communications under Mobility
Authors:
Sun Hong Lim,
Sunwoo Kim,
Byonghyo Shim,
Jun Won Choi
Abstract:
In this paper, we propose a deep learning-based beam tracking method for millimeter-wave (mmWave) communications. Beam tracking is employed for transmitting the known symbols using the sounding beams and tracking time-varying channels to maintain a reliable communication link. When the pose of a user equipment (UE) device varies rapidly, the mmWave channels also tend to vary fast, which hinders seamless communication. Thus, models that can capture the temporal behavior of mmWave channels caused by the motion of the device are required to cope with this problem. Accordingly, we employ a deep neural network to analyze the temporal structure and patterns underlying the time-varying channels and the signals acquired by inertial sensors. We propose a model based on long short-term memory (LSTM) that predicts the distribution of the future channel behavior based on a sequence of input signals available at the UE. This channel distribution is used to 1) control the sounding beams adaptively for the future channel state and 2) update the channel estimate through the measurement update step under a sequential Bayesian estimation framework. Our experimental results demonstrate that the proposed method achieves a significant performance gain over the conventional beam tracking methods under various mobility scenarios.
Submitted 1 December, 2022; v1 submitted 19 February, 2021;
originally announced February 2021.
-
Efficient Reinforcement Learning in Resource Allocation Problems Through Permutation Invariant Multi-task Learning
Authors:
Desmond Cai,
Shiau Hong Lim,
Laura Wynter
Abstract:
One of the main challenges in real-world reinforcement learning is to learn successfully from limited training samples. We show that in certain settings, the available data can be dramatically increased through a form of multi-task learning, by exploiting an invariance property in the tasks. We provide a theoretical performance bound for the gain in sample efficiency under this setting. This motivates a new approach to multi-task learning, which involves the design of an appropriate neural network architecture and a prioritized task-sampling strategy. We demonstrate empirically the effectiveness of the proposed approach on two real-world sequential resource allocation tasks where this invariance property occurs: financial portfolio optimization and meta federated learning.
Submitted 18 February, 2021;
originally announced February 2021.
-
Noisy Recurrent Neural Networks
Authors:
Soon Hoe Lim,
N. Benjamin Erichson,
Liam Hodgkinson,
Michael W. Mahoney
Abstract:
We provide a general framework for studying recurrent neural networks (RNNs) trained by injecting noise into hidden states. Specifically, we consider RNNs that can be viewed as discretizations of stochastic differential equations driven by input data. This framework allows us to study the implicit regularization effect of general noise injection schemes by deriving an approximate explicit regularizer in the small noise regime. We find that, under reasonable assumptions, this implicit regularization promotes flatter minima; it biases towards models with more stable dynamics; and, in classification tasks, it favors models with larger classification margin. Sufficient conditions for global stability are obtained, highlighting the phenomenon of stochastic stabilization, where noise injection can improve stability during training. Our theory is supported by empirical results which demonstrate that the RNNs have improved robustness with respect to various input perturbations.
Submitted 1 December, 2021; v1 submitted 9 February, 2021;
originally announced February 2021.
-
Probabilistic Inference for Learning from Untrusted Sources
Authors:
Duc Thien Nguyen,
Shiau Hoong Lim,
Laura Wynter,
Desmond Cai
Abstract:
Federated learning brings potential benefits of faster learning, better solutions, and a greater propensity to transfer when heterogeneous data from different parties increases diversity. However, because federated learning tasks tend to be large and complex, and training times non-negligible, it is important for the aggregation algorithm to be robust to non-IID data and corrupted parties. This robustness relies on the ability to identify, and appropriately weight, incompatible parties. Recent work assumes that a reference dataset is available through which to perform the identification. We consider settings where no such reference dataset is available; rather, the quality and suitability of the parties need to be inferred. We do so by bringing ideas from crowdsourced predictions and collaborative filtering, where one must infer an unknown ground truth given proposals from participants with unknown quality. We propose novel federated learning aggregation algorithms based on Bayesian inference that adapt to the quality of the parties. Empirically, we show that the algorithms outperform standard and robust aggregation in federated learning on both synthetic and real data.
Submitted 15 January, 2021;
originally announced January 2021.
-
Robustness and Personalization in Federated Learning: A Unified Approach via Regularization
Authors:
Achintya Kundu,
Pengqian Yu,
Laura Wynter,
Shiau Hong Lim
Abstract:
We present a class of methods for robust, personalized federated learning, called Fed+, that unifies many federated learning algorithms. The principal advantage of this class of methods is to better accommodate the real-world characteristics found in federated training, such as the lack of IID data across parties, the need for robustness to outliers or stragglers, and the requirement to perform well on party-specific datasets. We achieve this through a problem formulation that allows the central server to employ robust ways of aggregating the local models while keeping the structure of local computation intact. Without making any statistical assumption on the degree of heterogeneity of local data across parties, we provide convergence guarantees for Fed+ for convex and non-convex loss functions under different (robust) aggregation methods. The Fed+ theory is also equipped to handle heterogeneous computing environments including stragglers without additional assumptions; specifically, the convergence results cover the general setting where the number of local update steps across parties can vary. We demonstrate the benefits of Fed+ through extensive experiments across standard benchmark datasets.
Submitted 12 July, 2022; v1 submitted 14 September, 2020;
originally announced September 2020.
-
Understanding Recurrent Neural Networks Using Nonequilibrium Response Theory
Authors:
Soon Hoe Lim
Abstract:
Recurrent neural networks (RNNs) are brain-inspired models widely used in machine learning for analyzing sequential data. The present work is a contribution towards a deeper understanding of how RNNs process input signals using the response theory from nonequilibrium statistical mechanics. For a class of continuous-time stochastic RNNs (SRNNs) driven by an input signal, we derive a Volterra-type series representation for their output. This representation is interpretable and disentangles the input signal from the SRNN architecture. The kernels of the series are certain recursively defined correlation functions with respect to the unperturbed dynamics that completely determine the output. Exploiting connections of this representation and its implications to rough paths theory, we identify a universal feature, the response feature, which turns out to be the signature of the tensor product of the input signal and a natural support basis. In particular, we show that SRNNs, with only the weights in the readout layer optimized and the weights in the hidden layer kept fixed, can be viewed as kernel machines operating on a reproducing kernel Hilbert space associated with the response feature.
Submitted 18 January, 2021; v1 submitted 19 June, 2020;
originally announced June 2020.
-
Variational Bayesian Inference for Crowdsourcing Predictions
Authors:
Desmond Cai,
Duc Thien Nguyen,
Shiau Hong Lim,
Laura Wynter
Abstract:
Crowdsourcing has emerged as an effective means for performing a number of machine learning tasks such as annotation and labelling of images and other data sets. In most early settings of crowdsourcing, the task involved classification, that is, assigning one of a discrete set of labels to each task. Recently, however, more complex tasks have been attempted, including asking crowdsource workers to assign continuous labels, or predictions. In essence, this involves the use of crowdsourcing for function estimation. We are motivated by this problem to drive applications such as collaborative prediction, that is, harnessing the wisdom of the crowd to predict quantities more accurately. To do so, we propose a Bayesian approach aimed specifically at alleviating overfitting, a typical impediment to accurate prediction models in practice. In particular, we develop a variational Bayesian technique for two different worker noise models: one that assumes workers' noises are independent and the other that assumes workers' noises have a latent low-rank structure. Our evaluations on synthetic and real-world datasets demonstrate that these Bayesian approaches perform significantly better than existing non-Bayesian approaches and are thus potentially useful for this class of crowdsourcing problems.
Submitted 1 June, 2020; v1 submitted 1 June, 2020;
originally announced June 2020.
-
A Deep Ensemble Multi-Agent Reinforcement Learning Approach for Air Traffic Control
Authors:
Supriyo Ghosh,
Sean Laguna,
Shiau Hong Lim,
Laura Wynter,
Hasan Poonawala
Abstract:
Air traffic control is an example of a highly challenging operational problem that is readily amenable to human expertise augmentation via decision support technologies. In this paper, we propose a new intelligent decision-making framework that leverages multi-agent reinforcement learning (MARL) to dynamically suggest adjustments of aircraft speeds in real time. The goal of the system is to enhance the ability of an air traffic controller to provide effective guidance to aircraft, both to avoid air traffic congestion and near-miss situations and to improve arrival timeliness. We develop a novel deep ensemble MARL method that can concisely capture the complexity of the air traffic control problem by learning to efficiently arbitrate between the decisions of a local kernel-based RL model and a wider-reaching deep MARL model. The proposed method is trained and evaluated on an open-source air traffic management simulator developed by Eurocontrol. Extensive empirical results on a real-world dataset including thousands of aircraft demonstrate the feasibility of using multi-agent RL for the problem of en-route air traffic control and show that our proposed deep ensemble MARL method significantly outperforms three state-of-the-art benchmark approaches.
Submitted 3 April, 2020;
originally announced April 2020.
-
Towards an Algebraic Network Information Theory: Simultaneous Joint Typicality Decoding
Authors:
Sung Hoon Lim,
Chen Feng,
Adriano Pastore,
Bobak Nazer,
Michael Gastpar
Abstract:
Consider a receiver in a multi-user network that wishes to decode several messages. Simultaneous joint typicality decoding is one of the most powerful techniques for determining the fundamental limits at which reliable decoding is possible. This technique has historically been used in conjunction with random i.i.d. codebooks to establish achievable rate regions for networks. Recently, it has been shown that, in certain scenarios, nested linear codebooks in conjunction with "single-user" or sequential decoding can yield better achievable rates. For instance, the compute-forward problem examines the scenario of recovering $L \le K$ linear combinations of transmitted codewords over a $K$-user multiple-access channel (MAC), and it is well established that linear codebooks can yield higher rates. Here, we develop bounds for simultaneous joint typicality decoding used in conjunction with nested linear codebooks, and apply them to obtain a larger achievable region for compute-forward over a $K$-user discrete memoryless MAC. The key technical challenge is that competing codeword tuples that are linearly dependent on the true codeword tuple introduce statistical dependencies, which requires careful partitioning of the associated error events.
Submitted 10 January, 2019;
originally announced January 2019.
-
On the Optimal Achievable Rates for Linear Computation With Random Homologous Codes
Authors:
Pinar Sen,
Sung Hoon Lim,
Young-Han Kim
Abstract:
The problem of computing a linear combination of sources over a multiple access channel is studied. Inner and outer bounds on the optimal tradeoff between the communication rates are established when encoding is restricted to random ensembles of homologous codes, namely, structured nested coset codes from the same generator matrix and individual shaping functions, but when decoding is optimized with respect to the realization of the encoders. For the special case in which the desired linear combination is "matched" to the structure of the multiple access channel in a natural sense, these inner and outer bounds coincide. This result indicates that most, if not all, coding schemes for computation in the literature that rely on random construction of nested coset codes cannot be improved by using more powerful decoders, such as the maximum likelihood decoder. The proof techniques are adapted to characterize the rate region for broadcast channels achieved by Marton's (random) coding scheme under maximum likelihood decoding.
Submitted 29 October, 2018; v1 submitted 8 May, 2018;
originally announced May 2018.
-
Efficient Beam Training and Channel Estimation for Millimeter Wave Communications Under Mobility
Authors:
Sun Hong Lim,
Jisu Bae,
Sunwoo Kim,
Byonghyo Shim,
Jun Won Choi
Abstract:
In this paper, we propose an efficient beam training technique for millimeter-wave (mmWave) communications. When some mobile users are under high mobility, the beam training should be performed frequently to ensure the accurate acquisition of the channel state information. In order to reduce the resource overhead caused by frequent beam training, we introduce a dedicated beam training strategy which sends the training beams separately to a specific high-mobility user (called a target user) without changing the periodicity of the conventional beam training. The dedicated beam training requires only a small amount of resources since the training beams can be optimized for the target user. In order to satisfy the performance requirement with low training overhead, we propose the optimal training beam selection strategy which finds the best beamforming vectors yielding the lowest channel estimation error based on the target user's probabilistic channel information. Such dedicated beam training is combined with the greedy channel estimation algorithm that accounts for sparse characteristics and temporal dynamics of the target user's channel. Our numerical evaluation demonstrates that the proposed scheme can maintain good channel estimation performance with significantly less training overhead compared to the conventional beam training protocols.
Submitted 7 October, 2018; v1 submitted 21 April, 2018;
originally announced April 2018.
-
On the Effects of Subpacketization in Content-Centric Mobile Networks
Authors:
Adeel Malik,
Sung Hoon Lim,
Won-Yong Shin
Abstract:
A large-scale content-centric mobile ad hoc network employing subpacketization is studied in which each mobile node, having a finite-size cache, moves according to the reshuffling mobility model and requests a content object from the library independently at random according to the Zipf popularity distribution. Instead of assuming that one content object is transferred in a single time slot, we consider a more challenging scenario where each content object is so large that only a subpacket of a file can be delivered during one time slot, which is motivated by a fast mobility scenario. Under our mobility model, we consider single-hop-based content delivery and characterize the fundamental trade-offs between throughput and delay. The order-optimal throughput-delay trade-off is analyzed by presenting the following two content reception strategies: the sequential reception for uncoded caching and the random reception for maximum distance separable (MDS)-coded caching. We also perform numerical evaluations to validate our analytical results. In particular, we conduct performance comparisons between the uncoded caching and the MDS-coded caching strategies by identifying the regimes in which the performance difference between the two caching strategies becomes prominent with respect to system parameters such as the Zipf exponent and the number of subpackets. In addition, we extend our study to the random walk mobility scenario and show that our main results are essentially the same as those in the reshuffling mobility model.
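The Zipf popularity model mentioned in this abstract can be sketched as follows, assuming the usual form in which the object of popularity rank k is requested with probability proportional to k raised to the power of minus the Zipf exponent. The function names `zipf_popularity` and `sample_requests` and all parameter values are hypothetical and chosen only for illustration.

```python
import random


def zipf_popularity(num_objects, exponent):
    """Request probabilities for ranks 1..num_objects under a Zipf law.

    Assumes the standard form p_k proportional to k ** (-exponent).
    """
    weights = [rank ** (-exponent) for rank in range(1, num_objects + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_requests(num_requests, num_objects, exponent, seed=None):
    """Draw independent content requests (as ranks) from the Zipf law."""
    rng = random.Random(seed)
    probs = zipf_popularity(num_objects, exponent)
    ranks = range(1, num_objects + 1)
    return rng.choices(ranks, weights=probs, k=num_requests)
```

A larger exponent concentrates requests on the most popular objects, which is why the abstract treats the Zipf exponent as a system parameter when comparing the caching strategies.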
Submitted 20 April, 2018;
originally announced April 2018.
-
Cooperative Strategies for UAV-Enabled Small Cell Networks Sharing Unlicensed Spectrum
Authors:
Yujae Song,
Sung Hoon Lim,
Sang-Woon Jeon,
Seungjae Baek
Abstract:
In this paper, we study an aerial drone base station (DBS) assisted cellular network that consists of a single ground macro base station (MBS), multiple DBSs, and multiple ground terminals (GTs). We assume that the MBS transmits to the DBSs and the GTs in the licensed band while the DBSs use a separate unlicensed band (e.g., Wi-Fi) to transmit to the GTs. For the utilization of the DBSs, we propose a cooperative decode-forward (DF) protocol in which multiple DBSs assist the terminals simultaneously while maintaining a predetermined interference level on the coexisting unlicensed band users. For our network setup, we formulate a joint optimization problem for minimizing the aggregate gap between the target rates and the throughputs of terminals by optimizing over the 3D positions of the DBSs and the resources (power, time, bandwidth) of the network. To solve the optimization problem, we propose an efficient nested structured algorithm based on particle swarm optimization and convex optimization methods. Extensive numerical evaluations of the proposed algorithm are performed, considering various aspects, to demonstrate the performance of our algorithm and the gain from utilizing DBSs.
Submitted 13 April, 2018;
originally announced April 2018.
-
Compute-Forward Multiple Access (CFMA): Practical Code Design
Authors:
Erixhen Sula,
Jingge Zhu,
Adriano Pastore,
Sung Hoon Lim,
Michael Gastpar
Abstract:
We present a practical strategy that aims to attain rate points on the dominant face of the multiple access channel capacity using a standard low-complexity decoder. This technique is built upon recent theoretical developments of Zhu and Gastpar on compute-forward multiple access (CFMA), which achieves the capacity of the multiple access channel using a sequential decoder. We illustrate this strategy with off-the-shelf LDPC codes. In the first stage of decoding, the receiver recovers a linear combination of the transmitted codewords using the sum-product algorithm (SPA). In the second stage, by using the recovered sum-of-codewords as side information, the receiver recovers one of the two codewords using a modified SPA, ultimately recovering both codewords. The main benefit of recovering the sum-of-codewords instead of the codeword itself is that it allows attaining points on the dominant face of the multiple access channel capacity without the need for rate-splitting or time sharing, while maintaining a low complexity on the order of a standard point-to-point decoder. This property is also shown to be crucial for some applications, e.g., interference channels. For all the simulations with single-layer binary codes, our proposed practical strategy is shown to be within 1.7 dB of the theoretical limits, without explicit optimization of the off-the-shelf LDPC codes.
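The last step of the two-stage decoding described in this abstract, obtaining both codewords once the sum and one codeword are known, can be sketched as follows. This is only an idealized, noiseless illustration: it assumes binary codewords whose linear combination is the modulo-2 sum, and it does not model the sum-product decoding itself. The function name `recover_second_codeword` is hypothetical.

```python
def recover_second_codeword(codeword_sum, first_codeword):
    """Recover the other binary codeword from the modulo-2 sum.

    Assumption for illustration: codewords are lists of 0/1 bits and the
    recovered linear combination is their bitwise XOR.
    """
    if len(codeword_sum) != len(first_codeword):
        raise ValueError("codeword lengths must match")
    # Over the binary field, subtraction equals addition, i.e. bitwise XOR.
    return [s ^ c for s, c in zip(codeword_sum, first_codeword)]
```

Because subtracting a known codeword from the sum is a cheap elementwise operation, this step adds little to the decoding work of the two stages that precede it.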
Submitted 29 December, 2017;
originally announced December 2017.
-
Communication versus Computation: Duality for multiple access channels and source coding
Authors:
Jingge Zhu,
Sung Hoon Lim,
Michael Gastpar
Abstract:
Computation codes in network information theory are designed for the scenarios where the decoder is not interested in recovering the information sources themselves, but only a function thereof. Körner and Marton showed for distributed source coding that such function decoding can be achieved more efficiently than decoding the full information sources. Compute-and-forward has shown that function decoding, in combination with network coding ideas, is a useful building block for end-to-end communication. In both cases, good computation codes are the key component in the coding schemes. In this work, we expose the fact that good computation codes could undermine the capability of the codes for recovering the information sources individually, e.g., for the purpose of multiple access and distributed source coding. Particularly, we establish duality results between the codes which are good for computation and the codes which are good for multiple access or distributed compression.
Submitted 26 July, 2017;
originally announced July 2017.
-
New Beam Tracking Technique for Millimeter Wave-band Communications
Authors:
Jisu Bae,
Sun Hong Lim,
Jin Hyeok Yoo,
Jun Won Choi
Abstract:
In this paper, we propose an efficient beam tracking method for mobility scenarios in mmWave-band communications. When the position of the mobile changes in a mobility scenario, the base station needs to perform beam training frequently to track the time-varying channel, thereby spending significant resources on training beams. In order to reduce the training overhead, we propose a new beam training approach called "beam tracking" which exploits the continuous nature of the time-varying angle of departure (AoD) for beam selection. We show that transmission of only two training beams is enough to track the time-varying AoD with good accuracy. We derive the optimal selection of the beam pair that minimizes the Cramér-Rao lower bound (CRLB) for AoD estimation averaged over the statistical distribution of the AoD. Our numerical results demonstrate that the proposed beam tracking scheme produces better AoD estimation than the conventional beam training protocol with less training overhead.
Submitted 1 February, 2017;
originally announced February 2017.
-
A Joint Typicality Approach to Algebraic Network Information Theory
Authors:
Sung Hoon Lim,
Chen Feng,
Adriano Pastore,
Bobak Nazer,
Michael Gastpar
Abstract:
This paper presents a joint typicality framework for encoding and decoding nested linear codes for multi-user networks. This framework provides a new perspective on compute-forward within the context of discrete memoryless networks. In particular, it establishes an achievable rate region for computing the weighted sum of nested linear codewords over a discrete memoryless multiple-access channel (MAC). When specialized to the Gaussian MAC, this rate region recovers and improves upon the lattice-based compute-forward rate region of Nazer and Gastpar, thus providing a unified approach for discrete memoryless and Gaussian networks. Furthermore, this framework can be used to shed light on the joint decoding rate region for compute-forward, which is considered an open problem. Specifically, this work establishes an achievable rate region for simultaneously decoding two linear combinations of nested linear codewords from K senders.
Submitted 30 June, 2016;
originally announced June 2016.
-
Fundamental Limits of Spectrum Sharing Full-Duplex Multicell Networks
Authors:
Sung Ho Chae,
Sang-Woon Jeon,
Sung Hoon Lim
Abstract:
This paper studies the degrees of freedom of full-duplex multicell networks that share the spectrum among multiple cells in a non-orthogonal setting. In the considered network, we assume that full-duplex base stations with multiple transmit and receive antennas communicate with multiple single-antenna mobile users. By sharing spectrum among multiple cells and simultaneously enabling full-duplex radio, the network can utilize the spectrum more flexibly, but, at the same time, it is subject to more sources of interference than a network with separately dedicated bands for distinct cells and for uplink and downlink traffic. Consequently, to take advantage of the additional freedom in utilizing the spectrum, interference management is a crucial ingredient. In this work, we propose a novel strategy based on interference alignment, which takes into account the inter-cell and intra-cell interference caused by spectrum sharing and full-duplex operation, to establish a general achievability result on the sum degrees of freedom of the considered network. Paired with an upper bound on the sum degrees of freedom, which is tight under certain conditions, we demonstrate how spectrum sharing and full-duplex operation can significantly improve the throughput over conventional cellular networks, especially for a network with a large number of users and/or cells.
Submitted 9 May, 2016;
originally announced May 2016.
-
Information Theoretic Caching: The Multi-User Case
Authors:
Sung Hoon Lim,
Chien-Yi Wang,
Michael Gastpar
Abstract:
In this paper, we consider a cache-aided network in which each user is assumed to have an individual cache, while upon users' requests, an update message is sent through a common link to all users. First, we formulate a general information-theoretic setting that represents the database as a discrete memoryless source, and the users' requests as side information that is available everywhere except at the cache encoder. The decoders' objective is to recover a function of the source and the side information. By viewing cache-aided networks in terms of a general distributed source coding problem and through information-theoretic arguments, we present inner and outer bounds on the fundamental tradeoff between cache memory size and update rate. Then, we specialize our general inner and outer bounds to a specific model of content delivery networks: file selection networks, in which the database is a collection of independent equal-size files and each user requests one of the files independently. For file selection networks, we provide an outer bound and two inner bounds (for centralized and decentralized caching strategies). For the case when the user request information is uniformly distributed, we characterize the rate vs. cache size tradeoff to within a multiplicative gap of 4. By further extending our arguments to the framework of Maddah-Ali and Niesen, we also establish a new outer bound and two new inner bounds, which are shown to recover the centralized and decentralized strategies previously established by Maddah-Ali and Niesen. Finally, in terms of the rate vs. cache size tradeoff, we improve the previous multiplicative gap of 72 to 4.7 for the average case with uniform requests.
Submitted 8 April, 2016;
originally announced April 2016.
-
A New Converse Bound for Coded Caching
Authors:
Chien-Yi Wang,
Sung Hoon Lim,
Michael Gastpar
Abstract:
An information-theoretic lower bound is developed for the caching system studied by Maddah-Ali and Niesen. By comparing the proposed lower bound with the decentralized coded caching scheme of Maddah-Ali and Niesen, the optimal memory--rate tradeoff is characterized to within a multiplicative gap of $4.7$ for the worst case, improving the previous analytical gap of $12$. Furthermore, for the case when users' requests follow the uniform distribution, the multiplicative gap is tightened to $4.7$, improving the previous analytical gap of $72$. As an independent result of interest, for the single-user average case in which the user requests multiple files, it is proved that caching the most requested files is optimal.
Submitted 21 January, 2016;
originally announced January 2016.
-
Distributed Decode-Forward for Relay Networks
Authors:
Sung Hoon Lim,
Kwang Taik Kim,
Young-Han Kim
Abstract:
A new coding scheme for general N-node relay networks is presented for unicast, multicast, and broadcast. The proposed distributed decode-forward scheme combines and generalizes Marton coding for single-hop broadcast channels and the Cover-El Gamal partial decode-forward coding scheme for 3-node relay channels. The key idea of the scheme is to precode all the codewords of the entire network at the source by multicoding over multiple blocks. This encoding step allows these codewords to carry partial information of the messages implicitly without complicated rate splitting and routing. This partial information is then recovered at the relay nodes and forwarded further. For N-node Gaussian unicast, multicast, and broadcast relay networks, the scheme achieves within 0.5N bits from the cutset bound and thus from the capacity (region), regardless of the network topology, channel gains, or power constraints. Roughly speaking, distributed decode-forward is dual to noisy network coding, which generalized compress-forward to unicast, multicast, and multiple access relay networks.
Submitted 11 January, 2017; v1 submitted 3 October, 2015;
originally announced October 2015.
-
Information-Theoretic Caching: Sequential Coding for Computing
Authors:
Chien-Yi Wang,
Sung Hoon Lim,
Michael Gastpar
Abstract:
Under the paradigm of caching, partial data is delivered before the actual requests of users are known. In this paper, this problem is modeled as a canonical distributed source coding problem with side information, where the side information represents the users' requests. For the single-user case, a single-letter characterization of the optimal rate region is established, and for several important special cases, closed-form solutions are given, including the scenario of uniformly distributed user requests. In this case, it is shown that the optimal caching strategy is closely related to total correlation and Wyner's common information. Using the insight gained from the single-user case, three two-user scenarios admitting single-letter characterization are considered, which draw connections to existing source coding problems in the literature: the Gray--Wyner system and distributed successive refinement. Finally, the model studied by Maddah-Ali and Niesen is rephrased to make a comparison with the considered information-theoretic model. Although the two caching models have a similar behavior for the single-user case, it is shown through a two-user example that the two caching models behave differently in general.
Submitted 27 February, 2016; v1 submitted 2 April, 2015;
originally announced April 2015.
-
Degrees of Freedom of Full-Duplex Multiantenna Cellular Networks
Authors:
Sang-Woon Jeon,
Sung Ho Chae,
Sung Hoon Lim
Abstract:
We study the degrees of freedom (DoF) of cellular networks in which a full duplex (FD) base station (BS) equipped with multiple transmit and receive antennas communicates with multiple mobile users. We consider two different scenarios. In the first scenario, we study the case when half duplex (HD) users, partitioned to either the uplink (UL) set or the downlink (DL) set, simultaneously communicate with the FD BS. In the second scenario, we study the case when FD users simultaneously communicate UL and DL data with the FD BS. Unlike conventional HD only systems, inter-user interference (within the cell) may severely limit the DoF, and must be carefully taken into account. With the goal of providing theoretical guidelines for designing such FD systems, we completely characterize the sum DoF of each of the two different FD cellular networks by developing an achievable scheme and obtaining a matching upper bound. The key idea of the proposed scheme is to carefully allocate UL and DL information streams using interference alignment and beamforming techniques. By comparing the DoFs of the considered FD systems with those of the conventional HD systems, we establish the DoF gain by enabling FD operation in various configurations. As a consequence of the result, we show that the DoF can approach the two-fold gain over the HD systems when the number of users becomes large enough as compared to the number of antennas at the BS.
Submitted 13 January, 2015;
originally announced January 2015.
-
Hybrid Coding: An Interface for Joint Source-Channel Coding and Network Communication
Authors:
Paolo Minero,
Sung Hoon Lim,
Young-Han Kim
Abstract:
A new approach to joint source-channel coding is presented in the context of communicating correlated sources over multiple access channels. Similar to the separation architecture, the joint source-channel coding system architecture in this approach is modular, whereby the source encoding and channel decoding operations are decoupled. However, unlike the separation architecture, the same codeword is used for both source coding and channel coding, which allows the resulting hybrid coding scheme to achieve the performance of the best known joint source-channel coding schemes. Applications of the proposed architecture to relay communication are also discussed.
Submitted 3 June, 2013;
originally announced June 2013.
-
Noisy Search with Comparative Feedback
Authors:
Shiau Hong Lim,
Peter Auer
Abstract:
We present theoretical results in terms of lower and upper bounds on the query complexity of noisy search with comparative feedback. In this search model, the noise in the feedback depends on the distance between query points and the search target. Consequently, the error probability in the feedback is not fixed but varies for the queries posed by the search algorithm. Our results show that a target out of n items can be found in O(log n) queries. We also show the surprising result that for k possible answers per query, the speedup is not log k (as for k-ary search) but only log log k in some cases.
Submitted 14 February, 2012;
originally announced February 2012.
-
Noisy Network Coding
Authors:
Sung Hoon Lim,
Young-Han Kim,
Abbas El Gamal,
Sae-Young Chung
Abstract:
A noisy network coding scheme for sending multiple sources over a general noisy network is presented. For multi-source multicast networks, the scheme naturally extends both network coding over noiseless networks by Ahlswede, Cai, Li, and Yeung, and compress-forward coding for the relay channel by Cover and El Gamal to general discrete memoryless and Gaussian networks. The scheme also recovers as special cases the results on coding for wireless relay networks and deterministic networks by Avestimehr, Diggavi, and Tse, and coding for wireless erasure networks by Dana, Gowaikar, Palanki, Hassibi, and Effros. The scheme involves message repetition coding, relay signal compression, and simultaneous decoding. Unlike previous compress--forward schemes, where independent messages are sent over multiple blocks, the same message is sent multiple times using independent codebooks as in the network coding scheme for cyclic networks. Furthermore, the relays do not use Wyner--Ziv binning as in previous compress-forward schemes, and each decoder performs simultaneous joint typicality decoding on the received signals from all the blocks without explicitly decoding the compression indices. A consequence of this new scheme is that achievability is proved simply and more generally without resorting to time expansion to extend results for acyclic networks to networks with cycles. The noisy network coding scheme is then extended to general multi-source networks by combining it with decoding techniques for interference channels. For the Gaussian multicast network, noisy network coding improves the previously established gap to the cutset bound. We also demonstrate through two popular AWGN network examples that noisy network coding can outperform conventional compress-forward, amplify-forward, and hash-forward schemes.
Submitted 11 March, 2010; v1 submitted 16 February, 2010;
originally announced February 2010.
-
Deterministic Relay Networks with State Information
Authors:
Sung Hoon Lim,
Young-Han Kim,
Sae-Young Chung
Abstract:
Motivated by fading channels and erasure channels, the problem of reliable communication over deterministic relay networks is studied, in which relay nodes receive a function of the incoming signals and a random network state. An achievable rate is characterized for the case in which destination nodes have full knowledge of the state information. If the relay nodes receive a linear function of the incoming signals and the state in a finite field, then the achievable rate is shown to be optimal, meeting the cut-set upper bound on the capacity. This result generalizes, in a unified framework, the work of Avestimehr, Diggavi, and Tse on deterministic networks with state dependency, the work of Dana, Gowaikar, Palanki, Hassibi, and Effros on linear erasure networks with interference, and the work of Smith and Vishwanath on linear erasure networks with broadcast.
Submitted 20 May, 2009; v1 submitted 19 May, 2009;
originally announced May 2009.