-
AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers
Authors:
Jiazhi Guan,
Kaisiyuan Wang,
Zhiliang Xu,
Quanwei Yang,
Yasheng Sun,
Shengyi He,
Borong Liang,
Yukang Cao,
Yingying Li,
Haocheng Feng,
Errui Ding,
Jingdong Wang,
Youjian Zhao,
Hang Zhou,
Ziwei Liu
Abstract:
Despite the recent progress of audio-driven video generation, existing methods mostly focus on driving facial movements, leading to non-coherent head and body dynamics. Moving forward, it is desirable yet challenging to generate holistic human videos with both accurate lip-sync and delicate co-speech gestures w.r.t. given audio. In this work, we propose AudCast, a generalized audio-driven human video generation framework adopting a cascaded Diffusion-Transformers (DiTs) paradigm, which synthesizes holistic human videos based on a reference image and a given audio. 1) First, an audio-conditioned Holistic Human DiT architecture is proposed to directly drive the movements of any human body with vivid gesture dynamics. 2) Then, to enhance hand and face details that are notoriously difficult to handle, a Regional Refinement DiT leverages regional 3D fitting as the bridge to reform the signals, producing the final results. Extensive experiments demonstrate that our framework generates high-fidelity audio-driven holistic human videos with temporal coherence and fine facial and hand details. Resources can be found at https://guanjz20.github.io/projects/AudCast.
Submitted 25 March, 2025;
originally announced March 2025.
-
TALK-Act: Enhance Textural-Awareness for 2D Speaking Avatar Reenactment with Diffusion Model
Authors:
Jiazhi Guan,
Quanwei Yang,
Kaisiyuan Wang,
Hang Zhou,
Shengyi He,
Zhiliang Xu,
Haocheng Feng,
Errui Ding,
Jingdong Wang,
Hongtao Xie,
Youjian Zhao,
Ziwei Liu
Abstract:
Recently, 2D speaking avatars have increasingly participated in everyday scenarios due to the fast development of facial animation techniques. However, most existing works neglect the explicit control of human bodies. In this paper, we propose to drive not only the faces but also the torso and gesture movements of a speaking figure. Inspired by recent advances in diffusion models, we propose the Motion-Enhanced Textural-Aware ModeLing for SpeaKing Avatar Reenactment (TALK-Act) framework, which enables high-fidelity avatar reenactment from only short footage of monocular video. Our key idea is to enhance the textural awareness with explicit motion guidance in diffusion modeling. Specifically, we carefully construct 2D and 3D structural information as intermediate guidance. While recent diffusion models adopt a side network for control information injection, they fail to synthesize temporally stable results even with person-specific fine-tuning. We propose a Motion-Enhanced Textural Alignment module to enhance the bond between driving and target signals. Moreover, we build a Memory-based Hand-Recovering module to help with the difficulties in hand-shape preserving. After pre-training, our model can achieve high-fidelity 2D avatar reenactment with only 30 seconds of person-specific data. Extensive experiments demonstrate the effectiveness and superiority of our proposed framework. Resources can be found at https://guanjz20.github.io/projects/TALK-Act.
Submitted 14 October, 2024;
originally announced October 2024.
-
ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer
Authors:
Jiazhi Guan,
Zhiliang Xu,
Hang Zhou,
Kaisiyuan Wang,
Shengyi He,
Zhanwang Zhang,
Borong Liang,
Haocheng Feng,
Errui Ding,
Jingtuo Liu,
Jingdong Wang,
Youjian Zhao,
Ziwei Liu
Abstract:
Lip-syncing videos with given audio is the foundation for various applications including the creation of virtual presenters or performers. While recent studies explore high-fidelity lip-sync with different techniques, their task-orientated models either require long-term videos for clip-specific training or retain visible artifacts. In this paper, we propose a unified and effective framework ReSyncer, that synchronizes generalized audio-visual facial information. The key design is revisiting and rewiring the Style-based generator to efficiently adopt 3D facial dynamics predicted by a principled style-injected Transformer. By simply re-configuring the information insertion mechanisms within the noise and style space, our framework fuses motion and appearance with unified training. Extensive experiments demonstrate that ReSyncer not only produces high-fidelity lip-synced videos according to audio, but also supports multiple appealing properties that are suitable for creating virtual presenters and performers, including fast personalized fine-tuning, video-driven lip-syncing, the transfer of speaking styles, and even face swapping. Resources can be found at https://guanjz20.github.io/projects/ReSyncer.
Submitted 6 August, 2024;
originally announced August 2024.
-
Unlocking the Potential of Early Epochs: Uncertainty-aware CT Metal Artifact Reduction
Authors:
Xinquan Yang,
Guanqun Zhou,
Wei Sun,
Youjian Zhang,
Zhongya Wang,
Jiahui He,
Zhicheng Zhang
Abstract:
In computed tomography (CT), the presence of metallic implants in patients often leads to disruptive artifacts in the reconstructed images, hindering accurate diagnosis. Recently, a large number of supervised deep learning-based approaches have been proposed for metal artifact reduction (MAR). However, these methods neglect the influence of initial training weights. In this paper, we show that the uncertainty image computed from the restoration result of initial training weights can effectively highlight high-frequency regions, including metal artifacts. This observation can be leveraged to assist the MAR network in removing metal artifacts. Therefore, we propose an uncertainty constraint (UC) loss that utilizes the uncertainty image as an adaptive weight to guide the MAR network to focus on the metal artifact region, leading to improved restoration. The proposed UC loss is designed to be a plug-and-play method, compatible with any MAR framework, and easily adoptable. To validate the effectiveness of the UC loss, we conduct extensive experiments on the publicly available Deeplesion and CLINIC-metal datasets. Experimental results demonstrate that the UC loss further optimizes the network training process and significantly improves the removal of metal artifacts.
Submitted 20 June, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
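The idea of using an uncertainty image as an adaptive per-pixel weight, as described in the abstract above, can be illustrated with a minimal sketch. The abstract does not give the loss formula, so everything here is an assumption made for illustration: the L1 base loss, the `1 + normalized uncertainty` weighting, and the function name `uncertainty_weighted_l1` are hypothetical and are not the authors' UC loss.

```python
def uncertainty_weighted_l1(pred, target, uncertainty, eps=1e-8):
    """Hypothetical uncertainty-weighted L1 loss over flattened pixel lists.

    Pixels with higher uncertainty get a larger weight, so errors in those
    regions (for example, metal artifacts) contribute more to the loss.
    The weighting scheme is an assumption, not the published UC loss.
    """
    u_max = max(uncertainty)
    # Normalize uncertainty to roughly [0, 1] and shift so every pixel keeps weight >= 1.
    weights = [1.0 + u / (u_max + eps) for u in uncertainty]
    weighted_errors = [w * abs(p - t) for w, p, t in zip(weights, pred, target)]
    return sum(weighted_errors) / len(pred)


# With zero uncertainty everywhere, the loss reduces to a plain mean absolute error.
plain = uncertainty_weighted_l1([1.0, 2.0], [0.0, 0.0], [0.0, 0.0])
# Marking the second pixel as uncertain increases its contribution.
focused = uncertainty_weighted_l1([1.0, 2.0], [0.0, 0.0], [0.0, 1.0])
```

Because the weight multiplies an ordinary reconstruction loss, a term like this can be attached to an existing training objective without changing the network, which matches the plug-and-play framing in the abstract.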
-
NetMamba: Efficient Network Traffic Classification via Pre-training Unidirectional Mamba
Authors:
Tongze Wang,
Xiaohui Xie,
Wenduo Wang,
Chuyi Wang,
Youjian Zhao,
Yong Cui
Abstract:
Network traffic classification is a crucial research area aiming to enhance service quality, streamline network management, and bolster cybersecurity. To address the growing complexity of transmission encryption techniques, various machine learning and deep learning methods have been proposed. However, existing approaches face two main challenges. Firstly, they struggle with model inefficiency due to the quadratic complexity of the widely used Transformer architecture. Secondly, they suffer from inadequate traffic representation because of discarding important byte information while retaining unwanted biases. To address these challenges, we propose NetMamba, an efficient linear-time state space model equipped with a comprehensive traffic representation scheme. We adopt a specially selected and improved unidirectional Mamba architecture for the networking field, instead of the Transformer, to address efficiency issues. In addition, we design a traffic representation scheme to extract valid information from massive traffic data while removing biased information. Evaluation experiments on six public datasets encompassing three main classification tasks showcase NetMamba's superior classification performance compared to state-of-the-art baselines. It achieves an accuracy rate of nearly 99% (some over 99%) in all tasks. Additionally, NetMamba demonstrates excellent efficiency, improving inference speed by up to 60 times while maintaining comparably low memory usage. Furthermore, NetMamba exhibits superior few-shot learning abilities, achieving better classification performance with fewer labeled data. To the best of our knowledge, NetMamba is the first model to tailor the Mamba architecture for networking.
Submitted 20 October, 2024; v1 submitted 19 May, 2024;
originally announced May 2024.
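The efficiency argument in the abstract above rests on state space models processing a sequence in time linear in its length, whereas self-attention compares every pair of positions. A minimal sketch of a plain scalar linear state-space recurrence makes the single-pass structure visible. This is a generic, non-selective recurrence written for illustration; the parameters `a`, `b`, `c` and the function `ssm_scan` are hypothetical and do not reproduce Mamba or NetMamba.

```python
def ssm_scan(inputs, a=0.9, b=0.1, c=1.0):
    """Run a scalar linear state-space recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t   (state update)
    y_t = c * h_t                 (readout)

    Each input is visited once, so the cost grows linearly with the
    sequence length.
    """
    h = 0.0
    outputs = []
    for x in inputs:
        h = a * h + b * x
        outputs.append(c * h)
    return outputs


# An impulse at the first position decays geometrically through the state.
response = ssm_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=1.0)
```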
-
Teeth-SEG: An Efficient Instance Segmentation Framework for Orthodontic Treatment based on Anthropic Prior Knowledge
Authors:
Bo Zou,
Shaofeng Wang,
Hao Liu,
Gaoyue Sun,
Yajie Wang,
FeiFei Zuo,
Chengbin Quan,
Youjian Zhao
Abstract:
Teeth localization, segmentation, and labeling in 2D images have great potential in modern dentistry to enhance dental diagnostics, treatment planning, and population-based studies on oral health. However, general instance segmentation frameworks fall short due to 1) the subtle differences between the shapes of some teeth (e.g., maxillary first premolar and second premolar), 2) the variation in teeth position and shape across subjects, and 3) the presence of abnormalities in the dentition (e.g., caries and edentulism). To address these problems, we propose a ViT-based framework named TeethSEG, which consists of stacked Multi-Scale Aggregation (MSA) blocks and an Anthropic Prior Knowledge (APK) layer. Specifically, to compose the two modules, we design 1) a unique permutation-based upscaler to ensure high efficiency while establishing clear segmentation boundaries with 2) multi-head self/cross-gating layers to emphasize particular semantics while maintaining the divergence between token embeddings. Besides, we collect 3) the first open-sourced intraoral image dataset IO150K, which comprises over 150k intraoral photos, all annotated by orthodontists using a human-machine hybrid algorithm. Experiments on IO150K demonstrate that our TeethSEG outperforms the state-of-the-art segmentation models on dental image segmentation.
Submitted 1 April, 2024;
originally announced April 2024.
-
VideoDistill: Language-aware Vision Distillation for Video Question Answering
Authors:
Bo Zou,
Chao Yang,
Yu Qiao,
Chengbin Quan,
Youjian Zhao
Abstract:
Significant advancements in video question answering (VideoQA) have been made thanks to thriving large image-language pretraining frameworks. Although these image-language models can efficiently represent both video and language branches, they typically employ a goal-free vision perception process and do not let vision interact well with language during answer generation, thus omitting crucial visual cues. In this paper, we are inspired by the human recognition and learning pattern and propose VideoDistill, a framework with language-aware (i.e., goal-driven) behavior in both vision perception and answer generation process. VideoDistill generates answers only from question-related visual embeddings and follows a thinking-observing-answering approach that closely resembles human behavior, distinguishing it from previous research. Specifically, we develop a language-aware gating mechanism to replace the standard cross-attention, avoiding language's direct fusion into visual representations. We incorporate this mechanism into two key components of the entire framework. The first component is a differentiable sparse sampling module, which selects frames containing the necessary dynamics and semantics relevant to the questions. The second component is a vision refinement module that merges existing spatial-temporal attention layers to ensure the extraction of multi-grained visual semantics associated with the questions. We conduct experimental evaluations on various challenging video question-answering benchmarks, and VideoDistill achieves state-of-the-art performance in both general and long-form VideoQA datasets. In addition, we verify that VideoDistill can effectively alleviate the utilization of language shortcut solutions in the EgoTaskQA dataset.
Submitted 1 April, 2024;
originally announced April 2024.
-
LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction
Authors:
Bo Zou,
Chao Yang,
Yu Qiao,
Chengbin Quan,
Youjian Zhao
Abstract:
Existing methods to fine-tune LLMs, like Adapter, Prefix-tuning, and LoRA, which introduce extra modules or additional input sequences to inject new skills or knowledge, may compromise the innate abilities of LLMs. In this paper, we propose LLaMA-Excitor, a lightweight method that stimulates the LLMs' potential to better follow instructions by gradually paying more attention to worthwhile information. Specifically, the LLaMA-Excitor does not directly change the intermediate hidden state during the self-attention calculation of the transformer structure. We designed the Excitor block as a bypass module for the similarity score computation in LLMs' self-attention to reconstruct keys and change the importance of values by learnable prompts. LLaMA-Excitor ensures a self-adaptive allocation of additional attention to input instructions, thus effectively preserving LLMs' pre-trained knowledge when fine-tuning LLMs on low-quality instruction-following datasets. Furthermore, we unify the modeling of multi-modal tuning and language-only tuning, extending LLaMA-Excitor to a powerful visual instruction follower without the need for complex multi-modal alignment. Our proposed approach is evaluated in language-only and multi-modal tuning experimental scenarios. Notably, LLaMA-Excitor is the only method that maintains basic capabilities while achieving a significant improvement (+6%) on the MMLU benchmark. In the visual instruction tuning, we achieve a new state-of-the-art image captioning performance of 157.5 CIDEr on MSCOCO, and a comparable performance (88.39%) on ScienceQA to cutting-edge models with more parameters and extensive vision-language pretraining.
Submitted 1 April, 2024;
originally announced April 2024.
-
Building an Invisible Shield for Your Portrait against Deepfakes
Authors:
Jiazhi Guan,
Tianshu Hu,
Hang Zhou,
Zhizhi Guo,
Lirui Deng,
Chengbin Quan,
Errui Ding,
Youjian Zhao
Abstract:
The issue of detecting deepfakes has garnered significant attention in the research community, with the goal of identifying facial manipulations for abuse prevention. Although recent studies have focused on developing generalized models that can detect various types of deepfakes, their performance is not always reliable and stable, which poses limitations in real-world applications. Instead of learning a forgery detector, in this paper, we propose a novel framework, Integrity Encryptor, aiming to protect portraits in a proactive strategy. Our methodology involves covertly encoding messages that are closely associated with key facial attributes into authentic images prior to their public release. Unlike authentic images, where the hidden messages can be extracted with precision, manipulating the facial attributes through deepfake techniques can disrupt the decoding process. Consequently, the modified facial attributes serve as a means of detecting manipulated images through a comparison of the decoded messages. Our encryption approach is characterized by its brevity and efficiency, and the resulting method exhibits good robustness against typical image processing traces, such as image degradation and noise. When compared to baselines that struggle to detect deepfakes in a black-box setting, our method utilizing conditional encryption showcases superior performance when presented with a range of different types of forgeries. In experiments conducted on our protected data, our approach outperforms existing state-of-the-art methods by a significant margin.
Submitted 22 May, 2023;
originally announced May 2023.
-
Detecting Deepfake by Creating Spatio-Temporal Regularity Disruption
Authors:
Jiazhi Guan,
Hang Zhou,
Mingming Gong,
Errui Ding,
Jingdong Wang,
Youjian Zhao
Abstract:
Despite encouraging progress in deepfake detection, generalization to unseen forgery types remains a significant challenge due to the limited forgery clues explored during training. In contrast, we notice a common phenomenon in deepfake: fake video creation inevitably disrupts the statistical regularity in original videos. Inspired by this observation, we propose to boost the generalization of deepfake detection by distinguishing the "regularity disruption" that does not appear in real videos. Specifically, by carefully examining the spatial and temporal properties, we propose to disrupt a real video through a Pseudo-fake Generator and create a wide range of pseudo-fake videos for training. Such practice allows us to achieve deepfake detection without using fake videos and improves the generalization ability in a simple and efficient manner. To jointly capture the spatial and temporal disruptions, we propose a Spatio-Temporal Enhancement block to learn the regularity disruption across space and time on our self-created videos. Through comprehensive experiments, our method exhibits excellent performance on several datasets.
Submitted 25 June, 2023; v1 submitted 21 July, 2022;
originally announced July 2022.
-
Delving into Sequential Patches for Deepfake Detection
Authors:
Jiazhi Guan,
Hang Zhou,
Zhibin Hong,
Errui Ding,
Jingdong Wang,
Chengbin Quan,
Youjian Zhao
Abstract:
Recent advances in face forgery techniques produce nearly visually untraceable deepfake videos, which could be leveraged with malicious intentions. As a result, researchers have been devoted to deepfake detection. Previous studies have identified the importance of local low-level cues and temporal information in the pursuit of generalizing well across deepfake methods; however, they still suffer from robustness problems under post-processing. In this work, we propose the Local- & Temporal-aware Transformer-based Deepfake Detection (LTTD) framework, which adopts a local-to-global learning protocol with a particular focus on the valuable temporal information within local sequences. Specifically, we propose a Local Sequence Transformer (LST), which models the temporal consistency on sequences of restricted spatial regions, where low-level information is hierarchically enhanced with shallow layers of learned 3D filters. Based on the local temporal embeddings, we then achieve the final classification in a global contrastive way. Extensive experiments on popular datasets validate that our approach effectively spots local forgery cues and achieves state-of-the-art performance.
Submitted 12 October, 2022; v1 submitted 6 July, 2022;
originally announced July 2022.
-
CORE: Consistent Representation Learning for Face Forgery Detection
Authors:
Yunsheng Ni,
Depu Meng,
Changqian Yu,
Chengbin Quan,
Dongchun Ren,
Youjian Zhao
Abstract:
Face manipulation techniques develop rapidly and arouse widespread public concerns. Although vanilla convolutional neural networks achieve acceptable performance, they suffer from overfitting. To relieve this issue, there is a trend to introduce some erasing-based augmentations. We find that these methods indeed attempt to implicitly induce more consistent representations for different augmentations by assigning the same label to different augmented images. However, due to the lack of explicit regularization, the consistency between different representations is less satisfactory. Therefore, we constrain the consistency of different representations explicitly and propose a simple yet effective framework, COnsistent REpresentation Learning (CORE). Specifically, we first capture the different representations with different augmentations, then regularize the cosine distance of the representations to enhance the consistency. Extensive experiments (in-dataset and cross-dataset) demonstrate that CORE performs favorably against state-of-the-art face forgery detection methods.
Submitted 6 June, 2022;
originally announced June 2022.
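The consistency regularizer described in the CORE abstract above (penalizing the cosine distance between representations of two augmented views of the same image) can be sketched in a few lines. This is a minimal illustration under stated assumptions: representations are plain lists of floats, and the balancing weight `alpha`, the helper names, and the way the term is added to a classification loss are hypothetical choices not taken from the abstract.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(u, v) for two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def consistency_loss(reps_a, reps_b):
    """Mean cosine distance between paired representations of two views."""
    distances = [cosine_distance(u, v) for u, v in zip(reps_a, reps_b)]
    return sum(distances) / len(distances)


def total_loss(classification_loss, reps_a, reps_b, alpha=1.0):
    """Classification loss plus the weighted consistency term.

    alpha is a hypothetical balancing weight chosen for illustration.
    """
    return classification_loss + alpha * consistency_loss(reps_a, reps_b)


# Two views whose representations point the same way incur no penalty;
# orthogonal representations incur the penalty 1.
aligned = consistency_loss([[1.0, 0.0]], [[2.0, 0.0]])
orthogonal = consistency_loss([[1.0, 0.0]], [[0.0, 1.0]])
```

Cosine distance ignores vector magnitude, so the term only asks the two views to agree in direction in feature space.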
-
Neural Maximum A Posteriori Estimation on Unpaired Data for Motion Deblurring
Authors:
Youjian Zhang,
Chaoyue Wang,
Dacheng Tao
Abstract:
Real-world dynamic scene deblurring has long been a challenging task since paired blurry-sharp training data is unavailable. Conventional Maximum A Posteriori estimation and deep learning-based deblurring methods are restricted by handcrafted priors and synthetic blurry-sharp training pairs respectively, thereby failing to generalize to real dynamic blurriness. To this end, we propose a Neural Maximum A Posteriori (NeurMAP) estimation framework for training neural networks to recover blind motion information and sharp content from unpaired data. The proposed NeurMAP consists of a motion estimation network and a deblurring network which are trained jointly to model the (re)blurring process (i.e. likelihood function). Meanwhile, the motion estimation network is trained to explore the motion information in images by applying implicit dynamic motion prior, and in return enforces the deblurring network training (i.e. providing sharp image prior). The proposed NeurMAP is an orthogonal approach to existing deblurring neural networks, and is the first framework that enables training image deblurring networks on unpaired datasets. Experiments demonstrate our superiority on both quantitative metrics and visual quality over state-of-the-art methods. Code is available at https://github.com/yjzhang96/NeurMAP-deblur.
Submitted 26 April, 2022;
originally announced April 2022.
-
Video Frame Interpolation without Temporal Priors
Authors:
Youjian Zhang,
Chaoyue Wang,
Dacheng Tao
Abstract:
Video frame interpolation, which aims to synthesize non-existent intermediate frames in a video sequence, is an important research topic in computer vision. Existing video frame interpolation methods have achieved remarkable results under specific assumptions, such as instant or known exposure time. However, in complicated real-world situations, the temporal priors of videos, i.e. frames per second (FPS) and frame exposure time, may vary across camera sensors. When test videos are taken under different exposure settings from training ones, the interpolated frames will suffer significant misalignment problems. In this work, we solve the video frame interpolation problem in a general situation, where input frames can be acquired under uncertain exposure (and interval) time. Unlike previous methods that can only be applied to a specific temporal prior, we derive a general curvilinear motion trajectory formula from four consecutive sharp frames or two consecutive blurry frames without temporal priors. Moreover, utilizing constraints within adjacent motion trajectories, we devise a novel optical flow refinement strategy for better interpolation results. Finally, experiments demonstrate that one well-trained model is enough for synthesizing high-quality slow-motion videos under complicated real-world situations. Code is available at https://github.com/yjzhang96/UTI-VFI.
Submitted 2 December, 2021;
originally announced December 2021.
-
Exposure Trajectory Recovery from Motion Blur
Authors:
Youjian Zhang,
Chaoyue Wang,
Stephen J. Maybank,
Dacheng Tao
Abstract:
Motion blur in dynamic scenes is an important yet challenging research topic. Recently, deep learning methods have achieved impressive performance for dynamic scene deblurring. However, the motion information contained in a blurry image has yet to be fully explored and accurately formulated because: (i) the ground truth of dynamic motion is difficult to obtain; (ii) the temporal ordering is destroyed during the exposure; and (iii) the motion estimation from a blurry image is highly ill-posed. By revisiting the principle of camera exposure, motion blur can be described by the relative motions of sharp content with respect to each exposed position. In this paper, we define exposure trajectories, which represent the motion information contained in a blurry image and explain the causes of motion blur. A novel motion offset estimation framework is proposed to model pixel-wise displacements of the latent sharp image at multiple timepoints. Under mild constraints, our method can recover dense, (non-)linear exposure trajectories, which significantly reduce temporal disorder and ill-posed problems. Finally, experiments demonstrate that the recovered exposure trajectories not only capture accurate and interpretable motion information from a blurry image, but also benefit motion-aware image deblurring and warping-based video extraction tasks. Codes are available on https://github.com/yjzhang96/Motion-ETR.
Submitted 4 October, 2021; v1 submitted 6 October, 2020;
originally announced October 2020.
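The exposure-trajectory view in the abstract above treats a blurry pixel as the accumulation of sharp content sampled at displaced positions over several time points during the exposure. A minimal 1D sketch of that forward (re)blurring model follows. The integer offsets, the uniform average over time points, the border clamping, and the function name `reblur_1d` are simplifying assumptions for illustration and are not the paper's formulation.

```python
def reblur_1d(sharp, offsets):
    """Synthesize a blurry 1D signal from a sharp one and per-pixel offsets.

    offsets[t][i] is the (integer) displacement of pixel i at time point t.
    Each blurry pixel is the average of the sharp values sampled along its
    trajectory; indices are clamped at the signal borders.
    """
    n = len(sharp)
    blurry = []
    for i in range(n):
        samples = []
        for per_time_offsets in offsets:
            j = min(max(i + per_time_offsets[i], 0), n - 1)
            samples.append(sharp[j])
        blurry.append(sum(samples) / len(samples))
    return blurry


# A single bright pixel smeared by a three-sample horizontal trajectory.
sharp = [0.0, 0.0, 3.0, 0.0, 0.0]
offsets = [[-1] * 5, [0] * 5, [1] * 5]
blurry = reblur_1d(sharp, offsets)  # → [0.0, 1.0, 1.0, 1.0, 0.0]
```

With all offsets set to zero the function returns the sharp signal unchanged, which corresponds to a static scene with no motion during the exposure.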
-
A Smoothed Analysis of Online Lasso for the Sparse Linear Contextual Bandit Problem
Authors:
Zhiyuan Liu,
Huazheng Wang,
Bo Waggoner,
Youjian Liu,
Lijun Chen
Abstract:
We investigate the sparse linear contextual bandit problem where the parameter $θ$ is sparse. To relieve the sampling inefficiency, we utilize the "perturbed adversary" where the context is generated adversarially but with small random non-adaptive perturbations. We prove that the simple online Lasso supports sparse linear contextual bandit with regret bound $\mathcal{O}(\sqrt{kT\log d})$ even when $d \gg T$ where $k$ and $d$ are the effective and ambient dimensions, respectively. Compared to the recent work from Sivakumar et al. (2020), our analysis does not rely on the precondition processing, adaptive perturbation (the adaptive perturbation violates the i.i.d. perturbation setting) or truncation on the error set. Moreover, the special structures in our results explicitly characterize how the perturbation affects exploration length and guide the design of perturbation together with the fundamental performance limit of the perturbation method. Numerical experiments are provided to complement the theoretical analysis.
△ Less
Submitted 16 July, 2020;
originally announced July 2020.
-
Utilizing Players' Playtime Records for Churn Prediction: Mining Playtime Regularity
Authors:
Wanshan Yang,
Ting Huang,
Junlin Zeng,
Lijun Chen,
Shivakant Mishra,
Youjian Liu
Abstract:
In the free online game industry, churn prediction is an important research topic. Reducing the churn rate of a game significantly helps with the success of the game. Churn prediction helps a game operator identify possible churning players and keep them engaged in the game via appropriate operational strategies, marketing strategies, and/or incentives. Playtime related features are some of the widely used universal features for most churn prediction models. In this paper, we consider developing new universal features for churn predictions for long-term players based on players' playtime.
Submitted 27 December, 2019; v1 submitted 14 December, 2019;
originally announced December 2019.
-
On the Necessity and Effectiveness of Learning the Prior of Variational Auto-Encoder
Authors:
Haowen Xu,
Wenxiao Chen,
Jinlin Lai,
Zhihan Li,
Youjian Zhao,
Dan Pei
Abstract:
Using powerful posterior distributions is a popular approach to achieving better variational inference. However, recent works showed that the aggregated posterior may fail to match the unit Gaussian prior, so learning the prior becomes an alternative way to improve the lower bound. In this paper, for the first time in the literature, we prove the necessity and effectiveness of learning the prior when the aggregated posterior does not match the unit Gaussian prior, analyze why this situation may happen, and propose a hypothesis that learning the prior may improve reconstruction loss, all of which are supported by our extensive experimental results. We show that using a learned Real NVP prior and just one latent variable in a VAE, we can achieve test NLL comparable to very deep state-of-the-art hierarchical VAEs, outperforming many previous works with complex hierarchical VAE architectures.
Submitted 31 May, 2019;
originally announced May 2019.
-
Maximum A Posteriori Probability (MAP) Joint Fine Frequency Offset and Channel Estimation for MIMO Systems with Channels of Arbitrary Correlation
Authors:
Mingda Zhou,
Zhe Feng,
Xinming Huang,
Youjian Liu
Abstract:
Channel and frequency offset estimation is a classic topic with a large body of prior work mainly using the maximum likelihood (ML) approach together with Cramér-Rao lower bound (CRLB) analysis. We provide the maximum a posteriori (MAP) estimation solution, which is particularly useful for tracking, where previous estimation can be used as prior knowledge. Unlike the ML cases, the corresponding Bayesian Cramér-Rao lower bound (BCRLB) shows a clear relation with parameters, and a low complexity algorithm achieves the BCRLB over almost the entire SNR range. We allow the time invariant channel within a packet to have arbitrary correlation and mean. The estimation is based on pilot/training signals. An unexpected result is that the joint MAP estimation is equivalent to an individual MAP estimation of the frequency offset first, again different from the ML results. We provide insight on the pilot/training signal design based on the BCRLB. Unlike past algorithms that trade performance and/or complexity for the accommodation of time varying channels, the MAP solution provides a different route for dealing with time variation. Within a short enough (segment of) packet where the channel and CFO are approximately time invariant, the low complexity algorithm can be employed. Similar to belief propagation, the estimation of the previous (segment of) packet can serve as the prior knowledge for the next (segment of) packet.
Submitted 9 May, 2019;
originally announced May 2019.
-
Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications
Authors:
Haowen Xu,
Wenxiao Chen,
Nengwen Zhao,
Zeyan Li,
Jiahao Bu,
Zhihan Li,
Ying Liu,
Youjian Zhao,
Dan Pei,
Yang Feng,
Jie Chen,
Zhaogang Wang,
Honglin Qiao
Abstract:
To ensure undisrupted business, large Internet companies need to closely monitor various KPIs (e.g., Page Views, number of online users, and number of orders) of their Web applications, to accurately detect anomalies and trigger timely troubleshooting/mitigation. However, anomaly detection for these seasonal KPIs with various patterns and data quality has been a great challenge, especially without labels. In this paper, we propose Donut, an unsupervised anomaly detection algorithm based on VAE. Thanks to a few of our key techniques, Donut greatly outperforms a state-of-the-art supervised ensemble approach and a baseline VAE approach, and its best F-scores range from 0.75 to 0.9 for the studied KPIs from a top global Internet company. We come up with a novel KDE interpretation of reconstruction for Donut, making it the first VAE-based anomaly detection algorithm with a solid theoretical explanation.
Submitted 12 February, 2018;
originally announced February 2018.
-
Why It Takes So Long to Connect to a WiFi Access Point
Authors:
Changhua Pei,
Zhi Wang,
Youjian Zhao,
Zihan Wang,
Yuan Meng,
Dan Pei,
Yuanquan Peng,
Wenliang Tang,
Xiaodong Qu
Abstract:
Today's WiFi networks deliver a large fraction of traffic. However, the performance and quality of WiFi networks are still far from satisfactory. Among many popular quality metrics (throughput, latency), the probability of successfully connecting to WiFi APs and the time cost of the WiFi connection set-up process are two of the most critical metrics that affect WiFi users' experience. To understand the WiFi connection set-up process in real-world settings, we carry out measurement studies on $5$ million mobile users from $4$ representative cities associating with $7$ million APs in $0.4$ billion WiFi sessions, collected from a mobile "WiFi Manager" App that tops the Android/iOS App market. To the best of our knowledge, we are the first to do such a large-scale study on: how large the WiFi connection set-up time cost is, what factors affect the WiFi connection set-up process, and what can be done to reduce the WiFi connection set-up time cost. Based on the measurement analysis, we develop a machine learning based AP selection strategy that can significantly improve WiFi connection set-up performance, against the conventional strategy purely based on signal strength, by reducing the connection set-up failures from $33\%$ to $3.6\%$ and reducing $80\%$ of the time costs of the connection set-up processes by more than $10$ times.
Submitted 8 May, 2017; v1 submitted 10 January, 2017;
originally announced January 2017.
-
Dual Link Algorithm for the Weighted Sum Rate Maximization in MIMO Interference Channels
Authors:
Xing Li,
Seungil You,
Lijun Chen,
An Liu,
Youjian Liu
Abstract:
MIMO interference network optimization is important for increasingly crowded wireless communication networks. We provide a new algorithm, named the Dual Link algorithm, for the classic problem of weighted sum-rate maximization for MIMO multiaccess channels (MAC), broadcast channels (BC), and general MIMO interference channels with Gaussian input and a total power constraint. For MIMO MAC/BC, the algorithm finds optimal signals to achieve the capacity region boundary. For interference channels with the Gaussian input assumption, two of the previous state-of-the-art algorithms are the WMMSE algorithm and the polite water-filling (PWF) algorithm. The WMMSE algorithm is provably convergent, while the PWF algorithm takes advantage of the optimal transmit signal structure and converges the fastest in most situations but is not guaranteed to converge in all situations. It is highly desirable to design an algorithm that has the advantages of both algorithms. The Dual Link algorithm is such an algorithm. Its fast and guaranteed convergence is important to distributed implementation and time varying channels. In addition, the technique and a scaling invariance property used in the convergence proof may find applications in other non-convex problems in communication networks.
Submitted 4 April, 2016;
originally announced April 2016.
-
Downlink Analysis for a Heterogeneous Cellular Network
Authors:
Prasanna Madhusudhanan,
Juan G. Restrepo,
Youjian Liu,
Timothy X Brown
Abstract:
In this paper, a comprehensive study of the downlink performance in a heterogeneous cellular network (or hetnet) is conducted. A general hetnet model is considered consisting of an arbitrary number of open-access and closed-access tiers of base stations (BSs) arranged according to independent homogeneous Poisson point processes. The BSs of each tier have a constant transmission power, random fading coefficient with an arbitrary distribution and arbitrary path-loss exponent of the power-law path-loss model. For such a system, analytical characterizations for the coverage probability and average rate at an arbitrary mobile-station (MS), and average per-tier load are derived for both the max-SINR connectivity and nearest-BS connectivity models. Using stochastic ordering, interesting properties and simplifications for the hetnet downlink performance are derived by relating these two connectivity models to the maximum instantaneous received power (MIRP) connectivity model and the maximum biased received power (MBRP) connectivity model, respectively, providing good insights about hetnets and the downlink performance in these complex networks. Furthermore, the results also demonstrate the effectiveness and analytical tractability of the stochastic geometric approach to studying hetnet performance.
Submitted 29 March, 2014;
originally announced March 2014.
-
Duality and Optimization for Generalized Multi-hop MIMO Amplify-and-Forward Relay Networks with Linear Constraints
Authors:
An Liu,
Vincent K. N. Lau,
Youjian Liu
Abstract:
We consider a generalized multi-hop MIMO amplify-and-forward (AF) relay network with multiple sources/destinations and an arbitrary number of relays. We establish two dualities and the corresponding dual transformations between such a network and its dual, under a single network linear constraint and per-hop linear constraints, respectively. The result is a generalization of the previous dualities under different special cases and is proved using new techniques which reveal more insight on the duality structure that can be exploited to optimize MIMO precoders. A unified optimization framework is proposed to find a stationary point for an important class of non-convex optimization problems of AF relay networks based on a local Lagrange dual method, where the primal algorithm only finds a stationary point for the inner loop problem of maximizing the Lagrangian w.r.t. the primal variables. The input covariance matrices are shown to satisfy a polite water-filling structure at a stationary point of the inner loop problem. The duality and polite water-filling are exploited to design fast primal algorithms. Compared to the existing algorithms, the proposed optimization framework with duality-based primal algorithms can be used to solve more general problems with lower computation cost.
Submitted 20 January, 2013;
originally announced January 2013.
-
Downlink Coverage Analysis in a Heterogeneous Cellular Network
Authors:
Prasanna Madhusudhanan,
Juan G. Restrepo,
Youjian Liu,
Timothy X. Brown
Abstract:
In this paper, we consider the downlink signal-to-interference-plus-noise ratio (SINR) analysis in a heterogeneous cellular network with K tiers. Each tier is characterized by a base-station (BS) arrangement according to a homogeneous Poisson point process with certain BS density, transmission power, random shadow fading factors with arbitrary distribution, arbitrary path-loss exponent and a certain bias towards admitting the mobile-station (MS). The MS associates with the BS that has the maximum SINR under the open access cell association scheme. For such a general setting, we provide an analytical characterization of the coverage probability at the MS.
Submitted 20 June, 2012;
originally announced June 2012.
-
Stochastic Ordering based Carrier-to-Interference Ratio Analysis for the Shotgun Cellular Systems
Authors:
Prasanna Madhusudhanan,
Juan G. Restrepo,
Youjian Liu,
Timothy X Brown,
Kenneth R. Baker
Abstract:
A simple analytical tool based on stochastic ordering is developed to compare the distributions of carrier-to-interference ratio at the mobile station of two cellular systems where the base stations are distributed randomly according to certain non-homogeneous Poisson point processes. The comparison is conveniently done by studying only the base station densities without having to solve for the distributions of the carrier-to-interference ratio, which are often hard to obtain.
Submitted 14 October, 2011;
originally announced October 2011.
-
Multi-tier Network Performance Analysis using a Shotgun Cellular System
Authors:
Prasanna Madhusudhanan,
Juan G. Restrepo,
Youjian Liu,
Timothy X Brown,
Kenneth R. Baker
Abstract:
This paper studies the carrier-to-interference ratio (CIR) and carrier-to-interference-plus-noise ratio (CINR) performance at the mobile station (MS) within a multi-tier network composed of M tiers of wireless networks, with each tier modeled as the homogeneous n-dimensional (n-D, n=1,2, and 3) shotgun cellular system, where the base station (BS) distribution is given by the homogeneous Poisson point process in n-D. The CIR and CINR at the MS in a single tier network are thoroughly analyzed to simplify the analysis of the multi-tier network. For the multi-tier network with given system parameters, the following are the main results of this paper: (1) semi-analytical expressions for the tail probabilities of CIR and CINR; (2) a closed form expression for the tail probability of CIR in the range [1,Infinity); (3) a closed form expression for the tail probability of an approximation to CIR in the entire range [0,Infinity); (4) a lookup table based approach for obtaining the tail probability of CINR, and (5) the study of the effect of shadow fading and BSs with ideal sectorized antennas on the CIR and CINR. Based on these results, it is shown that, in a practical cellular system, the installation of additional wireless networks (microcells, picocells and femtocells) with low power BSs over the already existing macrocell network will always improve the CINR performance at the MS.
Submitted 14 October, 2011;
originally announced October 2011.
-
MIMO B-MAC Interference Network Optimization under Rate Constraints by Polite Water-filling and Duality
Authors:
An Liu,
Youjian Liu,
Haige Xiang,
Wu Luo
Abstract:
We take two new approaches to design efficient algorithms for transmitter optimization under rate constraints, to guarantee the Quality of Service in general MIMO interference networks, which is a combination of multiple interfering broadcast channels (BC) and multiaccess channels (MAC) and is named B-MAC Networks. Two related optimization problems, maximizing the minimum of weighted rates under a sum-power constraint and minimizing the sum-power under rate constraints, are considered. The first approach takes advantage of existing efficient algorithms for SINR problems by building a bridge between rate and SINR through the design of optimal mappings between them. The approach can be applied to other optimization problems as well. The second approach employs polite water-filling, which is the optimal network version of water-filling that we recently found. It replaces most generic optimization algorithms currently used for networks and reduces the complexity while demonstrating superior performance even in non-convex cases. Both centralized and distributed algorithms are designed and the performance is analyzed in addition to numeric examples.
Submitted 6 July, 2010;
originally announced July 2010.
-
Technical Report: MIMO B-MAC Interference Network Optimization under Rate Constraints by Polite Water-filling and Duality
Authors:
An Liu,
Youjian Liu,
Haige Xiang,
Wu Luo
Abstract:
We take two new approaches to design efficient algorithms for transmitter optimization under rate constraints to guarantee the Quality of Service in general MIMO interference networks, named B-MAC Networks, which is a combination of multiple interfering broadcast channels (BC) and multiaccess channels (MAC). Two related optimization problems, maximizing the minimum of weighted rates under a sum-power constraint and minimizing the sum-power under rate constraints, are considered. The first approach takes advantage of existing efficient algorithms for SINR problems by building a bridge between rate and SINR through the design of optimal mappings between them so that the problems can be converted to SINR constraint problems. The approach can be applied to other optimization problems as well. The second approach employs polite water-filling, which is the optimal network version of water-filling that we recently found. It replaces almost all generic optimization algorithms currently used for networks and reduces the complexity while demonstrating superior performance even in non-convex cases. Both centralized and distributed algorithms are designed and the performance is analyzed in addition to numeric examples.
Submitted 28 June, 2010;
originally announced June 2010.
-
Duality, Polite Water-filling, and Optimization for MIMO B-MAC Interference Networks and iTree Networks
Authors:
An Liu,
Youjian Liu,
Haige Xiang,
Wu Luo
Abstract:
This paper gives the long-sought network version of water-filling, named polite water-filling. Unlike in single-user MIMO channels, where no one uses general purpose optimization algorithms in place of the simple and optimal water-filling for transmitter optimization, the traditional water-filling is generally far from optimal in networks as simple as MIMO multiaccess channels (MAC) and broadcast channels (BC), where steepest ascent algorithms have been used except for the sum-rate optimization. This is changed by the polite water-filling that is optimal for all boundary points of the capacity regions of MAC and BC and for all boundary points of a set of achievable regions of a more general class of MIMO B-MAC interference networks, which is a combination of multiple interfering broadcast channels, from the transmitter point of view, and multiaccess channels, from the receiver point of view, including MAC, BC, interference channels, X networks, and most practical wireless networks as special cases. It is polite because it strikes an optimal balance between reducing interference to others and maximizing a link's own rate. Employing it, the related optimizations can be vastly simplified by taking advantage of the structure of the problems. Deeply connected to the polite water-filling, the rate duality is extended to the forward and reverse links of the B-MAC networks. As a demonstration, weighted sum-rate maximization algorithms based on polite water-filling and duality with superior performance and low complexity are designed for B-MAC networks and are analyzed for Interference Tree (iTree) Networks, a sub-class of the B-MAC networks that possesses promising properties for further information theoretic study.
Submitted 4 February, 2014; v1 submitted 14 April, 2010;
originally announced April 2010.
-
Downlink Performance Analysis for a Generalized Shotgun Cellular System
Authors:
Prasanna Madhusudhanan,
Juan G. Restrepo,
Youjian Liu,
Timothy X Brown,
Kenneth R. Baker
Abstract:
In this paper, we analyze the signal-to-interference-plus-noise ratio (SINR) performance at a mobile station (MS) in a random cellular network. The cellular network is formed by base-stations (BSs) placed in a one, two or three dimensional space according to a possibly non-homogeneous Poisson point process, which is a generalization of the so-called shotgun cellular system (SCS). We develop a sequence of equivalence relations for the SCSs and use them to derive semi-analytical expressions for the coverage probability at the MS when the transmissions from each BS may be affected by random fading with arbitrary distributions as well as attenuation following arbitrary path-loss models. For homogeneous Poisson point processes in the interference-limited case with power-law path-loss model, we show that the SINR distribution is the same for all fading distributions and is not a function of the base station density. In addition, the influence of random transmission powers, power control, and multiple channel reuse groups on the downlink performance is also discussed. The techniques developed for the analysis of SINR have applications beyond cellular networks and can be used in similar studies for cognitive radio networks, femtocell networks and other heterogeneous and multi-tier networks.
Submitted 12 December, 2012; v1 submitted 20 February, 2010;
originally announced February 2010.
-
Sum-capacity of Interference Channels with a Local View: Impact of Distributed Decisions
Authors:
Vaneet Aggarwal,
Youjian Liu,
Ashutosh Sabharwal
Abstract:
Due to the large size of wireless networks, it is often impractical for nodes to track changes in the complete network state. As a result, nodes have to make distributed decisions about their transmission and reception parameters based on their local view of the network. In this paper, we characterize the impact of distributed decisions on the global network performance in terms of achievable sum-rates. We first formalize the concept of local view by proposing a protocol abstraction using the concept of local message passing. In the proposed protocol, nodes forward information about the network state to other neighboring nodes, thereby allowing network state information to trickle to all the nodes. The protocol proceeds in rounds, where all transmitters send a message followed by a message by all receivers. The number of rounds then provides a natural metric to quantify the extent of local information at each node.
We next study three network connectivities, Z-channel, a three-user double Z-channel and a reduced-parametrization $K$-user stacked Z-channel. In each case, we characterize achievable sum-rate with partial message passing leading to three main results. First, in many cases, nodes can make distributed decisions with only local information about the network and can still achieve the same sum-capacity as can be attained with global information irrespective of the actual channel gains. Second, for the case of three-user double Z-channel, we show that universal optimality is not achievable if the per node information is below a threshold. Third, using reduced parametrization $K$-user channel, we show that very few protocol rounds are needed for the case of very weak or very strong interference.
Submitted 19 October, 2009;
originally announced October 2009.
-
A Simple Converse Proof and a Unified Capacity Formula for Channels with Input Constraints
Authors:
Youjian Liu
Abstract:
Given the single-letter capacity formula and the converse proof of a channel without constraints, we provide a simple approach to extend the results for the same channel but with constraints. The resulting capacity formula is the minimum of a Lagrange dual function. It gives a unified formula in the sense that it works regardless of whether the problem is convex. If the problem is non-convex, we show that the capacity can be larger than the formula obtained by the naive approach of imposing constraints on the maximization in the capacity formula of the case without the constraints.
The extension of the converse proof is done simply by adding a term involving the Lagrange multiplier and the constraints. The rest of the proof does not need to be changed. We name the proof method the Lagrangian Converse Proof. In contrast, traditional approaches need to construct a better input distribution for convex problems or need to introduce a time sharing variable for non-convex problems. We illustrate the Lagrangian Converse Proof for three channels: the classic discrete time memoryless channel, the channel with non-causal channel-state information at the transmitter, and the channel with limited channel-state feedback. The extension to rate distortion theory is also provided.
Submitted 30 June, 2008;
originally announced July 2008.
-
How Many Users should be Turned On in a Multi-Antenna Broadcast Channel?
Authors:
Wei Dai,
Youjian Liu,
Brian C. Rider,
Wen Gao
Abstract:
This paper considers broadcast channels with L antennas at the base station and m single-antenna users, where L and m are typically of the same order. We assume that only partial channel state information is available at the base station through a finite rate feedback. Our key observation is that the optimal number of on-users (users turned on), say s, is a function of signal-to-noise ratio (SNR…
▽ More
This paper considers broadcast channels with L antennas at the base station and m single-antenna users, where L and m are typically of the same order. We assume that only partial channel state information is available at the base station through a finite rate feedback. Our key observation is that the optimal number of on-users (users turned on), say s, is a function of signal-to-noise ratio (SNR) and feedback rate. In support of this, an asymptotic analysis is employed where L, m and the feedback rate approach infinity linearly. We derive the asymptotic optimal feedback strategy as well as a realistic criterion to decide which users should be turned on. The corresponding asymptotic throughput per antenna, which we define as the spatial efficiency, turns out to be a function of the number of on-users s, and therefore s must be chosen appropriately. Based on the asymptotics, a scheme is developed for systems with finite many antennas and users. Compared with other studies in which s is presumed constant, our scheme achieves a significant gain. Furthermore, our analysis and scheme are valid for heterogeneous systems where different users may have different path loss coefficients and feedback rates.
Submitted 17 June, 2008; v1 submitted 9 May, 2008;
originally announced May 2008.
-
Joint Beamforming for Multiaccess MIMO Systems with Finite Rate Feedback
Authors:
Wei Dai,
Brian C. Rider,
Youjian Liu
Abstract:
This paper considers multiaccess multiple-input multiple-output (MIMO) systems with finite rate feedback. The goal is to understand how to efficiently employ the given finite feedback resource to maximize the sum rate by characterizing the performance analytically. Towards this, we propose a joint quantization and feedback strategy: the base station selects the strongest users, jointly quantizes their strongest eigen-channel vectors and broadcasts a common feedback to all the users. This joint strategy is different from an individual strategy, in which quantization and feedback are performed across users independently, and it improves upon the individual strategy in the same way that vector quantization improves upon scalar quantization. In our proposed strategy, the effect of user selection is analyzed by extreme order statistics, while the effect of joint quantization is quantified by what we term "the composite Grassmann manifold". The achievable sum rate is then estimated by random matrix theory. Due to its simple implementation and solid performance analysis, the proposed scheme provides a benchmark for multiaccess MIMO systems with finite rate feedback.
Submitted 16 April, 2008; v1 submitted 2 April, 2008;
originally announced April 2008.
-
Unequal dimensional small balls and quantization on Grassmann Manifolds
Authors:
Wei Dai,
Brian Rider,
Youjian Liu
Abstract:
The Grassmann manifold G_{n,p}(L) is the set of all p-dimensional planes (through the origin) in the n-dimensional Euclidean space L^{n}, where L is either R or C. This paper considers an unequal dimensional quantization in which a source in G_{n,p}(L) is quantized through a code in G_{n,q}(L), where p and q are not necessarily the same. It is different from most works in the literature, where p\equiv q. The analysis for unequal dimensional quantization is based on the volume of a metric ball in G_{n,p}(L) whose center is in G_{n,q}(L). Our chief result is a closed-form formula for the volume of a metric ball when the radius is sufficiently small. This volume formula holds for Grassmann manifolds with arbitrary n, p, q and L, while previous results pertained only to some special cases. Based on this volume formula, several bounds are derived for the rate distortion tradeoff assuming the quantization rate is sufficiently high. The lower and upper bounds on the distortion rate function are asymptotically identical, and so precisely quantify the asymptotic rate distortion tradeoff. We also show that random codes are asymptotically optimal in the sense that they achieve the minimum achievable distortion with probability one as n and the code rate approach infinity linearly. Finally, we discuss some applications of the derived results to communication theory. A geometric interpretation in the Grassmann manifold is developed for capacity calculation of the additive white Gaussian noise channel. Further, the derived distortion rate function is beneficial to characterizing the effect of beamforming matrix selection in multi-antenna communications.
Submitted 16 May, 2007;
originally announced May 2007.
-
How Many Users should be Turned On in a Multi-Antenna Broadcast Channel?
Authors:
Wei Dai,
Youjian Liu,
Brian Rider
Abstract:
This paper considers broadcast channels with L antennas at the base station and m single-antenna users, where each user has perfect channel knowledge and the base station obtains channel information through a finite rate feedback. The key observation of this paper is that the optimal number of on-users (users turned on), say s, is a function of signal-to-noise ratio (SNR) and other system parameters. Towards this observation, we use asymptotic analysis to guide the design of feedback and transmission strategies. As L, m and the feedback rates approach infinity linearly, we derive the asymptotic optimal feedback strategy and a realistic criterion to decide which users should be turned on. Define the corresponding asymptotic throughput per antenna as the spatial efficiency. It is a function of the number of on-users s, and therefore, s should be appropriately chosen. Based on the above asymptotic results, we also develop a scheme for a system with finitely many antennas and users. Compared with other works where s is presumed constant, our scheme achieves a significant gain by choosing the appropriate s. Furthermore, our analysis and scheme are valid for heterogeneous systems where different users may have different path loss coefficients and feedback rates.
Submitted 15 May, 2007;
originally announced May 2007.
-
On the Information Rate of MIMO Systems with Finite Rate Channel State Feedback and Power On/Off Strategy
Authors:
Wei Dai,
Youjian Liu,
Brian Rider,
Vincent K. N. Lau
Abstract:
This paper quantifies the information rate of multiple-input multiple-output (MIMO) systems with finite rate channel state feedback and power on/off strategy. In power on/off strategy, a beamforming vector (beam) is either turned on (denoted by on-beam) with a constant power or turned off. We prove that the ratio of the optimal number of on-beams and the number of antennas converges to a constant for a given signal-to-noise ratio (SNR) when the number of transmit and receive antennas approaches infinity simultaneously and when beamforming is perfect. Based on this result, a near optimal strategy, i.e., power on/off strategy with a constant number of on-beams, is discussed. For such a strategy, we propose the power efficiency factor to quantify the effect of imperfect beamforming. A formula is proposed to compute the maximum power efficiency factor achievable given a feedback rate. The information rate of the overall MIMO system can be approximated by combining the asymptotic results and the formula for power efficiency factor. Simulations show that this approximation is accurate for all SNR regimes.
Submitted 15 May, 2007;
originally announced May 2007.
-
Quantization Bounds on Grassmann Manifolds of Arbitrary Dimensions and MIMO Communications with Feedback
Authors:
Wei Dai,
Youjian Liu,
Brian Rider
Abstract:
This paper considers the quantization problem on the Grassmann manifold with dimension n and p. The unique contribution is the derivation of a closed-form formula for the volume of a metric ball in the Grassmann manifold when the radius is sufficiently small. This volume formula holds for Grassmann manifolds with arbitrary dimension n and p, while previous results are only valid for either p=1 or a fixed p with asymptotically large n. Based on the volume formula, the Gilbert-Varshamov and Hamming bounds for sphere packings are obtained. Assuming a uniformly distributed source and a distortion metric based on the squared chordal distance, tight lower and upper bounds are established for the distortion rate tradeoff. Simulation results match the derived results. As an application of the derived quantization bounds, the information rate of a Multiple-Input Multiple-Output (MIMO) system with finite-rate channel-state feedback is accurately quantified for arbitrary finite number of antennas, while previous results are only valid for either Multiple-Input Single-Output (MISO) systems or those with asymptotically large number of transmit antennas but fixed number of receive antennas.
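As a small illustration of the distortion metric named in this abstract, the sketch below computes the squared chordal distance between two p-dimensional planes in real n-space. It assumes the standard definition d_c^2 = p - ||U^T V||_F^2 for orthonormal bases U and V of equal dimension; the function name `squared_chordal_distance` and the example planes are hypothetical and are not taken from the paper.

```python
def squared_chordal_distance(U, V):
    """Squared chordal distance between two planes of the same dimension p.

    U and V are lists of p orthonormal real basis vectors (assumed, not checked).
    Uses d_c^2 = p - ||U^T V||_F^2, i.e. the sum of squared sines of the
    principal angles between the two planes.
    """
    p = len(U)
    overlap = sum(
        sum(a * b for a, b in zip(u, v)) ** 2
        for u in U
        for v in V
    )
    return p - overlap


# Hypothetical example planes in R^3.
e1, e2, e3 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
r = 2 ** -0.5
plane_xy = [e1, e2]
plane_xz = [e1, e3]
plane_xy_rotated_basis = [[r, r, 0.0], [r, -r, 0.0]]  # same plane, different basis

d_different = squared_chordal_distance(plane_xy, plane_xz)
d_same = squared_chordal_distance(plane_xy, plane_xy_rotated_basis)
```

The distance depends only on the planes, not on the bases chosen for them, which is why the rotated basis of the same plane gives a distance of zero up to rounding.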
Submitted 15 May, 2007;
originally announced May 2007.
-
Multi-Access MIMO Systems with Finite Rate Channel State Feedback
Authors:
Wei Dai,
Brian Rider,
Youjian Liu
Abstract:
This paper characterizes the effect of finite rate channel state feedback on the sum rate of a multi-access multiple-input multiple-output (MIMO) system. We propose to control the users jointly: specifically, we first choose the users jointly and then select the corresponding beamforming vectors jointly. To quantify the sum rate, this paper introduces the composite Grassmann manifold and the composite Grassmann matrix. By characterizing the distortion rate function on the composite Grassmann manifold and calculating the logdet function of a random composite Grassmann matrix, a good sum rate approximation is derived. According to the distortion rate function on the composite Grassmann manifold, the loss due to finite beamforming decreases exponentially as the number of feedback bits on beamforming increases.
Submitted 15 May, 2007;
originally announced May 2007.
-
Effect of Finite Rate Feedback on CDMA Signature Optimization and MIMO Beamforming Vector Selection
Authors:
Wei Dai,
Youjian Liu,
Brian Rider
Abstract:
We analyze the effect of finite rate feedback on CDMA (code-division multiple access) signature optimization and MIMO (multi-input-multi-output) beamforming vector selection. In CDMA signature optimization, for a particular user, the receiver selects a signature vector from a codebook to best avoid interference from other users, and then feeds the corresponding index back to the specified user. For MIMO beamforming vector selection, the receiver chooses a beamforming vector from a given codebook to maximize throughput, and feeds back the corresponding index to the transmitter. These two problems are dual: both can be modeled as selecting a unit norm vector from a finite size codebook to "match" a randomly generated Gaussian matrix. In signature optimization, the least match is required while the maximum match is preferred for beamforming selection.
Assuming that the feedback link is rate limited, our main result is an exact asymptotic performance formula where the length of the signature/beamforming vector, the dimensions of interference/channel matrix, and the feedback rate approach infinity with constant ratios. The proof rests on a large deviation principle over a random matrix ensemble. Further, we show that random codebooks generated from the isotropic distribution are asymptotically optimal not only on average, but also with probability one.
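The codebook selection step described in this abstract can be sketched as follows. This is a simplified illustration, not the paper's setup: it assumes a single random Gaussian vector in place of the Gaussian matrix, a random codebook of isotropic unit-norm vectors, and hypothetical names (`random_unit_vector`, `match`, `select_index`). The receiver picks the codeword with the largest match for beamforming, or the smallest match for signature optimization, and would feed back only the index.

```python
import math
import random


def random_unit_vector(n, rng):
    """Draw an isotropically distributed unit-norm complex vector of length n."""
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]


def match(h, w):
    """Return |h^H w|^2, the match between vector h and codeword w."""
    inner = sum(hi.conjugate() * wi for hi, wi in zip(h, w))
    return abs(inner) ** 2


def select_index(h, codebook, maximize=True):
    """Index of the codeword with the largest (or smallest) match to h."""
    gains = [match(h, w) for w in codebook]
    pick = max if maximize else min
    return pick(range(len(codebook)), key=gains.__getitem__)


rng = random.Random(0)
n_antennas, feedback_bits = 4, 3  # illustrative sizes only
codebook = [random_unit_vector(n_antennas, rng) for _ in range(2 ** feedback_bits)]
h = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_antennas)]

beam_index = select_index(h, codebook)                       # maximum match
signature_index = select_index(h, codebook, maximize=False)  # least match
```

With B feedback bits the codebook holds 2^B entries, so the index is exactly what a rate-limited feedback link can carry.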
Submitted 2 March, 2007; v1 submitted 16 December, 2006;
originally announced December 2006.
-
On the Information Rate of MIMO Systems with Finite Rate Channel State Feedback Using Beamforming and Power On/Off Strategy
Authors:
Wei Dai,
Youjian Liu,
Vincent K. N. Lau,
Brian Rider
Abstract:
It is well known that Multiple-Input Multiple-Output (MIMO) systems have high spectral efficiency, especially when channel state information at the transmitter (CSIT) is available. When CSIT is obtained by feedback, it is practical to assume that the channel state feedback rate is finite and the CSIT is not perfect. For such a system, we consider beamforming and power on/off strategy for its simplicity and near optimality, where power on/off means that a beamforming vector (beam) is either turned on with a constant power or turned off. The main contribution of this paper is to accurately evaluate the information rate as a function of the channel state feedback rate. Name a beam turned on as an on-beam and the minimum number of the transmit and receive antennas as the dimension of a MIMO system. We prove that the ratio of the optimal number of on-beams and the system dimension converges to a constant for a given signal-to-noise ratio (SNR) when the numbers of transmit and receive antennas approach infinity simultaneously and when beamforming is perfect. Asymptotic formulas are derived to evaluate this ratio and the corresponding information rate per dimension. The asymptotic results can be accurately applied to finite dimensional systems and suggest a power on/off strategy with a constant number of on-beams. For this suboptimal strategy, we take a novel approach to introduce power efficiency factor, which is a function of the feedback rate, to quantify the effect of imperfect beamforming. By combining power efficiency factor and the asymptotic formulas for perfect beamforming case, the information rate of the power on/off strategy with a constant number of on-beams is accurately characterized.
Submitted 9 March, 2006;
originally announced March 2006.
-
Quantization Bounds on Grassmann Manifolds and Applications to MIMO Communications
Authors:
Wei Dai,
Youjian Liu,
Brian Rider
Abstract:
This paper considers the quantization problem on the Grassmann manifold \mathcal{G}_{n,p}, the set of all p-dimensional planes (through the origin) in the n-dimensional Euclidean space. The chief result is a closed-form formula for the volume of a metric ball in the Grassmann manifold when the radius is sufficiently small. This volume formula holds for Grassmann manifolds with arbitrary dimension n and p, while previous results pertained only to p=1, or a fixed p with asymptotically large n. Based on this result, several quantization bounds are derived for sphere packing and rate distortion tradeoff. We establish asymptotically equivalent lower and upper bounds for the rate distortion tradeoff. Since the upper bound is derived by constructing random codes, this result implies that the random codes are asymptotically optimal. The above results are also extended to the more general case, in which \mathcal{G}_{n,q} is quantized through a code in \mathcal{G}_{n,p}, where p and q are not necessarily the same. Finally, we discuss some applications of the derived results to multi-antenna communication systems.
Submitted 15 May, 2006; v1 submitted 9 March, 2006;
originally announced March 2006.
-
Performance Analysis of CDMA Signature Optimization with Finite Rate Feedback
Authors:
Wei Dai,
Youjian Liu,
Brian Rider
Abstract:
We analyze the performance of CDMA signature optimization with finite rate feedback. For a particular user, the receiver selects a signature vector from a signature codebook to avoid the interference from other users, and feeds the corresponding index back to this user through a finite rate and error-free feedback link. We assume the codebook is randomly constructed where the entries are independent and isotropically distributed. It has been shown that the randomly constructed codebook is asymptotically optimal. In this paper, we consider two types of signature selection criteria. One is to select the signature vector that minimizes the interference from other users. The other one is to select the signature vector to match the weakest interference directions. By letting the processing gain, number of users and feedback bits approach infinity with fixed ratios, we derive the exact asymptotic formulas to calculate the average interference for both criteria. Our simulations corroborate the theoretical formulas. The analysis can be extended to evaluate the signal-to-interference plus noise ratio performance for both matched filter and linear minimum mean-square error receivers.
Submitted 8 March, 2006;
originally announced March 2006.