-
You Only Recognize Once: Towards Fast Video Text Spotting
Authors:
Zhanzhan Cheng,
Jing Lu,
Yi Niu,
Shiliang Pu,
Fei Wu,
Shuigeng Zhou
Abstract:
Video text spotting remains an important research topic because of its many real-world applications. Previous approaches usually follow a four-stage pipeline: detecting text in individual frames, recognizing the localized text regions frame by frame, tracking text streams, and generating final results with complicated post-processing. Such pipelines can suffer from huge computational cost as well as interference from low-quality text. In this paper, we propose a fast and robust video text spotting framework that recognizes the localized text only once instead of frame by frame. Specifically, we first obtain text regions in videos with a well-designed spatial-temporal detector. Then we concentrate on developing a novel text recommender for selecting the highest-quality text from text streams and only recognizing the selected ones. Here, the recommender assembles text tracking, quality scoring and recognition into an end-to-end trainable module, which not only avoids the interference from low-quality text but also dramatically speeds up the video text spotting process. In addition, we collect a larger scale video text dataset (LSVTD) for promoting the video text spotting community, which contains 100 text videos from 22 different real-life scenarios. Extensive experiments on two public benchmarks show that our method speeds up the recognition process by 71 times on average compared with frame-wise recognition, and also achieves state-of-the-art performance.
Submitted 25 October, 2021; v1 submitted 8 March, 2019;
originally announced March 2019.
-
Collaborative Spatio-temporal Feature Learning for Video Action Recognition
Authors:
Chao Li,
Qiaoyong Zhong,
Di Xie,
Shiliang Pu
Abstract:
Spatio-temporal feature learning is of central importance for action recognition in videos. Existing deep neural network models either learn spatial and temporal features independently (C2D) or jointly with unconstrained parameters (C3D). In this paper, we propose a novel neural operation which encodes spatio-temporal features collaboratively by imposing a weight-sharing constraint on the learnable parameters. In particular, we perform 2D convolution along three orthogonal views of volumetric video data, which learns spatial appearance and temporal motion cues respectively. By sharing the convolution kernels of different views, spatial and temporal features are collaboratively learned and thus benefit from each other. The complementary features are subsequently fused by a weighted summation whose coefficients are learned end-to-end. Our approach achieves state-of-the-art performance on large-scale benchmarks and won 1st place in the Moments in Time Challenge 2018. Moreover, based on the learned coefficients of different views, we are able to quantify the contributions of spatial and temporal features. This analysis sheds light on the interpretability of the model and may also guide the future design of algorithms for video recognition.
Submitted 4 March, 2019;
originally announced March 2019.
-
Cross-relation Cross-bag Attention for Distantly-supervised Relation Extraction
Authors:
Yujin Yuan,
Liyuan Liu,
Siliang Tang,
Zhongfei Zhang,
Yueting Zhuang,
Shiliang Pu,
Fei Wu,
Xiang Ren
Abstract:
Distant supervision leverages knowledge bases to automatically label instances, thus allowing us to train relation extractors without human annotations. However, the generated training data typically contain massive noise, which may result in poor performance with vanilla supervised learning. In this paper, we propose to conduct multi-instance learning with a novel Cross-relation Cross-bag Selective Attention (C$^2$SA), which leads to noise-robust training for distantly supervised relation extractors. Specifically, we employ sentence-level selective attention to reduce the effect of noisy or mismatched sentences, while the correlation among relations is captured to improve the quality of attention weights. Moreover, instead of treating all entity pairs equally, we try to pay more attention to entity pairs of higher quality. Similarly, we adopt the selective attention mechanism to achieve this goal. Experiments with two types of relation extractors demonstrate the superiority of the proposed approach over the state of the art, while further ablation studies verify our intuitions and demonstrate the effectiveness of our two proposed techniques.
Submitted 26 December, 2018;
originally announced December 2018.
-
A Layer Decomposition-Recomposition Framework for Neuron Pruning towards Accurate Lightweight Networks
Authors:
Weijie Chen,
Yuan Zhang,
Di Xie,
Shiliang Pu
Abstract:
Neuron pruning is an efficient method to compress a network into a slimmer one, reducing computational cost and storage overhead. Most state-of-the-art results are obtained in a layer-by-layer optimization mode, which discards the unimportant input neurons and uses the surviving ones to reconstruct output neurons that approximate the original ones. However, an unnoticed problem arises: the information loss accumulates as the number of layers increases, since the surviving neurons no longer encode the entire information. A better alternative is to propagate the entire useful information to reconstruct the pruned layer instead of directly discarding the less important neurons. To this end, we propose a novel Layer Decomposition-Recomposition Framework (LDRF) for neuron pruning, by which each layer's output information is recovered in an embedding space and then propagated to reconstruct the following pruned layers with useful information preserved. We mainly conduct our experiments on the ILSVRC-12 benchmark with VGG-16 and ResNet-50. Notably, our results before end-to-end fine-tuning are significantly superior owing to the information-preserving property of our proposed framework. With end-to-end fine-tuning, we achieve state-of-the-art results of 5.13x and 3x speed-up with only 0.5% and 0.65% top-5 accuracy drop respectively, which outperform the existing neuron pruning methods.
Submitted 16 December, 2018;
originally announced December 2018.
-
Learning Incremental Triplet Margin for Person Re-identification
Authors:
Yingying Zhang,
Qiaoyong Zhong,
Liang Ma,
Di Xie,
Shiliang Pu
Abstract:
Person re-identification (ReID) aims to match people across multiple non-overlapping video cameras deployed at different locations. To address this challenging problem, many metric learning approaches have been proposed, among which triplet loss is one of the state-of-the-art methods. In this work, we explore the margin between positive and negative pairs of triplets and prove that a large margin is beneficial. In particular, we propose a novel multi-stage training strategy which learns an incremental triplet margin and improves triplet loss effectively. Multiple levels of feature maps are exploited to make the learned features more discriminative. Besides, we introduce a global hard identity searching method to sample hard identities when generating a training batch. Extensive experiments on Market-1501, CUHK03, and DukeMTMCreID show that our approach yields a performance boost and outperforms most existing state-of-the-art methods.
Submitted 16 December, 2018;
originally announced December 2018.
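For readers unfamiliar with the triplet loss mentioned in this abstract, the following is a minimal sketch of the standard formulation, max(0, d(a, p) - d(a, n) + margin), evaluated at several margins. The function names, the toy 2-D embeddings, and the margin values are illustrative assumptions; the abstract does not specify the paper's actual margin schedule or training stages.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin):
    """Standard triplet loss: the positive should be closer to the anchor
    than the negative by at least `margin`, otherwise a penalty is paid."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical embeddings: d(anchor, positive) = 1.0, d(anchor, negative) = 1.5.
anchor, positive, negative = (0.0, 0.0), (0.0, 1.0), (0.0, 1.5)

# Hypothetical increasing margins, standing in for successive training stages.
for margin in (0.3, 0.5, 0.7):
    print(margin, triplet_loss(anchor, positive, negative, margin))
```

In this toy case the triplet incurs no loss at the smaller margins and becomes active only once the margin exceeds the 0.5 gap between the two distances, which is one way to see why raising the margin keeps supplying training signal.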
-
Counterfactual Critic Multi-Agent Training for Scene Graph Generation
Authors:
Long Chen,
Hanwang Zhang,
Jun Xiao,
Xiangnan He,
Shiliang Pu,
Shih-Fu Chang
Abstract:
Scene graphs -- objects as nodes and visual relationships as edges -- describe the whereabouts and interactions of the things and stuff in an image for comprehensive scene understanding. To generate coherent scene graphs, almost all existing methods exploit the fruitful visual context by modeling message passing among objects, fitting the dynamic nature of reasoning with visual context, eg, "person" on "bike" can help to determine the relationship "ride", which in turn contributes to the category confidence of the two objects. However, we argue that the scene dynamics is not properly learned by using the prevailing cross-entropy based supervised learning paradigm, which is not sensitive to graph inconsistency: errors at the hub or non-hub nodes are unfortunately penalized equally. To this end, we propose a Counterfactual critic Multi-Agent Training (CMAT) approach to resolve the mismatch. CMAT is a multi-agent policy gradient method that frames objects as cooperative agents, and then directly maximizes a graph-level metric as the reward. In particular, to assign the reward properly to each agent, CMAT uses a counterfactual baseline that disentangles the agent-specific reward by fixing the dynamics of other agents. Extensive validations on the challenging Visual Genome benchmark show that CMAT achieves a state-of-the-art by significant performance gains under various settings and metrics.
Submitted 9 August, 2019; v1 submitted 5 December, 2018;
originally announced December 2018.
-
Segregated Temporal Assembly Recurrent Networks for Weakly Supervised Multiple Action Detection
Authors:
Yunlu Xu,
Chengwei Zhang,
Zhanzhan Cheng,
Jianwen Xie,
Yi Niu,
Shiliang Pu,
Fei Wu
Abstract:
This paper proposes a segregated temporal assembly recurrent (STAR) network for weakly-supervised multiple action detection. The model learns from untrimmed videos with only video-level labels as supervision and predicts the intervals of multiple actions. Specifically, we first assemble video clips according to class labels by an attention mechanism that learns class-variable attention weights and thus helps suppress noise from background or other actions. Secondly, we build temporal relationships between actions by feeding the assembled features into an enhanced recurrent neural network. Finally, we transform the output of the recurrent neural network into the corresponding action distribution. In order to generate more precise temporal proposals, we design a score term called segregated temporal gradient-weighted class activation mapping (ST-GradCAM) fused with attention weights. Experiments on the THUMOS'14 and ActivityNet1.3 datasets show that our approach outperforms the state-of-the-art weakly-supervised method, and performs on par with its fully-supervised counterparts.
Submitted 18 November, 2018;
originally announced November 2018.
-
Push-Pull Gradient Methods for Distributed Optimization in Networks
Authors:
Shi Pu,
Wei Shi,
Jinming Xu,
Angelia Nedić
Abstract:
In this paper, we focus on solving a distributed convex optimization problem in a network, where each agent has its own convex cost function and the goal is to minimize the sum of the agents' cost functions while obeying the network connectivity structure. In order to minimize the sum of the cost functions, we consider new distributed gradient-based methods where each node maintains two estimates, namely, an estimate of the optimal decision variable and an estimate of the gradient for the average of the agents' objective functions. From the viewpoint of an agent, the information about the gradients is pushed to the neighbors, while the information about the decision variable is pulled from the neighbors hence giving the name "push-pull gradient methods". The methods utilize two different graphs for the information exchange among agents, and as such, unify the algorithms with different types of distributed architecture, including decentralized (peer-to-peer), centralized (master-slave), and semi-centralized (leader-follower) architecture. We show that the proposed algorithms and their many variants converge linearly for strongly convex and smooth objective functions over a network (possibly with unidirectional data links) in both synchronous and asynchronous random-gossip settings. In particular, under the random-gossip setting, "push-pull" is the first class of algorithms for distributed optimization over directed graphs. Moreover, we numerically evaluate our proposed algorithms in both scenarios, and show that they outperform other existing linearly convergent schemes, especially for ill-conditioned problems and networks that are not well balanced.
Submitted 6 February, 2020; v1 submitted 15 October, 2018;
originally announced October 2018.
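The two-estimate scheme described in this abstract can be sketched on a toy problem. The update below, x ← R(x − αy) followed by y ← Cy + ∇f(x_new) − ∇f(x_old), is the commonly cited synchronous push-pull form and is stated here as an assumption, since the abstract does not spell out the iteration. The function name, the quadratic costs, the step size, and the 3-agent weight matrices are all illustrative; for ease of checking, both matrices happen to be doubly stochastic, although the method only requires R to be row-stochastic and C to be column-stochastic.

```python
def push_pull(R, C, grads, x0, alpha, iters):
    """Synchronous push-pull iteration for scalar decision variables.

    R[i][j]: weight agent i applies when pulling agent j's decision estimate
             (each row sums to 1).
    C[i][j]: weight with which agent j pushes its gradient estimate to agent i
             (each column sums to 1).
    grads[i]: gradient of agent i's local cost function.
    """
    n = len(x0)
    x = list(x0)
    g = [grads[i](x[i]) for i in range(n)]
    y = list(g)  # gradient estimates start at the local gradients
    for _ in range(iters):
        # Pull: mix neighbours' locally updated decision variables.
        step = [x[j] - alpha * y[j] for j in range(n)]
        x_new = [sum(R[i][j] * step[j] for j in range(n)) for i in range(n)]
        g_new = [grads[i](x_new[i]) for i in range(n)]
        # Push: mix gradient estimates and add the local gradient change.
        y = [sum(C[i][j] * y[j] for j in range(n)) + g_new[i] - g[i]
             for i in range(n)]
        x, g = x_new, g_new
    return x


# Toy costs f_i(x) = 0.5 * (x - b_i)^2; their sum is minimized at mean(b) = 3.
b = [1.0, 2.0, 6.0]
grads = [lambda x, bi=bi: x - bi for bi in b]

# Hypothetical directed ring: decisions are pulled in one direction (R),
# gradient estimates are pushed in the opposite direction (C).
R = [[0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5]]
C = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]

x_final = push_pull(R, C, grads, x0=[0.0, 0.0, 0.0], alpha=0.1, iters=300)
print(x_final)  # every agent's estimate should be close to 3.0
```

The sketch covers only the synchronous setting; the asynchronous random-gossip variant mentioned in the abstract is not shown.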
-
Deep Attentive Tracking via Reciprocative Learning
Authors:
Shi Pu,
Yibing Song,
Chao Ma,
Honggang Zhang,
Ming-Hsuan Yang
Abstract:
Visual attention, derived from cognitive neuroscience, facilitates human perception on the most pertinent subset of the sensory data. Recently, significant efforts have been made to exploit attention schemes to advance computer vision systems. For visual tracking, it is often challenging to track target objects undergoing large appearance changes. Attention maps facilitate visual tracking by selectively paying attention to temporal robust features. Existing tracking-by-detection approaches mainly use additional attention modules to generate feature weights as the classifiers are not equipped with such mechanisms. In this paper, we propose a reciprocative learning algorithm to exploit visual attention for training deep classifiers. The proposed algorithm consists of feed-forward and backward operations to generate attention maps, which serve as regularization terms coupled with the original classification loss function for training. The deep classifier learns to attend to the regions of target objects robust to appearance changes. Extensive experiments on large-scale benchmark datasets show that the proposed attentive tracking method performs favorably against the state-of-the-art approaches.
Submitted 15 October, 2018; v1 submitted 9 October, 2018;
originally announced October 2018.
-
Eddy magnetization from the chiral Barnett effect
Authors:
Kenji Fukushima,
Shi Pu,
Zebin Qiu
Abstract:
We discuss the spin, the angular momentum, and the magnetic moment of rotating chiral fermions using a kinetic theory. We find that, in addition to the chiral vortical contribution along the rotation axis, finite circular spin polarization is induced by the spin-momentum correlation of chiral fermions, which is canceled by a change in the orbital angular momentum. We point out that the eddy magnetic moment is nonvanishing due to the $g$-factors, exhibiting the chiral Barnett effect.
Submitted 11 April, 2019; v1 submitted 24 August, 2018;
originally announced August 2018.
-
Extreme Network Compression via Filter Group Approximation
Authors:
Bo Peng,
Wenming Tan,
Zheyang Li,
Shun Zhang,
Di Xie,
Shiliang Pu
Abstract:
In this paper we propose a novel decomposition method based on filter group approximation, which can significantly reduce the redundancy of deep convolutional neural networks (CNNs) while maintaining the majority of feature representation. Unlike other low-rank decomposition algorithms which operate on spatial or channel dimension of filters, our proposed method mainly focuses on exploiting the filter group structure for each layer. For several commonly used CNN models, including VGG and ResNet, our method can reduce over 80% floating-point operations (FLOPs) with less accuracy drop than state-of-the-art methods on various image classification datasets. Besides, experiments demonstrate that our method is conducive to alleviating degeneracy of the compressed network, which hurts the convergence and performance of the network.
Submitted 31 July, 2018; v1 submitted 30 July, 2018;
originally announced July 2018.
-
Non-Equilibrium Quantum Transport of Chiral Fluids from Kinetic Theory
Authors:
Yoshimasa Hidaka,
Shi Pu,
Di-Lun Yang
Abstract:
We introduce the quantum-field-theory (QFT) derivation of chiral kinetic theory (CKT) from the Wigner-function approach, which manifests side jumps and non-scalar distribution functions associated with Lorentz covariance and incorporates both background fields and collisions. The formalism is utilized to investigate second-order responses of chiral fluids near local equilibrium. Such non-equilibrium anomalous transport is dissipative and affected by interactions. Contributions from both quantum corrections in anomalous hydrodynamic equations of motion (EOM) and those from the CKT and Wigner functions (WF) are considered in a relaxation-time approximation (RTA). Anomalous charged Hall currents engendered by background electric fields and temperature/chemical-potential gradients are obtained. Furthermore, chiral magnetic/vortical effects (CME/CVE) receive viscous corrections as non-equilibrium modifications stemming from the interplay between side jumps, magnetic-moment coupling, and chiral anomaly.
Submitted 13 July, 2018;
originally announced July 2018.
-
Axial Ward identity and the Schwinger mechanism -- Applications to the real-time chiral magnetic effect and condensates
Authors:
Patrick Copinger,
Kenji Fukushima,
Shi Pu
Abstract:
We elucidate chirality production under parity breaking constant electromagnetic fields, with which we clarify qualitative differences in and out of equilibrium. For a strong magnetic field the pair production from the Schwinger mechanism increments the chirality. The pair production rate is exponentially suppressed with mass according to the Schwinger formula, while the mass dependence of chirality production in the axial Ward identity appears in the pseudo-scalar term. We demonstrate that in equilibrium field theory calculus the axial anomaly is canceled by the pseudo-scalar condensate for any mass. In a real-time formulation with in- and out-states, we show that the axial Ward identity leads to the chirality production rate consistent with the Schwinger formula. We illuminate that such an in- and out-states formulation makes clear the chiral magnetic effect in and out of equilibrium, and we discuss further applications to real-time condensates.
Submitted 12 July, 2018;
originally announced July 2018.
-
Small-scale Pedestrian Detection Based on Somatic Topology Localization and Temporal Feature Aggregation
Authors:
Tao Song,
Leiyu Sun,
Di Xie,
Haiming Sun,
Shiliang Pu
Abstract:
A critical issue in pedestrian detection is detecting small-scale objects, which exhibit feeble contrast and motion blur in images and videos; in our opinion this should be partially attributed to deep-rooted annotation bias. Motivated by this, we propose a novel method that integrates somatic topological line localization (TLL) and temporal feature aggregation for detecting multi-scale pedestrians, which works particularly well with small-scale pedestrians that are relatively far from the camera. Moreover, a post-processing scheme based on Markov Random Field (MRF) is introduced to eliminate ambiguities in occlusion cases. Applying these methodologies together, we achieve the best detection performance on the Caltech benchmark and improve performance on small-scale objects significantly (miss rate decreases from 74.53% to 60.79%). Beyond this, we also achieve competitive performance on the CityPersons dataset and show the existence of annotation bias in the KITTI dataset.
Submitted 3 July, 2018;
originally announced July 2018.
-
Swarming for Faster Convergence in Stochastic Optimization
Authors:
Shi Pu,
Alfredo Garcia
Abstract:
We study a distributed framework for stochastic optimization which is inspired by models of collective motion found in nature (e.g., swarming) with mild communication requirements. Specifically, we analyze a scheme in which each one of $N > 1$ independent threads, implements in a distributed and unsynchronized fashion, a stochastic gradient-descent algorithm which is perturbed by a swarming potential. Assuming the overhead caused by synchronization is not negligible, we show the swarming-based approach exhibits better performance than a centralized algorithm (based upon the average of $N$ observations) in terms of (real-time) convergence speed. We also derive an error bound that is monotone decreasing in network size and connectivity. We characterize the scheme's finite-time performances for both convex and non-convex objective functions.
Submitted 6 August, 2018; v1 submitted 11 June, 2018;
originally announced June 2018.
-
Distributed Stochastic Gradient Tracking Methods
Authors:
Shi Pu,
Angelia Nedić
Abstract:
In this paper, we study the problem of distributed multi-agent optimization over a network, where each agent possesses a local cost function that is smooth and strongly convex. The global objective is to find a common solution that minimizes the average of all cost functions. Assuming agents only have access to unbiased estimates of the gradients of their local cost functions, we consider a distributed stochastic gradient tracking method (DSGT) and a gossip-like stochastic gradient tracking method (GSGT). We show that, in expectation, the iterates generated by each agent are attracted to a neighborhood of the optimal solution, where they accumulate exponentially fast (under a constant stepsize choice). Under DSGT, the limiting (expected) error bounds on the distance of the iterates from the optimal solution decrease with the network size $n$, which is comparable to the performance of a centralized stochastic gradient algorithm. Moreover, we show that when the network is well-connected, GSGT incurs lower communication cost than DSGT while maintaining a similar computational cost. A numerical example further demonstrates the effectiveness of the proposed methods.
Submitted 10 March, 2020; v1 submitted 25 May, 2018;
originally announced May 2018.
-
Berry phase of the composite-fermion Fermi Sea: Effect of Landau-level mixing
Authors:
Songyang Pu,
Mikael Fremling,
J. K. Jain
Abstract:
We construct explicit lowest-Landau-level wave functions for the composite-fermion Fermi sea and its low energy excitations following a recently developed approach [Pu, Wu and Jain, Phys. Rev. B 96, 195302 (2018)] and demonstrate them to be very accurate representations of the Coulomb eigenstates. We further ask how the Berry phase associated with a closed loop around the Fermi circle, predicted to be $π$ in a Dirac composite fermion theory satisfying particle-hole symmetry [D. T. Son, Phys. Rev. X 5, 031027 (2015)], is affected by Landau level mixing. For this purpose, we consider a simple model wherein we determine the variational ground state as a function of Landau level mixing within the space spanned by two basis functions: the lowest-Landau-level projected and the unprojected composite-fermion Fermi sea wave functions. We evaluate Berry phase for a path around the Fermi circle within this model following a recent prescription, and find that it rotates rapidly as a function of Landau level mixing. We also consider the effect of a particle-hole symmetry breaking three-body interaction on the Berry phase while confining the Hilbert space to the lowest Landau level. Our study deepens the connection between the $π$ Berry phase and the exact particle-hole symmetry in the lowest Landau level.
Submitted 1 October, 2018; v1 submitted 23 May, 2018;
originally announced May 2018.
-
A practical convolutional neural network as loop filter for intra frame
Authors:
Xiaodan Song,
Jiabao Yao,
Lulu Zhou,
Li Wang,
Xiaoyang Wu,
Di Xie,
Shiliang Pu
Abstract:
Loop filters are used in video coding to remove artifacts or improve performance. Recent advances in deploying convolutional neural network (CNN) to replace traditional loop filters show large gains but with problems for practical application. First, different model is used for frames encoded with different quantization parameter (QP), respectively. It is expensive for hardware. Second, float poin…
▽ More
Loop filters are used in video coding to remove artifacts or improve performance. Recent advances in deploying convolutional neural network (CNN) to replace traditional loop filters show large gains but with problems for practical application. First, different model is used for frames encoded with different quantization parameter (QP), respectively. It is expensive for hardware. Second, float points operation in CNN leads to inconsistency between encoding and decoding across different platforms. Third, redundancy within CNN model consumes precious computational resources.
This paper proposes a CNN as the loop filter for intra frames and proposes a scheme to solve the above problems. It aims to design a single CNN model with low redundancy to adapt to decoded frames with different qualities and ensure consistency. To adapt to reconstructions with different qualities, both reconstruction and QP are taken as inputs. After training, the obtained model is compressed to reduce redundancy. To ensure consistency, dynamic fixed points (DFP) are adopted in testing CNN. Parameters in the compressed model are first quantized to DFP and then used for inference of CNN. Outputs of each layer in CNN are computed by DFP operations. Experimental results on JEM 7.0 report 3.14%, 5.21%, 6.28% BD-rate savings for luma and two chroma components with all intra configuration when replacing all traditional filters.
Submitted 16 May, 2018;
originally announced May 2018.
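The dynamic fixed-point (DFP) step mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not give the exact scheme, so everything here is an assumption: one shared fractional length per tensor, an 8-bit signed format, and the hypothetical helper names `dfp_quantize` and `dfp_dequantize`.

```python
import numpy as np

def dfp_quantize(x, bits=8):
    """Quantize x to signed integers that share one fractional length.

    Hypothetical helper: the fractional length is chosen so that the
    largest magnitude in x fits into `bits` signed bits.
    """
    max_abs = float(np.max(np.abs(x)))
    if max_abs == 0.0:
        # All-zero tensor: any fractional length works.
        return np.zeros_like(x, dtype=np.int32), bits - 1
    # Bits needed for the integer part of the largest magnitude.
    int_bits = int(np.floor(np.log2(max_abs))) + 1
    frac_len = bits - 1 - int_bits
    q = np.round(x * 2.0 ** frac_len)
    # Saturate to the representable signed range.
    q = np.clip(q, -(2 ** (bits - 1)), 2 ** (bits - 1) - 1).astype(np.int32)
    return q, frac_len

def dfp_dequantize(q, frac_len):
    """Map the shared-exponent integers back to real values."""
    return q.astype(np.float64) * 2.0 ** (-frac_len)
```

Because the stored values are plain integers plus one shared exponent, integer arithmetic on them gives the same result on any platform, which is the consistency property the abstract is after.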
-
Edit Probability for Scene Text Recognition
Authors:
Fan Bai,
Zhanzhan Cheng,
Yi Niu,
Shiliang Pu,
Shuigeng Zhou
Abstract:
We consider the scene text recognition problem under the attention-based encoder-decoder framework, which is the state of the art. The existing methods usually employ a frame-wise maximal likelihood loss to optimize the models. When we train the model, the misalignment between the ground truth strings and the attention's output sequences of probability distribution, which is caused by missing or superfluous characters, will confuse and mislead the training process, and consequently make the training costly and degrade the recognition accuracy. To handle this problem, we propose a novel method called edit probability (EP) for scene text recognition. EP tries to effectively estimate the probability of generating a string from the output sequence of probability distribution conditioned on the input image, while considering the possible occurrences of missing/superfluous characters. The advantage lies in that the training process can focus on the missing, superfluous and unrecognized characters, and thus the impact of the misalignment problem can be alleviated or even overcome. We conduct extensive experiments on standard benchmarks, including the IIIT-5K, Street View Text and ICDAR datasets. Experimental results show that the EP can substantially boost scene text recognition performance.
Submitted 9 May, 2018;
originally announced May 2018.
-
Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation
Authors:
Chao Li,
Qiaoyong Zhong,
Di Xie,
Shiliang Pu
Abstract:
Skeleton-based human action recognition has recently drawn increasing attention with the availability of large-scale skeleton datasets. The most crucial factors for this task lie in two aspects: the intra-frame representation for joint co-occurrences and the inter-frame representation for skeletons' temporal evolutions. In this paper we propose an end-to-end convolutional co-occurrence feature learning framework. The co-occurrence features are learned with a hierarchical methodology, in which different levels of contextual information are aggregated gradually. First, point-level information of each joint is encoded independently. Then it is assembled into a semantic representation in both the spatial and temporal domains. Specifically, we introduce a global spatial aggregation scheme, which is able to learn superior joint co-occurrence features over local aggregation. Besides, raw skeleton coordinates as well as their temporal differences are integrated with a two-stream paradigm. Experiments show that our approach consistently outperforms other state-of-the-art methods on action recognition and detection benchmarks like NTU RGB+D, SBU Kinect Interaction and PKU-MMD.
Submitted 17 April, 2018;
originally announced April 2018.
-
A Distributed Stochastic Gradient Tracking Method
Authors:
Shi Pu,
Angelia Nedić
Abstract:
In this paper, we study the problem of distributed multi-agent optimization over a network, where each agent possesses a local cost function that is smooth and strongly convex. The global objective is to find a common solution that minimizes the average of all cost functions. Assuming agents only have access to unbiased estimates of the gradients of their local cost functions, we consider a distributed stochastic gradient tracking method. We show that, in expectation, the iterates generated by each agent are attracted to a neighborhood of the optimal solution, where they accumulate exponentially fast (under a constant step size choice). More importantly, the limiting (expected) error bounds on the distance of the iterates from the optimal solution decrease with the network size, which is a comparable performance to a centralized stochastic gradient algorithm. Numerical examples further demonstrate the effectiveness of the method.
Submitted 1 August, 2019; v1 submitted 21 March, 2018;
originally announced March 2018.
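The abstract above does not spell out the update rule, so the following is only a sketch of one common form of gradient tracking with noisy gradients: each agent mixes its iterate with its neighbors' through a doubly stochastic matrix `W` and keeps a tracker `y` of the average gradient. The toy quadratic costs, the ring network, and the names `dsgt`, `ring_weights` and `make_quadratic_grad` are assumptions made for illustration.

```python
import numpy as np

def dsgt(W, noisy_grad, x0, alpha, steps):
    """Gradient tracking with stochastic gradients (illustrative sketch).

    W          : (n, n) doubly stochastic mixing matrix
    noisy_grad : maps the (n, d) stack of iterates to (n, d) gradient samples
    x0         : (n, d) initial iterates, one row per agent
    """
    x = x0.copy()
    g = noisy_grad(x)
    y = g.copy()  # tracker starts at the first gradient sample
    for _ in range(steps):
        x_next = W @ (x - alpha * y)   # local step, then mix with neighbors
        g_next = noisy_grad(x_next)
        y = W @ y + g_next - g         # track the average gradient
        x, g = x_next, g_next
    return x

def ring_weights(n):
    """Doubly stochastic weights on a ring (requires n >= 3)."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W

def make_quadratic_grad(b, sigma, seed=0):
    """Noisy gradients of the toy costs f_i(x) = 0.5 * ||x - b_i||^2."""
    rng = np.random.default_rng(seed)
    def noisy_grad(x):
        return (x - b) + sigma * rng.standard_normal(x.shape)
    return noisy_grad
```

For these toy costs the minimizer of the average is the mean of the rows of `b`; running `dsgt(ring_weights(5), make_quadratic_grad(b, sigma), x0, 0.1, 500)` with `sigma = 0` drives every agent to that mean, while a positive `sigma` leaves the iterates fluctuating in a neighborhood of it.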
-
A Push-Pull Gradient Method for Distributed Optimization in Networks
Authors:
Shi Pu,
Wei Shi,
Jinming Xu,
Angelia Nedić
Abstract:
In this paper, we focus on solving a distributed convex optimization problem in a network, where each agent has its own convex cost function and the goal is to minimize the sum of the agents' cost functions while obeying the network connectivity structure. In order to minimize the sum of the cost functions, we consider a new distributed gradient-based method where each node maintains two estimates, namely, an estimate of the optimal decision variable and an estimate of the gradient for the average of the agents' objective functions. From the viewpoint of an agent, the information about the decision variable is pushed to the neighbors, while the information about the gradients is pulled from the neighbors (hence giving the name "push-pull gradient method"). The method unifies the algorithms with different types of distributed architecture, including decentralized (peer-to-peer), centralized (master-slave), and semi-centralized (leader-follower) architecture. We show that the algorithm converges linearly for strongly convex and smooth objective functions over a directed static network. In our numerical test, the algorithm performs well even for time-varying directed networks.
Submitted 1 August, 2019; v1 submitted 20 March, 2018;
originally announced March 2018.
-
Abelian and non-Abelian Berry curvatures in lattice QCD
Authors:
Shi Pu,
Arata Yamamoto
Abstract:
We studied the Berry curvature of the massive Dirac fermion in 3+1 dimensions. For the non-interacting Dirac fermion, the Berry curvature is non-Abelian because of the degeneracy of positive and negative helicity modes. We calculated the non-Abelian Berry curvature analytically and numerically. For the interacting Dirac fermion in QCD, the degeneracy is lost because gluons carry helicity and color charge. We calculated the Abelian Berry curvature in lattice QCD.
Submitted 5 June, 2018; v1 submitted 6 December, 2017;
originally announced December 2017.
-
AON: Towards Arbitrarily-Oriented Text Recognition
Authors:
Zhanzhan Cheng,
Yangliu Xu,
Fan Bai,
Yi Niu,
Shiliang Pu,
Shuigeng Zhou
Abstract:
Recognizing text from natural images is a hot research topic in computer vision due to its various applications. Despite several decades of research on optical character recognition (OCR), recognizing text from natural images is still a challenging task. This is because scene text is often in irregular (e.g. curved, arbitrarily-oriented or seriously distorted) arrangements, which have not yet been well addressed in the literature. Existing methods for text recognition mainly work with regular (horizontal and frontal) text and cannot be trivially generalized to handle irregular text. In this paper, we develop the arbitrary orientation network (AON) to directly capture the deep features of irregular text, which are combined into an attention-based decoder to generate the character sequence. The whole network can be trained end-to-end by using only images and word-level annotations. Extensive experiments on various benchmarks, including the CUTE80, SVT-Perspective, IIIT5k, SVT and ICDAR datasets, show that the proposed AON-based method achieves state-of-the-art performance on irregular datasets, and is comparable to major existing methods on regular datasets.
Submitted 22 March, 2018; v1 submitted 11 November, 2017;
originally announced November 2017.
-
Cascade Region Proposal and Global Context for Deep Object Detection
Authors:
Qiaoyong Zhong,
Chao Li,
Yingying Zhang,
Di Xie,
Shicai Yang,
Shiliang Pu
Abstract:
A deep region-based object detector consists of a region proposal step and a deep object recognition step. In this paper, we make significant improvements to both steps. For region proposal we propose a novel lightweight cascade structure which can effectively improve RPN proposal quality. For object recognition we re-implement global context modeling with a few modifications and obtain a performance boost (4.2% mAP gain on the ILSVRC 2016 validation set). Besides, we apply the idea of pre-training extensively and show its importance in both steps. Together with common training and testing tricks, we improve the Faster R-CNN baseline by a large margin. In particular, we obtain 87.9% mAP on the PASCAL VOC 2012 test set, 65.3% on the ILSVRC 2016 test set and 36.8% on the COCO test-std set.
Submitted 29 October, 2017;
originally announced October 2017.
-
Nonlinear Responses of Chiral Fluids from Kinetic Theory
Authors:
Yoshimasa Hidaka,
Shi Pu,
Di-Lun Yang
Abstract:
The second-order nonlinear responses of inviscid chiral fluids near local equilibrium are investigated by applying the chiral kinetic theory (CKT) incorporating side-jump effects. It is shown that the local equilibrium distribution function can be non-trivially introduced in a co-moving frame with respect to the fluid velocity when the quantum corrections in collisions are involved. For the study of anomalous transport, contributions from both quantum corrections in anomalous hydrodynamic equations of motion and those from the CKT and Wigner functions are considered under the relaxation-time (RT) approximation, which result in anomalous charge Hall currents propagating along the cross product of the background electric field and the temperature (or chemical-potential) gradient and of the temperature and chemical-potential gradients. On the other hand, the nonlinear quantum correction on the charge density vanishes in the classical RT approximation, which in fact satisfies the matching condition given by the anomalous equation obtained from the CKT.
Submitted 29 May, 2018; v1 submitted 30 September, 2017;
originally announced October 2017.
-
Thermal vorticity production in relativistic dissipative fluids
Authors:
Yang-guang Yang,
Shi Pu
Abstract:
We have computed the circulation integrals of thermal vorticity with and without charged currents in dissipative fluids. We find that the relativistic Kelvin circulation theorem is modified by the dissipative effects; therefore, the circulation integrals of thermal vorticity may not be conserved during the fluid evolution.
Submitted 23 September, 2017;
originally announced September 2017.
-
A Flocking-based Approach for Distributed Stochastic Optimization
Authors:
Shi Pu,
Alfredo Garcia
Abstract:
In recent years, the paradigm of cloud computing has emerged as an architecture for computing that makes use of distributed (networked) computing resources. In this paper, we consider a distributed computing algorithmic scheme for stochastic optimization which relies on modest communication requirements amongst processors and most importantly, does not require synchronization. Specifically, we analyze a scheme with $N>1$ independent threads implementing each a stochastic gradient algorithm. The threads are coupled via a perturbation of the gradient (with attractive and repulsive forces) in a similar manner to mathematical models of flocking, swarming and other group formations found in nature with mild communication requirements. When the objective function is convex, we show that a flocking-like approach for distributed stochastic optimization provides a noise reduction effect similar to that of a centralized stochastic gradient algorithm based upon the average of $N$ gradient samples at each step. The distributed nature of flocking makes it an appealing computational alternative. We show that when the overhead related to the time needed to gather $N$ samples and synchronization is not negligible, the flocking implementation outperforms a centralized stochastic gradient algorithm based upon the average of $N$ gradient samples at each step. When the objective function is not convex, the flocking-based approach seems better suited to escape locally optimal solutions due to the repulsive force which enforces a certain level of diversity in the set of candidate solutions. Here again, we show that the noise reduction effect is similar to that associated to the centralized stochastic gradient algorithm based upon the average of $N$ gradient samples at each step.
Submitted 20 September, 2017;
originally announced September 2017.
-
Focusing Attention: Towards Accurate Text Recognition in Natural Images
Authors:
Zhanzhan Cheng,
Fan Bai,
Yunlu Xu,
Gang Zheng,
Shiliang Pu,
Shuigeng Zhou
Abstract:
Scene text recognition has been a hot research topic in computer vision due to its various applications. The state of the art is the attention-based encoder-decoder framework that learns the mapping between input images and output sequences in a purely data-driven way. However, we observe that existing attention-based methods perform poorly on complicated and/or low-quality images. One major reason is that existing methods cannot get accurate alignments between feature areas and targets for such images. We call this phenomenon "attention drift". To tackle this problem, in this paper we propose the FAN (the abbreviation of Focusing Attention Network) method that employs a focusing attention mechanism to automatically draw back the drifted attention. FAN consists of two major components: an attention network (AN) that is responsible for recognizing character targets as in the existing methods, and a focusing network (FN) that is responsible for adjusting attention by evaluating whether AN pays attention properly on the target areas in the images. Furthermore, different from the existing methods, we adopt a ResNet-based network to enrich deep representations of scene text images. Extensive experiments on various benchmarks, including the IIIT5k, SVT and ICDAR datasets, show that the FAN method substantially outperforms the existing methods.
Submitted 17 October, 2017; v1 submitted 6 September, 2017;
originally announced September 2017.
-
Composite Fermions on a Torus
Authors:
Songyang Pu,
Ying-Hai Wu,
J. K. Jain
Abstract:
We achieve an explicit construction of the lowest Landau level (LLL) projected wave functions for composite fermions in the periodic (torus) geometry. To this end, we first demonstrate how the vortex attachment of the composite fermion (CF) theory can be accomplished in the torus geometry to produce the "unprojected" wave functions satisfying the correct (quasi-)periodic boundary conditions. We then consider two methods for projecting these wave functions into the LLL. The direct projection produces valid wave functions but can be implemented only for very small systems. The more powerful and more useful projection method of Jain and Kamilla fails in the torus geometry because it does not preserve the periodic boundary conditions and thus takes us out of the original Hilbert space. We have succeeded in constructing a modified projection method that is consistent with both the periodic boundary conditions and the general structure of the CF theory. This method is valid for a large class of states of composite fermions, called "proper states," which includes the incompressible ground states at electron filling factors $ν=\frac{n}{2pn+ 1}$, their charged and neutral excitations, and also the quasidegenerate ground states at arbitrary filling factors of the form $ν=\frac{ν^*}{2pν^*+ 1}$, where $n$ and $p$ are integers and $ν^*$ is the CF filling factor. Comparison with exact results known for small systems for the ground and excited states at filling factors $ν=1/3$, 2/5 and 3/7 demonstrates our LLL-projected wave functions to be extremely accurate representations of the actual Coulomb eigenstates. Our construction enables the study of large systems of composite fermions on the torus, thereby opening the possibility of investigating numerous interesting questions and phenomena.
Submitted 7 November, 2017; v1 submitted 29 August, 2017;
originally announced August 2017.
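As a quick check of the filling-factor formula quoted in the abstract above, the sequence $ν=\frac{n}{2pn+1}$ can be evaluated with exact rational arithmetic. The helper name `jain_filling_factor` is hypothetical and serves only this illustration.

```python
from fractions import Fraction

def jain_filling_factor(n, p):
    """Electron filling factor nu = n / (2*p*n + 1) for integers n and p."""
    return Fraction(n, 2 * p * n + 1)

# With p = 1, the first three values of n give the fractions
# named in the abstract: 1/3, 2/5 and 3/7.
sequence = [jain_filling_factor(n, 1) for n in (1, 2, 3)]
```

Using `Fraction` rather than floats keeps each value in lowest terms, so the results can be compared directly with the fractions written in the text.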
-
Effect of intense magnetic fields on reduced-MHD evolution in $\sqrt{s_{\rm NN}}$ = 200 GeV Au+Au collisions
Authors:
Victor Roy,
Shi Pu,
Luciano Rezzolla,
Dirk H. Rischke
Abstract:
We investigate the effect of large magnetic fields on the $2+1$ dimensional reduced-magnetohydrodynamical expansion of hot and dense nuclear matter produced in $\sqrt{s_{\rm NN}}$ = 200 GeV Au+Au collisions. For the sake of simplicity, we consider the case where the magnetic field points in the direction perpendicular to the reaction plane. We also consider this field to be external, with energy density parametrized as a two-dimensional Gaussian. The width of the Gaussian along the directions orthogonal to the beam axis varies with the centrality of the collision. The dependence of the magnetic field on proper time ($τ$) for the case of zero electrical conductivity of the QGP is parametrized following [Deng 2012], and for finite electrical conductivity following [Tuchin 2013]. We solve the equations of motion of ideal hydrodynamics for such an external magnetic field. For collisions with non-zero impact parameter we observe considerable changes in the evolution of the momentum eccentricities of the fireball when comparing the case when the magnetic field decays in a conducting QGP medium and when no magnetic field is present. The elliptic-flow coefficient $v_2$ of $π^{-}$ is shown to increase in the presence of an external magnetic field and the increment in $v_2$ is found to depend on the evolution and the initial magnitude of the magnetic field.
Submitted 16 December, 2017; v1 submitted 16 June, 2017;
originally announced June 2017.
-
Boost invariant formulation of the chiral kinetic theory
Authors:
Shu Ebihara,
Kenji Fukushima,
Shi Pu
Abstract:
We formulate the chiral kinetic equation with the longitudinal boost invariance. We particularly focus on the physical interpretation of the particle number conservation. There appear two terms associated with the expansion, which did not exist in the non-chiral kinetic equation. One is a contribution to the transverse current arising from the side-jump effect, and the other is a change in the density whose flow makes the longitudinal current. We point out a characteristic pattern in the transverse current driven by the expansion, which we call the chiral circular displacement.
Submitted 24 May, 2017;
originally announced May 2017.
-
Skeleton-based Action Recognition with Convolutional Neural Networks
Authors:
Chao Li,
Qiaoyong Zhong,
Di Xie,
Shiliang Pu
Abstract:
Current state-of-the-art approaches to skeleton-based action recognition are mostly based on recurrent neural networks (RNN). In this paper, we propose a novel convolutional neural networks (CNN) based framework for both action classification and detection. Raw skeleton coordinates as well as skeleton motion are fed directly into CNN for label prediction. A novel skeleton transformer module is designed to rearrange and select important skeleton joints automatically. With a simple 7-layer network, we obtain 89.3% accuracy on validation set of the NTU RGB+D dataset. For action detection in untrimmed videos, we develop a window proposal network to extract temporal segment proposals, which are further classified within the same network. On the recent PKU-MMD dataset, we achieve 93.7% mAP, surpassing the baseline by a large margin.
Submitted 25 April, 2017;
originally announced April 2017.
-
Covariant chiral kinetic equation in Wigner function approach
Authors:
Jian-hua Gao,
Shi Pu,
Qun Wang
Abstract:
The covariant chiral kinetic equation (CCKE) is derived from the 4-dimensional Wigner function by an improved perturbative method under the static equilibrium conditions. The chiral kinetic equation in 3 dimensions can be obtained by integration over the time component of the 4-momentum. There is freedom to add more terms to the CCKE allowed by conservation laws. In the derivation of the 3-dimensional equation, there is also freedom to choose coefficients of some terms in $dx_{0}/dτ$ and $d\mathbf{x}/dτ$ ($τ$ is a parameter along the worldline, and $(x_{0},\mathbf{x})$ denotes the time-space position of a particle) whose 3-momentum integrals vanish. So the 3-dimensional chiral kinetic equation derived from the CCKE is not uniquely determined in the current approach. To go beyond the current approach, one needs a new way of building up the 3-dimensional chiral kinetic equation from the CCKE or directly from covariant Wigner equations.
Submitted 1 April, 2017;
originally announced April 2017.
-
Fixed points and flow analysis on off-equilibrium dynamics in the boson Boltzmann equation
Authors:
Kenji Fukushima,
Koichi Murase,
Shi Pu
Abstract:
We consider fixed points of steady solutions and flow directions using the boson Boltzmann equation that is a one-dimensionally reduced kinetic equation after the angular integration. With an elastic collision integral of the two-to-two scattering process, in the dense (dilute) regime where the distribution function is large (small), the boson Boltzmann equation has approximate fixed points with a power-law spectrum in addition to the thermal distribution function. We argue that the power-law fixed point can be exact in special cases. We elaborate a graphical presentation to display evolving flow directions similarly to the renormalization group flow, which explicitly exhibits how fixed points are connected and parameter space is separated by critical lines. We discuss that such a flow diagram contains useful information on thermalization processes out of equilibrium.
Submitted 28 March, 2017;
originally announced March 2017.
-
All You Need is Beyond a Good Init: Exploring Better Solution for Training Extremely Deep Convolutional Neural Networks with Orthonormality and Modulation
Authors:
Di Xie,
Jiang Xiong,
Shiliang Pu
Abstract:
Deep neural networks are difficult to train, and this predicament becomes worse as the depth increases. The essence of this problem lies in the magnitude of backpropagated errors, which results in the gradient vanishing or exploding phenomenon. We show that a variant of regularizer which utilizes orthonormality among different filter banks can alleviate this problem. Moreover, we design a backward error modulation mechanism based on the quasi-isometry assumption between two consecutive parametric layers. Equipped with these two ingredients, we propose several novel optimization solutions that can be utilized for training a specific-structured (repetitively triple modules of Conv-BNReLU) extremely deep convolutional neural network (CNN) WITHOUT any shortcuts/identity mappings from scratch. Experiments show that our proposed solutions can achieve distinct improvements for 44-layer and 110-layer plain networks on both the CIFAR-10 and ImageNet datasets. Moreover, we can successfully train plain CNNs to match the performance of the residual counterparts.
Besides, we propose new principles for designing network structure from the insights evoked by orthonormality. Combined with residual structure, we achieve comparative performance on the ImageNet dataset.
Submitted 9 April, 2017; v1 submitted 6 March, 2017;
originally announced March 2017.
-
Residual Convolutional CTC Networks for Automatic Speech Recognition
Authors:
Yisen Wang,
Xuejiao Deng,
Songbai Pu,
Zhiheng Huang
Abstract:
Deep learning approaches have been widely used in Automatic Speech Recognition (ASR) and they have achieved a significant accuracy improvement. Especially, Convolutional Neural Networks (CNNs) have been revisited in ASR recently. However, most CNNs used in existing work have less than 10 layers which may not be deep enough to capture all human speech signal information. In this paper, we propose a novel deep and wide CNN architecture denoted as RCNN-CTC, which has residual connections and Connectionist Temporal Classification (CTC) loss function. RCNN-CTC is an end-to-end system which can exploit temporal and spectral structures of speech signals simultaneously. Furthermore, we introduce a CTC-based system combination, which is different from the conventional frame-wise senone-based one. The basic subsystems adopted in the combination are different types and thus mutually complementary to each other. Experimental results show that our proposed single system RCNN-CTC can achieve the lowest word error rate (WER) on WSJ and Tencent Chat data sets, compared to several widely used neural network systems in ASR. In addition, the proposed system combination can offer a further error reduction on these two data sets, resulting in relative WER reductions of $14.91\%$ and $6.52\%$ on WSJ dev93 and Tencent Chat data sets respectively.
Submitted 24 February, 2017;
originally announced February 2017.
-
Relativistic Chiral Kinetic Theory from Quantum Field Theories
Authors:
Yoshimasa Hidaka,
Shi Pu,
Di-Lun Yang
Abstract:
The chiral kinetic theory of Weyl fermions with collisions in the presence of weak electric and magnetic fields is derived from quantum field theories. It is found that the side-jump terms in the perturbative solution of Wigner functions play a significant role in the derivation. Moreover, such terms manifest the breaking of Lorentz symmetry for distribution functions. The Lorentz covariance of Wigner functions thus leads to a modified Lorentz transformation associated with side-jump phenomena, further influenced by background fields and collisions.
Submitted 17 April, 2017; v1 submitted 14 December, 2016;
originally announced December 2016.
-
Analytic Solutions of Transverse Magneto-hydrodynamics under Bjorken Expansion
Authors:
Shi Pu,
Di-Lun Yang
Abstract:
We review the recent developments of analytic solutions in transverse magneto-hydrodynamics under Bjorken expansion. It is found that the time dependence of magnetic fields can either increase or reduce the energy density depending on the decay exponent of magnetic fields. Moreover, perturbative solutions under weak magnetic fields with spatial inhomogeneity result in transverse flow, where the directions of flow also depend on the decay exponent of magnetic fields in time.
Submitted 15 November, 2016;
originally announced November 2016.
-
Mixed context networks for semantic segmentation
Authors:
Haiming Sun,
Di Xie,
Shiliang Pu
Abstract:
Semantic segmentation is challenging as it requires both object-level information and pixel-level accuracy. Recently, FCN-based systems have achieved great improvements in this area. Unlike classification networks, combining features of different layers plays an important role in these dense prediction models, as these features contain information of different levels. A number of models have been proposed to show how to use these features. However, which architecture makes the best use of features of different layers is still an open question. In this paper, we propose a module, called mixed context network, and show that our presented system outperforms most existing semantic segmentation systems by making use of this module.
Submitted 18 October, 2016;
originally announced October 2016.
-
Positions of the magnetoroton minima in the fractional quantum Hall effect
Authors:
Ajit C. Balram,
Songyang Pu
Abstract:
The multitude of excitations of the fractional quantum Hall state are very accurately understood, microscopically, as excitations of composite fermions across their Landau-like $Λ$ levels. In particular, the dispersion of the composite fermion exciton, which is the lowest energy spin conserving neutral excitation, displays filling-factor-specific minima called "magnetoroton" minima. Simon and Halperin employed the Chern-Simons field theory of composite fermions [Phys. Rev. B {\bf 48}, 17368 (1993)] to predict the magnetoroton minima positions. Recently, Golkar \emph{et al.} [Phys. Rev. Lett. {\bf 117}, 216403 (2016)] have modeled the neutral excitations as deformations of the composite fermion Fermi sea, which results in a prediction for the positions of the magnetoroton minima. Using methods of the microscopic composite fermion theory we calculate the positions of the roton minima for filling factors up to 5/11 along the sequence $s/(2s+1)$ and find them to be in reasonably good agreement with both the Chern-Simons field theory of composite fermions and Golkar \emph{et al.}'s theory. We also find that the positions of the roton minima are insensitive to the microscopic interaction in agreement with Golkar \emph{et al.}'s theory. As a byproduct of our calculations, we obtain the charge and neutral gaps for the fully spin polarized states along the sequence $s/(2s\pm 1)$ in the lowest Landau level and the $n=1$ Landau level of graphene.
Submitted 23 June, 2017; v1 submitted 6 September, 2016;
originally announced September 2016.
-
Iterative Mechanisms for Electricity Markets
Authors:
Shi Pu,
Alfredo Garcia
Abstract:
In order to deal with market power that sporadically results from contingencies (e.g., severe weather, plant outages), most electricity markets have institutions in charge of monitoring market performance and mitigating market power. The latter task is often achieved by producing estimates of marginal costs (also referred to as "reference levels") that may replace the actual bids by generators with market power. In this paper, we propose an iterative mechanism that constitutes an alternative to outright regulatory intervention in those sporadic situations in which market power is a significant concern. The iterative mechanism proposed is based upon relatively simple information exchange between the market maker and market participants and is equipped to handle general non-linear (convex) constraints related to technical and/or reliability requirements. We show the mechanism proposed has many desirable properties (approximately): incentive compatibility, efficiency, individual rationality and (weak) budget balance. In addition, we show it is robust to imperfect information regarding costs. To illustrate, we consider its application to the joint clearing of day-ahead dispatch and reserves, an approach that has been proposed for the large-scale integration of renewable generation. In this context, we relax the assumption that each generator has perfect information on its own expected marginal costs of adjustment and show the mechanism retains most of its properties.
Submitted 4 March, 2017; v1 submitted 31 August, 2016;
originally announced August 2016.
-
Nonlinear Chiral Transport Phenomena
Authors:
Jiunn-Wei Chen,
Takeaki Ishii,
Shi Pu,
Naoki Yamamoto
Abstract:
We study the nonlinear responses of relativistic chiral matter to the external fields, such as the electric field ${\bf E}$, gradients of temperature and chemical potential, ${\bf \nabla} T$ and ${\bf \nabla} μ$. Using the kinetic theory with Berry curvature corrections under the relaxation time approximation, we compute the transport coefficients of possible new electric currents that are forbidden in usual chirally symmetric matter, but are allowed in chirally asymmetric matter by parity. In particular, we find a new type of electric current proportional to ${\bf \nabla} μ\times {\bf E}$ due to the interplay between the effects of the Berry curvature and collisions. We also derive an analogue of the "Wiedemann-Franz" law specific for anomalous nonlinear transport in relativistic chiral matter.
Submitted 11 March, 2016;
originally announced March 2016.
-
Transverse flow induced by inhomogeneous magnetic fields in the Bjorken expansion
Authors:
Shi Pu,
Di-Lun Yang
Abstract:
We investigate the magnetohydrodynamics in the presence of an external magnetic field following the power-law decay in proper time and having spatial inhomogeneity characterized by a Gaussian distribution in one of the transverse coordinates under the Bjorken expansion. The leading-order solution is obtained in the weak-field approximation, where both energy density and fluid velocity are modified. It is found that the spatial gradient of the magnetic field results in transverse flow, where the flow direction depends on the decay exponents of the magnetic field. We suggest that such a magnetic-field-induced effect might influence anisotropic flow in heavy ion collisions.
Submitted 20 March, 2016; v1 submitted 16 February, 2016;
originally announced February 2016.
-
Bjorken flow in one-dimensional relativistic magnetohydrodynamics with magnetization
Authors:
Shi Pu,
Victor Roy,
Luciano Rezzolla,
Dirk H. Rischke
Abstract:
We study the one-dimensional, longitudinally boost-invariant motion of an ideal fluid with infinite conductivity in the presence of a transverse magnetic field, i.e., in the ideal transverse magnetohydrodynamical limit. In an extension of our previous work, Roy et al. [Phys. Lett. B 750, 45 (2015)], we consider the fluid to have a non-zero magnetization. First, we assume a constant magnetic susceptibility $χ_{m}$ and consider an ultrarelativistic ideal gas equation of state. For a paramagnetic fluid (i.e., with $χ_{m}>0$), the decay of the energy density slows down since the fluid gains energy from the magnetic field. For a diamagnetic fluid (i.e., with $χ_{m}<0$), the energy density decays faster because it feeds energy into the magnetic field. Furthermore, when the magnetic field is taken to be external and to decay in proper time $τ$ with a power law $\sim τ^{-a}$, two distinct solutions can be found depending on the values of $a$ and $χ_m$. Finally, we also solve the ideal magnetohydrodynamical equations for one-dimensional Bjorken flow with a temperature-dependent magnetic susceptibility and a realistic equation of state given by lattice-QCD data. We find that the temperature and energy density decay more slowly because of the non-vanishing magnetization. For values of the magnetic field typical for heavy-ion collisions, this effect is, however, rather small. Only for magnetic fields which are about an order of magnitude larger than expected for heavy-ion collisions is the system substantially reheated, and the lifetime of the quark phase might be extended.
Submitted 13 April, 2016; v1 submitted 16 February, 2016;
originally announced February 2016.
-
Passively Q-switched EDFL using Fe3O4-nanoparticle saturable absorber
Authors:
Xuekun Bai,
Chengbo Mou,
Luxi Xu,
Sujuan Huang,
Tingyun Wang,
Shengli Pu,
Xianglong Zeng
Abstract:
We experimentally demonstrate a passively Q-switched erbium-doped fiber laser (EDFL) operation by using a saturable absorber based on Fe3O4 nanoparticles (FONP) in magnetic fluid (MF). As a kind of transition metal oxide, the FONP has a large nonlinear optical response with a fast response time, which suits it for use as a saturable absorber. By depositing MF at the end of an optical fiber ferrule, we fabricated a FONP-based saturable absorber, which enables a strong light-matter interaction owing to the confined transmitted optical field within the single mode fiber. Because of the large third-order optical nonlinearities of the FONP-based saturable absorber, a large modulation depth of 8.2% and a non-saturable absorption of 56.6% are demonstrated. As a result, stable passively Q-switched EDFL pulses with maximum output pulse energy of 23.76 nJ, repetition rate of 33.3 kHz, and pulse width of 3.2 μs are achieved when the input pump power is 110 mW at the wavelength of 980 nm. The laser features a low threshold pump power of ~15 mW.
Submitted 22 November, 2015;
originally announced November 2015.
-
Event-by-event distribution of magnetic field energy over initial fluid energy density in $\sqrt{s_{\rm NN}}$= 200 GeV Au-Au collisions
Authors:
Victor Roy,
Shi Pu
Abstract:
We estimate the event-by-event (e-by-e) distribution of the ratio ($σ$) of the magnetic field energy to the fluid energy density in the transverse plane of Au-Au collisions at $\sqrt{s_{\rm NN}}$ = 200 GeV. A Monte-Carlo (MC) Glauber model is used to calculate $σ$ in the transverse plane for impact parameters b = 0 and 12 fm at time $τ_i\sim$0.5 fm. The fluid energy density is obtained by using Gaussian smoothing with two different smoothing parameters, $σ_g$ = 0.25 and 0.5 fm. For $b=0~\rm fm$ collisions, $σ$ is found to be $\ll$ 1 in the central region of the fireball and $σ\gtrsim$ 1 at the periphery. For b = 12 fm collisions, $σ\gtrsim$ 1. The e-by-e correlation between $σ$ and the fluid energy density ($\varepsilon$) is studied. We did not find a strong correlation between $σ$ and $\varepsilon$ at the centre of the fireball, whereas they are mostly anti-correlated at the periphery of the fireball.
Submitted 15 August, 2015;
originally announced August 2015.
-
Analytic Bjorken flow in one-dimensional relativistic magnetohydrodynamics
Authors:
Victor Roy,
Shi Pu,
Luciano Rezzolla,
Dirk Rischke
Abstract:
In the initial stage of relativistic heavy-ion collisions, strong magnetic fields appear due to the large velocity of the colliding charges. The evolution of these fields appears as a novel and intriguing feature in the fluid-dynamical description of heavy-ion collisions. In this work, we study analytically the one-dimensional, longitudinally boost-invariant motion of an ideal fluid in the presence of a transverse magnetic field. Interestingly, we find that, in the limit of ideal magnetohydrodynamics, i.e., for infinite conductivity, and irrespective of the strength of the initial magnetization, the decay of the fluid energy density $e$ with proper time $τ$ is the same as for the time-honored "Bjorken flow" without magnetic field. Furthermore, when the magnetic field is assumed to decay $\sim τ^{-a}$, where $a$ is an arbitrary number, two classes of analytic solutions can be found depending on whether $a$ is larger or smaller than one. In summary, the analytic solutions presented here highlight that the Bjorken flow is far more general than formerly thought. These solutions can serve both to gain insight on the dynamics of heavy-ion collisions in the presence of strong magnetic fields and as testbeds for numerical codes.
Submitted 22 August, 2015; v1 submitted 22 June, 2015;
originally announced June 2015.
-
Relativistic viscous hydrodynamics order by order
Authors:
Jian-Hua Gao,
Shi Pu
Abstract:
In this paper, we propose a method of solving the viscous hydrodynamics order by order in a derivative expansion. In such a method, the zero-order solution is just that of ideal hydrodynamics. All the other higher order corrections satisfy the same first-order partial differential equations but with different inhomogeneous terms. We take the Bjorken flow as an example to test the validity of our method and present how to deal with the problems of the initial condition and perturbation evolution in our formalism.
Submitted 20 October, 2015; v1 submitted 1 September, 2014;
originally announced September 2014.
-
Chiral Hall Effect and Chiral Electric Waves
Authors:
Shi Pu,
Shang-Yu Wu,
Di-Lun Yang
Abstract:
We investigate the vector and axial currents induced by external electromagnetic fields and chemical potentials in chiral systems at finite temperature. Similar to the normal Hall effect, we find that an axial Hall current is generated in the presence of the electromagnetic fields along with an axial chemical potential, which may be dubbed the "chiral Hall effect" (CHE). The CHE is related to the interactions of chiral fermions and exists with a nonzero axial chemical potential. We argue that the CHE could lead to nontrivial charge distributions at different rapidity in asymmetric heavy ion collisions. Moreover, we study the chiral electric waves (CEW) induced by the fluctuations of the vector and axial chemical potentials along with the chiral electric separation effect (CESE), where a density wave propagates along the applied electric field. Combined with the normal/chiral Hall effects, the fluctuations of chemical potentials thus result in Hall density waves. The Hall density waves may survive even at zero chemical potentials and become non-dissipative. We further study the transport coefficients including the Hall conductivities, damping times, wave velocities, and diffusion constants of CEW in a strongly coupled plasma via the AdS/CFT correspondence.
Submitted 31 December, 2014; v1 submitted 11 July, 2014;
originally announced July 2014.