-
A Hierarchical Reinforcement Learning Framework for Multi-UAV Combat Using Leader-Follower Strategy
Authors:
Jinhui Pang,
Jinglin He,
Noureldin Mohamed Abdelaal Ahmed Mohamed,
Changqing Lin,
Zhihui Zhang,
Xiaoshuai Hao
Abstract:
Multi-UAV air combat, a complex task involving multiple autonomous UAVs, is an evolving field in both aerospace and artificial intelligence. This paper aims to enhance adversarial performance through collaborative strategies. Previous approaches predominantly discretize the action space into predefined actions, limiting UAV maneuverability and complex strategy implementation. Others simplify the problem to 1v1 combat, neglecting the cooperative dynamics among multiple UAVs. To address the high-dimensional challenges inherent in six-degree-of-freedom space and improve cooperation, we propose a hierarchical framework utilizing the Leader-Follower Multi-Agent Proximal Policy Optimization (LFMAPPO) strategy. Specifically, the framework is structured into three levels. The top level conducts a macro-level assessment of the environment and guides execution policy. The middle level determines the angle of the desired action. The bottom level generates precise action commands for the high-dimensional action space. Moreover, we optimize the state-value functions by assigning distinct roles with the leader-follower strategy to train the top-level policy: followers estimate the leader's utility, which promotes effective cooperation among agents. Additionally, a target selector, aligned with the UAVs' posture, assesses the threat level of targets. Finally, simulation experiments validate the effectiveness of our proposed method.
Submitted 21 January, 2025;
originally announced January 2025.
-
Privacy-Preserving Distributed Maximum Consensus Without Accuracy Loss
Authors:
Wenrui Yu,
Richard Heusdens,
Jun Pang,
Qiongxiu Li
Abstract:
In distributed networks, calculating the maximum element is a fundamental task in data analysis, known as the distributed maximum consensus problem. However, the sensitive nature of the data involved makes privacy protection essential. Despite its importance, privacy in distributed maximum consensus has received limited attention in the literature. Traditional privacy-preserving methods typically add noise to updates, degrading the accuracy of the final result. To overcome these limitations, we propose a novel distributed optimization-based approach that preserves privacy without sacrificing accuracy. Our method introduces virtual nodes to form an augmented graph and leverages a carefully designed initialization process to ensure the privacy of honest participants, even when all their neighboring nodes are dishonest. Through a comprehensive information-theoretical analysis, we derive a sufficient condition to protect private data against both passive and eavesdropping adversaries. Extensive experiments validate the effectiveness of our approach, demonstrating that it not only preserves perfect privacy but also maintains accuracy, outperforming existing noise-based methods that typically suffer from accuracy loss.
Submitted 16 September, 2024;
originally announced September 2024.
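The underlying (non-private) maximum consensus problem named in the abstract above can be sketched as follows: every node repeatedly replaces its value with the maximum over itself and its neighbors, and on a connected graph all nodes agree on the global maximum after a number of rounds equal to the graph diameter. This is only a minimal illustration of the baseline problem, not the authors' privacy-preserving method; the example graph, the values, and the function name `max_consensus` are assumptions made for this sketch.

```python
def max_consensus(neighbors, values, rounds):
    """Plain (non-private) synchronous max consensus.

    neighbors: dict node -> list of adjacent nodes (hypothetical example graph)
    values:    dict node -> initial local value
    rounds:    number of synchronous update rounds
    """
    state = dict(values)
    for _ in range(rounds):
        # Every node takes the max of its own value and its neighbors' values.
        state = {
            node: max([state[node]] + [state[nb] for nb in neighbors[node]])
            for node in neighbors
        }
    return state


# Path graph 0 - 1 - 2 - 3 (diameter 3); values chosen arbitrarily.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
initial = {0: 4.0, 1: 9.5, 2: 1.0, 3: 7.0}
result = max_consensus(graph, initial, rounds=3)
```

Because each node sends its current value in the clear, its neighbors learn that value directly, which is the privacy concern the abstract raises.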
-
PIVOT-Net: Heterogeneous Point-Voxel-Tree-based Framework for Point Cloud Compression
Authors:
Jiahao Pang,
Kevin Bui,
Dong Tian
Abstract:
The universality of the point cloud format enables many 3D applications, making the compression of point clouds a critical phase in practice. Sampled as discrete 3D points, a point cloud approximates 2D surface(s) embedded in 3D with a finite bit-depth. However, the point distribution of a practical point cloud changes drastically as its bit-depth increases, requiring different methodologies for effective consumption/analysis. In this regard, a heterogeneous point cloud compression (PCC) framework is proposed. We unify typical point cloud representations -- point-based, voxel-based, and tree-based representations -- and their associated backbones under a learning-based framework to compress an input point cloud at different bit-depth levels. Having recognized the importance of voxel-domain processing, we augment the framework with a proposed context-aware upsampling for decoding and an enhanced voxel transformer for feature aggregation. Extensive experimentation demonstrates the state-of-the-art performance of our proposal on a wide range of point clouds.
Submitted 11 February, 2024;
originally announced February 2024.
-
Hybrid Internal Model: Learning Agile Legged Locomotion with Simulated Robot Response
Authors:
Junfeng Long,
Zirui Wang,
Quanyi Li,
Jiawei Gao,
Liu Cao,
Jiangmiao Pang
Abstract:
Robust locomotion control depends on accurate state estimation. However, the sensors of most legged robots can only provide partial and noisy observations, making the estimation particularly challenging, especially for external states like terrain friction and elevation maps. Inspired by the classical Internal Model Control principle, we consider these external states as disturbances and introduce the Hybrid Internal Model (HIM) to estimate them according to the response of the robot. The response, which we refer to as the hybrid internal embedding, contains the robot's explicit velocity and implicit stability representation, corresponding to two primary goals for locomotion tasks: explicitly tracking velocity and implicitly maintaining stability. We use contrastive learning to optimize the embedding to be close to the robot's successor state, in which the response is naturally embedded. HIM has several appealing benefits: It only needs the robot's proprioceptions, i.e., those from joint encoders and IMU, as observations. It innovatively maintains consistent observations between simulation reference and reality, which avoids information loss in mimicking learning. It exploits batch-level information, which is more robust to noise and keeps better sample efficiency. It only requires 1 hour of training on an RTX 4090 to enable a quadruped robot to traverse any terrain under any disturbances. A wealth of real-world experiments demonstrates its agility, even in high-difficulty tasks and cases that never occurred during the training process, revealing remarkable open-world generalizability.
Submitted 1 January, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
WrappingNet: Mesh Autoencoder via Deep Sphere Deformation
Authors:
Eric Lei,
Muhammad Asad Lodhi,
Jiahao Pang,
Junghyun Ahn,
Dong Tian
Abstract:
There have been recent efforts to learn more meaningful representations via fixed-length codewords from mesh data, since a mesh serves as a complete model of the underlying 3D shape compared to a point cloud. However, the mesh connectivity presents new difficulties when constructing a deep learning pipeline for meshes. Previous mesh unsupervised learning approaches typically assume category-specific templates, e.g., human face/body templates. This assumption restricts the learned latent codes to be meaningful only for objects in a specific category, so the learned latent spaces cannot be used across different types of objects. In this work, we present WrappingNet, the first mesh autoencoder enabling general mesh unsupervised learning over heterogeneous objects. It introduces a novel base graph in the bottleneck dedicated to representing mesh connectivity, which is shown to facilitate learning a shared latent space representing object shape. The superiority of WrappingNet mesh learning is further demonstrated via improved reconstruction quality and competitive classification compared to point cloud learning, as well as latent interpolation between meshes of different categories.
Submitted 29 August, 2023;
originally announced August 2023.
-
Capacity-achieving Polar-based Codes with Sparsity Constraints on the Generator Matrices
Authors:
James Chin-Jen Pang,
Hessam Mahdavifar,
S. Sandeep Pradhan
Abstract:
In this paper, we leverage polar codes and the well-established channel polarization to design capacity-achieving codes with a certain constraint on the weights of all the columns in the generator matrix (GM) while having a low-complexity decoding algorithm. We first show that given a binary-input memoryless symmetric (BMS) channel $W$ and a constant $s \in (0, 1]$, there exists a polarization kernel such that the corresponding polar code is capacity-achieving with the \textit{rate of polarization} $s/2$, and the GM column weights being bounded from above by $N^s$. To improve the sparsity versus error rate trade-off, we devise a column-splitting algorithm and two coding schemes for BEC and then for general BMS channels. The \textit{polar-based} codes generated by the two schemes inherit several fundamental properties of polar codes with the original $2 \times 2$ kernel including the decay in error probability, decoding complexity, and the capacity-achieving property. Furthermore, they demonstrate the additional property that their GM column weights are bounded from above sublinearly in $N$, while the original polar codes have some column weights that are linear in $N$. In particular, for any BEC and $\beta<0.5$, the existence of a sequence of capacity-achieving polar-based codes where all the GM column weights are bounded from above by $N^\lambda$ with $\lambda\approx 0.585$, and with the error probability bounded by $O(2^{-N^\beta} )$ under a decoder with complexity $O(N\log N)$, is shown. The existence of similar capacity-achieving polar-based codes with the same decoding complexity is shown for any BMS channel and $\beta<0.5$ with $\lambda\approx 0.631$.
Submitted 16 March, 2023;
originally announced March 2023.
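The remark in the abstract above that the original polar codes have some GM column weights linear in $N$ can be checked on a small case. The sketch below assumes the standard $2 \times 2$ kernel [[1, 0], [1, 1]] and builds its Kronecker power (row permutations such as bit reversal are omitted, since they do not change column weights). It does not implement the column-splitting algorithm or the coding schemes of the paper; the helper name `kron_power` is an assumption made for this illustration.

```python
def kron_power(kernel, n):
    """n-fold Kronecker power of a 0/1 matrix given as a list of rows."""
    result = [[1]]
    for _ in range(n):
        # Kronecker product result (x) kernel: each entry a of a result row
        # is expanded into a times the corresponding kernel row.
        result = [
            [a * b for a in row_r for b in row_k]
            for row_r in result
            for row_k in kernel
        ]
    return result


kernel = [[1, 0],
          [1, 1]]          # assumed standard 2 x 2 polarization kernel
n = 3
N = 2 ** n                 # block length
G = kron_power(kernel, n)  # N x N generator matrix (before row selection)

# Hamming weight of every column of G.
column_weights = [sum(column) for column in zip(*G)]
heaviest = max(column_weights)
```

In this example the first column of `G` is all ones, so `heaviest` equals `N`, which is the linear growth that the sparsity constraint in the abstract is designed to avoid.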
-
SpeechFormer++: A Hierarchical Efficient Framework for Paralinguistic Speech Processing
Authors:
Weidong Chen,
Xiaofen Xing,
Xiangmin Xu,
Jianxin Pang,
Lan Du
Abstract:
Paralinguistic speech processing is important in addressing many issues, such as sentiment and neurocognitive disorder analyses. Recently, Transformer has achieved remarkable success in the natural language processing field and has demonstrated its adaptation to speech. However, previous works on Transformer in the speech field have not incorporated the properties of speech, leaving the full potential of Transformer unexplored. In this paper, we consider the characteristics of speech and propose a general structure-based framework, called SpeechFormer++, for paralinguistic speech processing. More concretely, following the component relationship in the speech signal, we design a unit encoder to model the intra- and inter-unit information (i.e., frames, phones, and words) efficiently. According to the hierarchical relationship, we utilize merging blocks to generate features at different granularities, which is consistent with the structural pattern in the speech signal. Moreover, a word encoder is introduced to integrate word-grained features into each unit encoder, which effectively balances fine-grained and coarse-grained information. SpeechFormer++ is evaluated on the speech emotion recognition (IEMOCAP & MELD), depression classification (DAIC-WOZ) and Alzheimer's disease detection (Pitt) tasks. The results show that SpeechFormer++ outperforms the standard Transformer while greatly reducing the computational cost. Furthermore, it delivers superior results compared to the state-of-the-art approaches.
Submitted 27 February, 2023;
originally announced February 2023.
-
DST: Deformable Speech Transformer for Emotion Recognition
Authors:
Weidong Chen,
Xiaofen Xing,
Xiangmin Xu,
Jianxin Pang,
Lan Du
Abstract:
Enabled by multi-head self-attention, Transformer has exhibited remarkable results in speech emotion recognition (SER). Compared to the original full attention mechanism, window-based attention is more effective in learning fine-grained features while greatly reducing model redundancy. However, emotional cues are present in a multi-granularity manner such that the pre-defined fixed window can severely degrade the model flexibility. In addition, it is difficult to obtain the optimal window settings manually. In this paper, we propose a Deformable Speech Transformer, named DST, for the SER task. DST determines the usage of window sizes conditioned on input speech via a light-weight decision network. Meanwhile, data-dependent offsets derived from acoustic features are utilized to adjust the positions of the attention windows, allowing DST to adaptively discover and attend to the valuable information embedded in the speech. Extensive experiments on IEMOCAP and MELD demonstrate the superiority of DST.
Submitted 27 February, 2023;
originally announced February 2023.
-
Electromagnetic-Compliant Channel Modeling and Performance Evaluation for Holographic MIMO
Authors:
Tengjiao Wang,
Wei Han,
Zhimeng Zhong,
Jiyong Pang,
Guohua Zhou,
Shaobo Wang,
Qiang Li
Abstract:
Recently, the concept of holographic multiple-input multiple-output (MIMO) has emerged as one of the promising technologies beyond massive MIMO. Many challenges need to be addressed to bring this novel idea into practice, including electromagnetic (EM)-compliant channel modeling and accurate performance evaluation. In this paper, an EM-compliant channel model is proposed for holographic MIMO systems, which is able to model both the characteristics of the propagation channel and the non-ideal factors caused by mutual coupling at the transceivers, including the antenna pattern distortion and the decrease of antenna efficiency. Based on the proposed channel model, a more realistic performance evaluation is conducted to show the performance of the holographic MIMO system in both the single-user and the multi-user scenarios. Key challenges and future research directions are further provided based on the theoretical analyses and numerical results.
Submitted 13 January, 2023;
originally announced January 2023.
-
The applicability of transperceptual and deep learning approaches to the study and mimicry of complex cartilaginous tissues
Authors:
J. Waghorne,
C. Howard,
H. Hu,
J. Pang,
W. J. Peveler,
L. Harris,
O. Barrera
Abstract:
Complex soft tissues, for example the knee meniscus, play a crucial role in mobility and joint health, but when damaged are incredibly difficult to repair and replace. This is due to their highly hierarchical and porous nature, which in turn leads to their unique mechanical properties. In order to design tissue substitutes, the internal architecture of the native tissue needs to be understood and replicated. Here we explore a combined audio-visual approach, so-called transperceptual, to generate artificial architectures mimicking the native ones. The proposed method uses both traditional imagery and sound generated from each image as a method of rapidly comparing and contrasting the porosity and pore size within the samples. We have trained and tested a generative adversarial network (GAN) on the 2D image stacks. The impact of the training set of images on the similarity of the artificial to the original dataset was assessed by analyzing two samples. The first consists of n=478 pairs of audio and image files for which the images were downsampled to 64 $\times$ 64 pixels; the second consists of n=7640 pairs of audio and image files for which the full resolution of 256 $\times$ 256 pixels is retained but each image is divided into 16 squares to maintain the limit of 64 $\times$ 64 pixels required by the GAN. We reconstruct the 2D stacks of artificially generated datasets into 3D objects and run image analysis algorithms to characterize statistically the architectural parameters (pore size, tortuosity and pore connectivity) and compare them with the original dataset. Results show that the artificially generated dataset that undergoes downsampling performs better in terms of parameter matching. Our audiovisual approach has the potential to be extended to larger data sets to explore how both similarities and differences can be audibly recognized across multiple samples.
Submitted 21 November, 2022;
originally announced November 2022.
-
A Review of Intelligent Music Generation Systems
Authors:
Lei Wang,
Ziyi Zhao,
Hanwei Liu,
Junwei Pang,
Yi Qin,
Qidi Wu
Abstract:
With the introduction of ChatGPT, the public's perception of AI-generated content (AIGC) has begun to shift. Artificial intelligence has significantly reduced the barrier to entry for non-professionals in creative endeavors, enhancing the efficiency of content creation. Recent advancements have brought significant improvements in the quality of symbolic music generation, enabled by the use of modern generative algorithms to extract patterns implicit in a piece of music based on rule constraints or a musical corpus. Nevertheless, existing literature reviews tend to present a conventional and conservative perspective on future development trajectories, with a notable absence of thorough benchmarking of generative models. This paper provides a survey and analysis of recent intelligent music generation techniques, outlining their respective characteristics and discussing existing methods for evaluation. Additionally, the paper compares the different characteristics of music generation techniques in the East and West and analyses the field's development prospects.
Submitted 17 November, 2023; v1 submitted 16 November, 2022;
originally announced November 2022.
-
GRASP-Net: Geometric Residual Analysis and Synthesis for Point Cloud Compression
Authors:
Jiahao Pang,
Muhammad Asad Lodhi,
Dong Tian
Abstract:
Point cloud compression (PCC) is a key enabler for various 3D applications, owing to the universality of the point cloud format. Ideally, 3D point clouds endeavor to depict object/scene surfaces that are continuous. Practically, as a set of discrete samples, point clouds are locally disconnected and sparsely distributed. This sparse nature hinders the discovery of local correlation among points for compression. Motivated by an analysis with fractal dimension, we propose a heterogeneous approach with deep learning for lossy point cloud geometry compression. On top of a base layer compressing a coarse representation of the input, an enhancement layer is designed to cope with the challenging geometric residual/details. Specifically, a point-based network is applied to convert the erratic local details to latent features residing on the coarse point cloud. Then a sparse convolutional neural network operating on the coarse point cloud is launched. It utilizes the continuity/smoothness of the coarse geometry to compress the latent features as an enhancement bit-stream that greatly benefits the reconstruction quality. When this bit-stream is unavailable, e.g., due to packet loss, we support a skip mode with the same architecture which generates geometric details from the coarse point cloud directly. Experimentation on both dense and sparse point clouds demonstrates the state-of-the-art compression performance achieved by our proposal. Our code is available at https://github.com/InterDigitalInc/GRASP-Net.
Submitted 9 September, 2022;
originally announced September 2022.
-
Small Footprint Multi-channel ConvMixer for Keyword Spotting with Centroid Based Awareness
Authors:
Dianwen Ng,
Jin Hui Pang,
Yang Xiao,
Biao Tian,
Qiang Fu,
Eng Siong Chng
Abstract:
It is critical for a keyword spotting model to have a small footprint as it typically runs on-device with low computational resources. However, maintaining the previous SOTA performance with reduced model size is challenging. In addition, a far-field and noisy environment with interference from multiple signals aggravates the problem, causing the accuracy to degrade significantly. In this paper, we present a multi-channel ConvMixer for speech command recognition. The novel architecture introduces an additional audio channel mixing for channel audio interaction in a multi-channel audio setting to achieve better noise-robust features with more efficient computation. Besides, we propose a centroid-based awareness component to enhance the system by equipping it with additional spatial geometry information in the latent feature projection space. We evaluate our model using the new MISP challenge 2021 dataset. Our model achieves significant improvement against the official baseline with a 55% gain in the competition score (0.152) on raw microphone array input and a 63% (0.126) boost upon front-end speech enhancement.
Submitted 11 April, 2022;
originally announced April 2022.
-
SpeechFormer: A Hierarchical Efficient Framework Incorporating the Characteristics of Speech
Authors:
Weidong Chen,
Xiaofen Xing,
Xiangmin Xu,
Jianxin Pang,
Lan Du
Abstract:
Transformer has obtained promising results in the field of cognitive speech signal processing, which is of interest in various applications ranging from emotion to neurocognitive disorder analysis. However, most works treat the speech signal as a whole, leading to the neglect of the pronunciation structure that is unique to speech and reflects the cognitive process. Meanwhile, Transformer has a heavy computational burden due to its full attention operation. In this paper, a hierarchical efficient framework, called SpeechFormer, which considers the structural characteristics of speech, is proposed and can serve as a general-purpose backbone for cognitive speech signal processing. The proposed SpeechFormer consists of frame, phoneme, word and utterance stages in succession, each performing a neighboring attention according to the structural pattern of speech with high computational efficiency. SpeechFormer is evaluated on speech emotion recognition (IEMOCAP & MELD) and neurocognitive disorder detection (Pitt & DAIC-WOZ) tasks, and the results show that SpeechFormer outperforms the standard Transformer-based framework while greatly reducing the computational cost. Furthermore, our SpeechFormer achieves comparable results to the state-of-the-art approaches.
Submitted 9 March, 2022; v1 submitted 7 March, 2022;
originally announced March 2022.
-
New Bounds on the Size of Binary Codes with Large Minimum Distance
Authors:
James Chin-Jen Pang,
Hessam Mahdavifar,
S. Sandeep Pradhan
Abstract:
Let $A(n, d)$ denote the maximum size of a binary code of length $n$ and minimum Hamming distance $d$. Studying $A(n, d)$, including efforts to determine it as well as to derive bounds on $A(n, d)$ for large $n$, is one of the most fundamental subjects in coding theory. In this paper, we explore new lower and upper bounds on $A(n, d)$ in the large-minimum distance regime, in particular, when $d = n/2 - \Omega(\sqrt{n})$. We first provide a new construction of cyclic codes, by carefully selecting specific roots in the binary extension field for the check polynomial, with length $n= 2^m -1$, distance $d \geq n/2 - 2^{c-1}\sqrt{n}$, and size $n^{c+1/2}$, for any $m\geq 4$ and any integer $c$ with $0 \leq c \leq m/2 - 1$. These code parameters are slightly worse than those of the Delsarte--Goethals (DG) codes that provide the previously known best lower bound in the large-minimum distance regime. However, using a similar and extended code construction technique we show a sequence of cyclic codes that improve upon DG codes and provide the best lower bound in a narrower range of the minimum distance $d$, in particular, when $d = n/2 - \Omega(n^{2/3})$. Furthermore, by leveraging a Fourier-analytic view of Delsarte's linear program, upper bounds on $A(n, n/2 - \rho\sqrt{n})$ with $\rho\in (0.5, 9.5)$ are obtained that scale polynomially in $n$. To the best of the authors' knowledge, the upper bound due to Barg and Nogin \cite{barg2006spectral} is the only previously known upper bound that scales polynomially in $n$ in this regime. We numerically demonstrate that our upper bound improves upon the Barg-Nogin upper bound in the specified high-minimum distance regime.
Submitted 23 May, 2023; v1 submitted 7 February, 2022;
originally announced February 2022.
-
Graph-Based Depth Denoising & Dequantization for Point Cloud Enhancement
Authors:
Xue Zhang,
Gene Cheung,
Jiahao Pang,
Yash Sanghvi,
Abhiram Gnanasambandam,
Stanley H. Chan
Abstract:
A 3D point cloud is typically constructed from depth measurements acquired by sensors at one or more viewpoints. The measurements suffer from both quantization and noise corruption. To improve quality, previous works denoise a point cloud \textit{a posteriori} after projecting the imperfect depth data onto 3D space. Instead, we enhance depth measurements directly on the sensed images \textit{a priori}, before synthesizing a 3D point cloud. By enhancing near the physical sensing process, we tailor our optimization to our depth formation model before subsequent processing steps that obscure measurement errors.
Specifically, we model depth formation as a combined process of signal-dependent noise addition and non-uniform log-based quantization. The designed model is validated (with parameters fitted) using collected empirical data from a representative depth sensor. To enhance each pixel row in a depth image, we first encode intra-view similarities between available row pixels as edge weights via feature graph learning. We next establish inter-view similarities with another rectified depth image via viewpoint mapping and sparse linear interpolation. This leads to a maximum a posteriori (MAP) graph filtering objective that is convex and differentiable. We minimize the objective efficiently using accelerated gradient descent (AGD), where the optimal step size is approximated via Gershgorin circle theorem (GCT). Experiments show that our method significantly outperformed recent point cloud denoising schemes and state-of-the-art image denoising schemes in two established point cloud quality metrics.
Submitted 6 October, 2022; v1 submitted 8 November, 2021;
originally announced November 2021.
-
Key-Sparse Transformer for Multimodal Speech Emotion Recognition
Authors:
Weidong Chen,
Xiaofeng Xing,
Xiangmin Xu,
Jichen Yang,
Jianxin Pang
Abstract:
Speech emotion recognition is a challenging research topic that plays a critical role in human-computer interaction. Multimodal inputs further improve performance because more emotional information is used. However, existing studies learn all the information in a sample even though only a small portion of it relates to emotion. The redundant information becomes noise and limits system performance. In this paper, a key-sparse Transformer is proposed for efficient emotion recognition by focusing more on emotion-related information. The proposed method is evaluated on the IEMOCAP and LSSED datasets. Experimental results show that the proposed method achieves better performance than the state-of-the-art approaches.
Submitted 27 February, 2023; v1 submitted 22 June, 2021;
originally announced June 2021.
-
Target Control of Asynchronous Boolean Networks
Authors:
Cui Su,
Jun Pang
Abstract:
We study the target control of asynchronous Boolean networks, to identify efficacious interventions that can drive the dynamics of a given Boolean network from any initial state to the desired target attractor. Based on the application time, the control can be realised with three types of perturbations, including instantaneous, temporary and permanent perturbations. We develop efficient methods to compute the target control for a given target attractor with three types of perturbations. We compare our methods with the stable motif-based control on a variety of real-life biological networks to evaluate their performance. We show that our methods scale well for large Boolean networks and they are able to identify a rich set of solutions with a small number of perturbations.
Submitted 3 January, 2021;
originally announced January 2021.
-
A Dynamics-based Approach for the Target Control of Boolean Networks
Authors:
Cui Su,
Jun Pang
Abstract:
We study the target control problem of asynchronous Boolean networks, to identify a set of nodes, the perturbation of which can drive the dynamics of the network from any initial state to the desired steady state (or attractor). We are particularly interested in temporary perturbations, which are applied for sufficient time and then released to retrieve the original dynamics. Temporary perturbations have the apparent advantage of averting unforeseen consequences, which might be induced by permanent perturbations. Despite the infamous state-space explosion problem, in this work, we develop an efficient method to compute the temporary target control for a given target attractor of a Boolean network. We apply our method to a number of real-life biological networks and compare its performance with the stable motif-based control method to demonstrate its efficacy and efficiency.
Submitted 3 June, 2020;
originally announced June 2020.
-
Models Genesis
Authors:
Zongwei Zhou,
Vatsal Sodha,
Jiaxuan Pang,
Michael B. Gotway,
Jianming Liang
Abstract:
Transfer learning from natural images to medical images has been established as one of the most practical paradigms in deep learning for medical image analysis. To fit this paradigm, however, 3D imaging tasks in the most prominent imaging modalities (e.g., CT and MRI) have to be reformulated and solved in 2D, losing rich 3D anatomical information, thereby inevitably compromising its performance. To overcome this limitation, we have built a set of models, called Generic Autodidactic Models, nicknamed Models Genesis, because they are created ex nihilo (with no manual labeling), self-taught (learnt by self-supervision), and generic (served as source models for generating application-specific target models). Our extensive experiments demonstrate that our Models Genesis significantly outperform learning from scratch and existing pre-trained 3D models in all five target 3D applications covering both segmentation and classification. More importantly, learning a model from scratch simply in 3D may not necessarily yield performance better than transfer learning from ImageNet in 2D, but our Models Genesis consistently top any 2D/2.5D approaches including fine-tuning the models pre-trained from ImageNet as well as fine-tuning the 2D versions of our Models Genesis, confirming the importance of 3D anatomical information and significance of Models Genesis for 3D medical imaging. This performance is attributed to our unified self-supervised learning framework, built on a simple yet powerful observation: the sophisticated and recurrent anatomy in medical images can serve as strong yet free supervision signals for deep models to learn common anatomical representation automatically via self-supervision. As open science, all codes and pre-trained Models Genesis are available at https://github.com/MrGiovanni/ModelsGenesis.
Submitted 16 December, 2020; v1 submitted 9 April, 2020;
originally announced April 2020.
-
Sequential Control of Boolean Networks with Temporary and Permanent Perturbations
Authors:
Cui Su,
Jun Pang
Abstract:
Direct cell reprogramming makes it feasible to reprogram abundant somatic cells into desired cells. It has great potential for regenerative medicine and tissue engineering. In this work, we study the control of biological networks, modelled as Boolean networks, to identify control paths driving the dynamics of the network from a source attractor (undesired cells) to the target attractor (desired cells). Instead of achieving control in one step, we develop attractor-based sequential temporary and permanent control methods (AST and ASP) to identify a sequence of interventions that can alter the dynamics in a stepwise manner. To improve their feasibility, both AST and ASP only use biologically observable attractors as intermediates. They can find the shortest sequential paths and guarantee 100% reachability of the target attractor. We apply the two methods to several real-life biological networks and compare their performance with the attractor-based sequential instantaneous control (ASI). The results demonstrate that AST and ASP have the ability to identify a richer set of control paths with fewer perturbations than ASI, which will greatly facilitate practical applications.
Submitted 15 April, 2020;
originally announced April 2020.
-
3D Point Cloud Enhancement using Graph-Modelled Multiview Depth Measurements
Authors:
Xue Zhang,
Gene Cheung,
Jiahao Pang,
Dong Tian
Abstract:
A 3D point cloud is often synthesized from depth measurements collected by sensors at different viewpoints. The acquired measurements are typically both coarse in precision and corrupted by noise. To improve quality, previous works denoise a synthesized 3D point cloud a posteriori after projecting the imperfect depth data onto 3D space. Instead, we enhance depth measurements on the sensed images a priori, exploiting inherent 3D geometric correlation across views, before synthesizing a 3D point cloud from the improved measurements. By enhancing closer to the actual sensing process, we benefit from optimization targeting specifically the depth image formation model, before subsequent processing steps that can further obscure measurement errors. Mathematically, for each pixel row in a pair of rectified viewpoint depth images, we first construct a graph reflecting inter-pixel similarities via metric learning using data in previously enhanced rows. To optimize left and right viewpoint images simultaneously, we write a non-linear mapping function from the left pixel row to the right based on 3D geometry relations. We formulate a MAP optimization problem, which, after suitable linear approximations, results in an unconstrained convex and differentiable objective, solvable using the fast gradient method (FGM). Experimental results show that our method noticeably outperforms recent denoising algorithms that enhance after 3D point clouds are synthesized.
Submitted 11 February, 2020;
originally announced February 2020.
-
Capacity-achieving Polar-based LDGM Codes with Crowdsourcing Applications
Authors:
James Chin-Jen Pang,
Hessam Mahdavifar,
S. Sandeep Pradhan
Abstract:
In this paper we study codes with sparse generator matrices. More specifically, codes with a certain constraint on the weight of all the columns in the generator matrix are considered. The end result is the following. For any binary-input memoryless symmetric (BMS) channel and any $\epsilon > 2\epsilon^*$, where $\epsilon^* = \frac{1}{6}-\frac{5}{3}\log{\frac{4}{3}} \approx 0.085$, we show an explicit sequence of capacity-achieving codes with all the column weights of the generator matrix upper bounded by $(\log N)^{1+\epsilon}$, where $N$ is the code block length. The constructions are based on polar codes. Applications to crowdsourcing are also shown.
Submitted 31 January, 2020;
originally announced January 2020.
-
MMDetection: Open MMLab Detection Toolbox and Benchmark
Authors:
Kai Chen,
Jiaqi Wang,
Jiangmiao Pang,
Yuhang Cao,
Yu Xiong,
Xiaoxiao Li,
Shuyang Sun,
Wansen Feng,
Ziwei Liu,
Jiarui Xu,
Zheng Zhang,
Dazhi Cheng,
Chenchen Zhu,
Tianheng Cheng,
Qijie Zhao,
Buyu Li,
Xin Lu,
Rui Zhu,
Yue Wu,
Jifeng Dai,
Jingdong Wang,
Jianping Shi,
Wanli Ouyang,
Chen Change Loy,
Dahua Lin
Abstract:
We present MMDetection, an object detection toolbox that contains a rich set of object detection and instance segmentation methods as well as related components and modules. The toolbox started from the codebase of the MMDet team, which won the detection track of the COCO Challenge 2018. It has gradually evolved into a unified platform that covers many popular detection methods and contemporary modules. It not only includes training and inference code, but also provides weights for more than 200 network models. We believe this toolbox is by far the most complete detection toolbox. In this paper, we introduce the various features of this toolbox. In addition, we conduct a benchmarking study on different methods, components, and their hyper-parameters. We hope that the toolbox and benchmark can serve the growing research community by providing a flexible toolkit for reimplementing existing methods and developing new detectors. Code and models are available at https://github.com/open-mmlab/mmdetection. The project is under active development and we will keep this document updated.
Submitted 17 June, 2019;
originally announced June 2019.
-
Online Observability of Boolean Control Networks
Authors:
Guisen Wu,
Liyun Dai,
Zhiming Liu,
Taolue Chen,
Jun Pang
Abstract:
Observability is an important topic in the study of Boolean control networks (BCNs). In this paper, we propose a new type of observability, named online observability, to present the necessary and sufficient condition for determining the initial states of BCNs when their initial states cannot be reset. We also design an algorithm to decide whether a BCN satisfies online observability. Moreover, we prove that a BCN is identifiable iff it satisfies controllability and online observability, which reveals the essence of the identification problem of BCNs.
Submitted 5 December, 2020; v1 submitted 18 March, 2019;
originally announced March 2019.
-
Competitive Online Optimization under Inventory Constraints
Authors:
Qiulin Lin,
Hanling Yi,
John Pang,
Minghua Chen,
Adam Wierman,
Michael Honig,
Yuanzhang Xiao
Abstract:
This paper studies online optimization under inventory (budget) constraints. While online optimization is a well-studied topic, versions with inventory constraints have proven difficult. We consider a formulation of inventory-constrained optimization that is a generalization of the classic one-way trading problem and has a wide range of applications. We present a new algorithmic framework, \textsf{CR-Pursuit}, and prove that it achieves the minimal competitive ratio among all deterministic algorithms (up to a problem-dependent constant factor) for inventory-constrained online optimization. Our algorithm and its analysis not only simplify and unify the state-of-the-art results for the standard one-way trading problem, but they also establish novel bounds for generalizations including concave revenue functions. For example, for one-way trading with price elasticity, the \textsf{CR-Pursuit} algorithm achieves a competitive ratio that is within a small additive constant (i.e., 1/3) of the lower bound of $\ln \theta + 1$, where $\theta$ is the ratio between the maximum and minimum base prices.
Submitted 25 January, 2019;
originally announced January 2019.
-
Towards the Existential Control of Boolean Networks: A Preliminary Report (Extended Abstract)
Authors:
Soumya Paul,
Jun Pang,
Cui Su
Abstract:
Given a Boolean network BN and a subset A of attractors of BN, we study the problem of identifying a minimal subset C of vertices of BN, such that the dynamics of BN can be driven from a state s in any attractor A_s in A to any attractor A_t in A by controlling or toggling a subset of vertices in C in a single time step. We describe a method based on the decomposition of the network structure into strongly connected components called blocks. The control subset can be locally computed for each such block and the results then merged to derive the global control subset C. This potentially improves the efficiency for many real-life networks that are large but modular and well-structured. We are currently in the process of implementing our method in software.
Submitted 27 June, 2018;
originally announced June 2018.
-
A Decomposition-based Approach towards the Control of Boolean Networks (Technical Report)
Authors:
Soumya Paul,
Cui Su,
Jun Pang,
Andrzej Mizera
Abstract:
We study the problem of computing a minimal subset of nodes of a given asynchronous Boolean network that need to be controlled to drive its dynamics from an initial steady state (or attractor) to a target steady state. Due to the phenomenon of state-space explosion, a simple global approach that performs computations on the entire network may not scale well for large networks. We believe that efficient algorithms for such networks must exploit the structure of the networks together with their dynamics. Taking such an approach, we derive a decomposition-based solution to the minimal control problem which can be significantly faster than the existing approaches on large networks. We apply our solution to both real-life biological networks and randomly generated networks, demonstrating promising results.
Submitted 17 May, 2018; v1 submitted 19 April, 2018;
originally announced April 2018.
-
Distributed Optimal Frequency Control Considering a Nonlinear Network-Preserving Model
Authors:
Zhaojian Wang,
Feng Liu,
John Z. F. Pang,
Steven Low,
Shengwei Mei
Abstract:
This paper addresses the distributed optimal frequency control of power systems considering a network-preserving model with nonlinear power flows and excitation voltage dynamics. Salient features of the proposed distributed control strategy are fourfold: i) nonlinearity is considered to cope with large disturbances; ii) only some of the generators are controllable; iii) no load measurement is required; iv) communication connectivity is required only for the controllable generators. To this end, benefiting from the concept of 'virtual load demand', we first design the distributed controller for the controllable generators by leveraging the primal-dual decomposition technique. We then propose a method to estimate the virtual load demand of each controllable generator based on local frequencies. We derive incremental passivity conditions for the uncontrollable generators. Finally, we prove that the closed-loop system is asymptotically stable and its equilibrium attains the optimal solution to the associated economic dispatch problem. Simulations, including small- and large-disturbance scenarios, are carried out on the New England system, demonstrating the effectiveness of our design.
Submitted 13 February, 2018; v1 submitted 5 September, 2017;
originally announced September 2017.