-
Realization of Precise Perforating Using Dynamic Threshold and Physical Plausibility Algorithm for Self-Locating Perforating in Oil and Gas Wells
Authors:
Siyu Xiao,
Guohui Ren,
Tianhao Mao,
Yuqiao Chen,
YiAn Liu,
Junjie Wang,
Kai Tang,
Xindi Zhao,
Zhijian Yu,
Shuang Liu,
Tupei Chen,
Yang Liu
Abstract:
Accurate depth measurement is essential for optimizing oil and gas resource development, as it directly impacts production efficiency. However, achieving precise depth and perforating at the correct location remains a significant challenge due to field operational constraints and equipment limitations. In this work, we propose the Dynamic Threshold and Physical Plausibility Depth Measurement and Perforation Control (DTPPMP) system, a solution integrated into perforating guns that enables real-time, precise depth measurement and perforation at designated perforating intervals. The system autonomously samples, processes, and identifies signals from a casing collar locator (CCL) in situ within oil and gas wells. Casing collar identification is achieved using a lightweight dynamic threshold and physical plausibility algorithm deployed on an embedded platform, which serves as the system's processor. Field tests conducted in an actual oil well in Sichuan, China, demonstrated the DTPPMP's ability to accurately identify casing collar signals, measure depths, and effectively perforate at designated perforating intervals in real time. The system achieved a perforation variation of less than the length of a single perforating interval and an F1 score of 98.6% for casing collar identification. These results provide valuable recommendations for advancing automation and intelligence in future perforation operations.
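The abstract does not give the algorithm's details, so the following is only a rough illustration. It assumes the "dynamic threshold" is a rolling mean plus k standard deviations, and the "physical plausibility" check is a minimum spacing between collars. Detections are then scored with the F1 measure quoted above. The names and values `detect_collars`, `f1_score`, `window`, `k`, `min_gap` and `tol` are hypothetical; this is not the DTPPMP implementation.

```python
from statistics import mean, pstdev


def detect_collars(ccl, window=50, k=4.0, min_gap=100):
    """Hypothetical sketch: return sample indices flagged as casing collars.

    Dynamic threshold: mean + k * std of |signal| over the preceding
    `window` samples. Physical plausibility: drop a detection closer than
    `min_gap` samples to the previous accepted one, standing in for the
    constraint that collars are roughly one casing joint apart.
    """
    hits = []
    for i in range(window, len(ccl)):
        past = [abs(x) for x in ccl[i - window:i]]
        threshold = mean(past) + k * pstdev(past)
        if abs(ccl[i]) > threshold:
            if not hits or i - hits[-1] >= min_gap:
                hits.append(i)
    return hits


def f1_score(detected, truth, tol=5):
    """F1 of detections against ground-truth indices, matched within +/- tol."""
    matched, tp = set(), 0
    for d in detected:
        for j, t in enumerate(truth):
            if j not in matched and abs(d - t) <= tol:
                matched.add(j)
                tp += 1
                break
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(truth) if truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Synthetic CCL trace: low-level periodic noise, three true collars,
# and one spurious spike 60 samples after the first collar.
signal = [0.1 * ((i * 7) % 5 - 2) for i in range(800)]
for idx in (200, 260, 450, 700):
    signal[idx] = 5.0
found = detect_collars(signal)
print(found, f1_score(found, [200, 450, 700]))
```

With these settings, the spurious spike at sample 260 clears the threshold but fails the spacing rule. Only the three true collars are reported.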
Submitted 30 August, 2025;
originally announced September 2025.
-
A Selection of Distributions and Their Fourier Transforms with Applications in Magnetic Resonance Imaging
Authors:
Kaibo Tang
Abstract:
This note presents a rigorous introduction to a selection of distributions and their Fourier transforms that are commonly encountered in signal processing and, in particular, magnetic resonance imaging (MRI). In contrast to many textbooks on the principles of MRI, which place more emphasis on the signal processing aspect, this note takes a more mathematical approach. In particular, we make explicit the underlying topological space of interest and clarify the exact sense in which these distributions and their Fourier transforms are defined. Key results presented in this note include the Poisson summation formula and the Fourier transform of a Gaussian function via an ordinary differential equation (ODE) argument. Although readers are expected to have prior exposure to functional analysis and distribution theory, this note is intended to be self-contained.
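Here is one standard way to state the two results mentioned above. It assumes the normalization $\hat f(\xi)=\int f(x)e^{-2\pi i x\xi}\,dx$; the note may use a different convention, which changes the constants. Differentiation under the integral sign is justified because the Gaussian is a Schwartz function, and so is dropping the boundary terms in the integration by parts.

```latex
% Assumed convention: \hat f(\xi) = \int_{\mathbb{R}} f(x)\, e^{-2\pi i x \xi}\, dx.
% Gaussian: f(x) = e^{-\pi x^2}, so f'(x) = -2\pi x\, f(x).
\begin{aligned}
\frac{d}{d\xi}\hat f(\xi)
  &= \int_{\mathbb{R}} (-2\pi i x)\, f(x)\, e^{-2\pi i x \xi}\, dx
   = i \int_{\mathbb{R}} f'(x)\, e^{-2\pi i x \xi}\, dx \\
  &= i\,(2\pi i \xi)\, \hat f(\xi) = -2\pi \xi\, \hat f(\xi),
  \qquad \hat f(0) = \int_{\mathbb{R}} e^{-\pi x^2}\, dx = 1 \\
\Longrightarrow\quad \hat f(\xi) &= e^{-\pi \xi^2}.
\end{aligned}

% Poisson summation for a Schwartz function \varphi, equivalently the
% self-duality of the Dirac comb \mathrm{III} = \sum_{n \in \mathbb{Z}} \delta_n:
\sum_{n \in \mathbb{Z}} \varphi(n) = \sum_{k \in \mathbb{Z}} \hat\varphi(k)
\quad\Longleftrightarrow\quad
\widehat{\mathrm{III}} = \mathrm{III} \ \text{in } \mathcal{S}'(\mathbb{R}).
```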
Submitted 23 June, 2025;
originally announced June 2025.
-
Probing for Phonology in Self-Supervised Speech Representations: A Case Study on Accent Perception
Authors:
Nitin Venkateswaran,
Kevin Tang,
Ratree Wayland
Abstract:
Traditional models of accent perception underestimate the role of gradient variations in phonological features which listeners rely upon for their accent judgments. We investigate how pretrained representations from current self-supervised learning (SSL) models of speech encode phonological feature-level variations that influence the perception of segmental accent. We focus on three segments: the labiodental approximant, the rhotic tap, and the retroflex stop, which are uniformly produced in the English of native speakers of Hindi as well as other languages in the Indian sub-continent. We use the CSLU Foreign Accented English corpus (Lander, 2007) to extract, for these segments, phonological feature probabilities using Phonet (Vásquez-Correa et al., 2019) and pretrained representations from Wav2Vec2-BERT (Barrault et al., 2023) and WavLM (Chen et al., 2022) along with accent judgements by native speakers of American English. Probing analyses show that accent strength is best predicted by a subset of the segment's pretrained representation features, in which perceptually salient phonological features that contrast the expected American English and realized non-native English segments are given prominent weighting. A multinomial logistic regression of pretrained representation-based segment distances from American and Indian English baselines on accent ratings reveals strong associations between the odds of accent strength and distances from the baselines, in the expected directions. These results highlight the value of self-supervised speech representations for modeling accent perception using interpretable phonological features.
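This sketch makes the distance-based analysis concrete. It computes a token's Euclidean distances to American and Indian English baseline centroids, then feeds them to a multinomial logit over rating categories. The following are all hypothetical:

- the distance metric
- the 2-D toy vectors
- the category names
- the coefficients

The abstract does not give the study's actual features, distance measure or fitted model.

```python
import math


def centroid(vectors):
    """Component-wise mean of equal-length feature vectors (a baseline)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rating_probabilities(d_american, d_indian, coefs):
    """Multinomial logit over accent-rating categories.

    P(c) is proportional to exp(b0 + b1 * d_american + b2 * d_indian), with
    (b0, b1, b2) = coefs[c]; the reference category uses (0, 0, 0).
    """
    scores = {c: b0 + b1 * d_american + b2 * d_indian
              for c, (b0, b1, b2) in coefs.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


# Toy 2-D "representations"; real SSL features have many more dimensions.
american = centroid([[0.0, 0.0], [0.0, 2.0]])   # -> [0.0, 1.0]
indian = centroid([[4.0, 1.0], [6.0, 1.0]])     # -> [5.0, 1.0]
token = [4.0, 1.0]
d_am, d_in = euclidean(token, american), euclidean(token, indian)
coefs = {"none": (0.0, 0.0, 0.0), "mild": (0.0, 0.5, -0.5), "strong": (0.0, 1.0, -1.0)}
probs = rating_probabilities(d_am, d_in, coefs)
print(d_am, d_in, max(probs, key=probs.get))
```

The coefficients put a positive weight on distance from the American baseline and a negative weight on distance from the Indian baseline. With them, "strong" becomes the most probable rating for a token near the Indian English centroid. This mirrors the direction of association the abstract reports.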
Submitted 20 June, 2025;
originally announced June 2025.
-
Automatic Speech Recognition Biases in Newcastle English: an Error Analysis
Authors:
Dana Serditova,
Kevin Tang,
Jochen Steffens
Abstract:
Automatic Speech Recognition (ASR) systems struggle with regional dialects due to biased training that favours mainstream varieties. While previous research has identified racial, age, and gender biases in ASR, regional bias remains underexamined. This study investigates ASR performance on Newcastle English, a well-documented regional dialect known to be challenging for ASR. A two-stage analysis was conducted: first, a manual error analysis on a subsample identified key phonological, lexical, and morphosyntactic errors behind ASR misrecognitions; second, a case study focused on the systematic analysis of ASR recognition of the regional pronouns "yous" and "wor". Results show that ASR errors directly correlate with regional dialectal features, while social factors play a lesser role in ASR mismatches. We advocate for greater dialectal diversity in ASR training data and highlight the value of sociolinguistic analysis in diagnosing and addressing regional biases.
Submitted 19 June, 2025;
originally announced June 2025.
-
Automatic Speech Recognition of African American English: Lexical and Contextual Effects
Authors:
Hamid Mojarad,
Kevin Tang
Abstract:
Automatic Speech Recognition (ASR) models often struggle with the phonetic, phonological, and morphosyntactic features found in African American English (AAE). This study focuses on two key AAE variables: Consonant Cluster Reduction (CCR) and ING-reduction. It examines whether the presence of CCR and ING-reduction increases ASR misrecognition. Subsequently, it investigates whether end-to-end ASR systems without an external Language Model (LM) are more influenced by the lexical neighborhood effect and less by contextual predictability than systems with an LM. The Corpus of Regional African American Language (CORAAL) was transcribed using wav2vec 2.0 with and without an LM. CCR and ING-reduction were detected using the Montreal Forced Aligner (MFA) with pronunciation expansion. The analysis reveals a small but significant effect of CCR and ING-reduction on Word Error Rate (WER) and indicates a stronger lexical neighborhood effect in ASR systems without LMs.
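WER, the metric used above, is the word-level edit distance between a reference and a hypothesis, divided by the number of reference words. A minimal sketch follows. The example sentences are invented to mimic an ING-reduction mismatch; they are not CORAAL data.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed as a word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev holds the previous DP row: distances from ref[:i-1] to hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("he was walking home", "he was walkin home"))
```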
Submitted 23 August, 2025; v1 submitted 7 June, 2025;
originally announced June 2025.
-
Optimal Sensor Placement Using Combinations of Hybrid Measurements for Source Localization
Authors:
Kang Tang,
Sheng Xu,
Yuqi Yang,
He Kong,
Yongsheng Ma
Abstract:
This paper focuses on static source localization employing different combinations of measurements, including time-difference-of-arrival (TDOA), received-signal-strength (RSS), angle-of-arrival (AOA), and time-of-arrival (TOA) measurements. Since sensor-source geometry significantly impacts localization accuracy, optimal sensor placement strategies are proposed systematically for combinations of hybrid measurements. First, the relationship between sensor placement and source estimation accuracy is formulated through a derived Cramér-Rao bound (CRB). Second, the A-optimality criterion, i.e., minimizing the trace of the CRB, is selected to calculate the smallest reachable estimation mean-squared error (MSE) in a unified manner. Third, optimal sensor placement strategies are developed to achieve the optimal estimation bound. Specifically, the constraints on the optimal geometries imposed by each measurement type, i.e., TDOA, AOA, RSS, and TOA, are derived and discussed theoretically. Finally, the new findings are verified by simulation studies.
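This is a simplified illustration of the A-optimality criterion. It covers only 2-D TOA (range) measurements with independent Gaussian errors of equal variance. In that case the Fisher information is the sum of outer products of the unit vectors pointing from the source to each sensor. It is not the paper's hybrid-measurement derivation; `toa_crb_trace` and the example geometries are hypothetical.

```python
import math


def toa_crb_trace(sensor_angles, sigma=1.0):
    """Trace of the 2-D CRB for TOA (range) measurements.

    Assumes independent Gaussian range errors with std `sigma`. Only the
    bearing of each sensor from the source matters, so sensors are given by
    their angles in radians. FIM = (1 / sigma^2) * sum(u_i u_i^T), where u_i
    is the unit vector from the source towards sensor i.
    """
    fxx = sum(math.cos(a) ** 2 for a in sensor_angles) / sigma**2
    fyy = sum(math.sin(a) ** 2 for a in sensor_angles) / sigma**2
    fxy = sum(math.cos(a) * math.sin(a) for a in sensor_angles) / sigma**2
    det = fxx * fyy - fxy**2
    if det <= 1e-12:
        raise ValueError("degenerate geometry (collinear bearings)")
    # trace of inv([[fxx, fxy], [fxy, fyy]]) equals (fxx + fyy) / det
    return (fxx + fyy) / det


n = 4
uniform = [2 * math.pi * i / n for i in range(n)]
clustered = [0.0, 0.2, 0.4, 0.6]
print(toa_crb_trace(uniform), toa_crb_trace(clustered))
```

For N ≥ 3 sensors at uniformly spaced angles, the information matrix is (N/(2σ²))I, so the trace of the CRB attains 4σ²/N. Clustering the sensors in a narrow sector inflates it.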
Submitted 9 April, 2025; v1 submitted 2 April, 2025;
originally announced April 2025.
-
Prototyping and Test of the "Canis" HTS Planar Coil Array for Stellarator Field Shaping
Authors:
D. Nash,
D. A. Gates,
W. S. Walsh,
M. Slepchenkov,
D. Guan,
A. D. Cate,
B. Chen,
M. Dickerson,
W. Harris,
U. Khera,
M. Korman,
S. Srinivasan,
C. P. S. Swanson,
A. van Riel,
R. H. Wu,
A. S. Basurto,
B. Berzin,
E. Brown,
C. Chen,
T. Ikuss,
W. B. Kalb,
C. Khurana,
B. D. Koehne,
T. G. Kruger,
S. Noronha
, et al. (8 additional authors not shown)
Abstract:
Thea Energy, Inc. is currently developing the "Eos" planar coil stellarator, the Company's first integrated fusion system capable of forming optimized stellarator magnetic fields without complex and costly modular coils. To demonstrate the field shaping capability required to enable Eos, Thea Energy designed, constructed, and tested the "Canis" 3x3 array of high-temperature superconductor (HTS) planar shaping coils after successfully demonstrating a single shaping coil prototype. Through the Canis 3x3 magnet array program, Thea Energy manufactured nine HTS shaping coils and developed the cryogenic test and measurement infrastructure necessary to validate the array's performance. Thea Energy operated the array at 20 K, generating several stellarator-relevant magnetic field shapes and demonstrating closed loop field control of the superconducting magnets to within 1% of predicted field, a margin of error acceptable for operation of an integrated stellarator. The Canis magnet array test campaign provides a proof of concept for HTS planar shaping coils as a viable approach to confining stellarator plasmas.
Submitted 19 March, 2025;
originally announced March 2025.
-
Sig2text, a Vision-language model for Non-cooperative Radar Signal Parsing
Authors:
Hancong Feng,
KaiLI Jiang,
Bin Tang
Abstract:
Automatic non-cooperative analysis of intercepted radar signals is essential for intelligent equipment in both military and civilian domains. Accurate modulation identification and parameter estimation enable effective signal classification, threat assessment, and the development of countermeasures. In this paper, we propose a symbolic approach for radar signal recognition and parameter estimation based on a vision-language model that combines context-free grammar with time-frequency representation of radar waveforms. The proposed model, called Sig2text, leverages the power of vision transformers for time-frequency feature extraction and transformer-based decoders for symbolic parsing of radar waveforms. By treating radar signal recognition as a parsing problem, Sig2text can effectively recognize and parse radar waveforms with different modulation types and parameters. We evaluate the performance of Sig2text on a synthetic radar signal dataset and demonstrate its effectiveness in recognizing and parsing radar waveforms with varying modulation types and parameters. The training code of the model is available at https://github.com/Na-choneko/sig2text.
Submitted 15 April, 2025; v1 submitted 19 March, 2025;
originally announced March 2025.
-
Thévenin Equivalent Parameters Identification Based on Statistical Characteristics of System Ambient Data
Authors:
Boying Zhou,
Chen Shen,
Kexuan Tang
Abstract:
This paper proposes a novel method for identifying Thévenin equivalent parameters (TEP) in power systems, based on the statistical characteristics of the system's stochastic response. The method leverages stochastic fluctuation data under steady-state grid conditions and applies sliding window techniques to compute sensitivity parameters between voltage magnitude, current magnitude, and power. This enables accurate and robust TEP identification. In contrast to traditional methods, the proposed approach does not rely on large disturbances or probing signals but instead utilizes the natural fluctuation behavior of the system. Additionally, the method supports distributed implementation using local measurements of voltage magnitude, current magnitude, and power, offering significant practical value for engineering applications. The theoretical analysis demonstrates the method's robustness in the presence of low signal-to-noise ratio (SNR), asynchronous measurements, and data collinearity. Simulation results further confirm the effectiveness of the proposed method in diverse practical scenarios, demonstrating its ability to consistently provide accurate and reliable identification of TEP using system ambient data.
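This sketch shows only the sliding-window idea, on a deliberately simplified real-valued model v = e - z * i. The model uses magnitudes only, with no phase, no power terms and no noise. It fits e and z by least squares on small ambient fluctuations. It is not the paper's sensitivity-based estimator. The per-unit values and the function name `thevenin_sliding_window` are hypothetical.

```python
import math
from statistics import mean


def thevenin_sliding_window(v, cur, window):
    """Fit v = e - z * cur by least squares in each sliding window of
    `window` samples and return the list of (e, z) estimates."""
    estimates = []
    for start in range(len(v) - window + 1):
        vw = v[start:start + window]
        cw = cur[start:start + window]
        mv, mc = mean(vw), mean(cw)
        cov = sum((a - mv) * (b - mc) for a, b in zip(vw, cw))
        var = sum((b - mc) ** 2 for b in cw)
        z = -cov / var      # slope of v against current, sign flipped
        e = mv + z * mc     # intercept: equivalent source voltage
        estimates.append((e, z))
    return estimates


# Synthetic ambient data: small load fluctuations around 1.0 p.u. current,
# generated from an assumed equivalent with E = 1.05 p.u. and Z = 0.1 p.u.
current = [1.0 + 0.05 * math.sin(0.7 * k) + 0.02 * math.sin(2.3 * k) for k in range(200)]
voltage = [1.05 - 0.1 * c for c in current]
est = thevenin_sliding_window(voltage, current, window=30)
print(est[0], est[-1])
```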
Submitted 30 June, 2025; v1 submitted 11 December, 2024;
originally announced December 2024.
-
A Probabilistic Approach for Queue Length Estimation Using License Plate Recognition Data: Considering Overtaking in Multi-lane Scenarios
Authors:
Lyuzhou Luo,
Hao Wu,
Jiahao Liu,
Keshuang Tang,
Chaopeng Tan
Abstract:
Multi-section license plate recognition (LPR) data provides input-output information and sampled travel times of the investigated link, serving as an ideal data source for lane-based queue length estimation in recent studies. However, most of these studies assumed the strict FIFO rule or a specific arrival process, thus ignoring the potential impact of overtaking and the variation of traffic flows, especially in multi-lane scenarios. To address this issue, we propose a probabilistic approach to derive the stochastic queue length by constructing a conditional probability model of no-delay arrival time (NAT), i.e., the arrival time of vehicles without experiencing any delay, based on multi-section LPR data. First, the NAT conditions for all vehicles are established based on upstream and downstream vehicle departure times and sequences. To reduce the computational dimensionality and complexity, a DP-based algorithm is developed for vehicle group partitioning based on potential interactions between vehicles. Then, the conditional probability of NATs of each vehicle group is derived and an MCMC sampling method is employed for calculation. Subsequently, the stochastic queue profile and maximum queue length for each cycle can be derived based on the NATs of vehicles. Finally, to fully leverage the LPR data, we extend our approach to multi-lane scenarios, where the problem can be converted to a weighted general exact coverage problem and solved by a backtracking algorithm with heuristics. Empirical and simulation experiments have shown that the proposed approach outperforms the state-of-the-art method, demonstrating significant improvements in accuracy and robustness across various traffic conditions, including different V/C ratios, matching rates, and FIFO violation rates. In addition, the performance of the proposed approach can be further improved by utilizing multi-lane LPR data.
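The step from no-delay arrival times (NATs) to a queue profile can be sketched as a count. At each time, count the vehicles that have arrived at the stop line in the no-delay sense but have not yet departed. The sketch assumes NAT samples (for example, from the MCMC step) and departure times are already available. The timings and the function names `queue_profile` and `expected_max_queue` are hypothetical. The paper's sampling and multi-lane matching are not shown.

```python
def queue_profile(nat, departures, times):
    """Queue length at each query time t: vehicles whose no-delay arrival
    time has passed (nat <= t) but which have not yet departed (t < departure)."""
    return [sum(1 for a, d in zip(nat, departures) if a <= t < d) for t in times]


def expected_max_queue(nat_samples, departures, times):
    """Average the per-cycle maximum queue over sampled NAT vectors
    (for example, draws from an MCMC sampler)."""
    maxima = [max(queue_profile(s, departures, times)) for s in nat_samples]
    return sum(maxima) / len(maxima)


times = range(60)                     # one 60 s cycle at 1 s resolution
departures = [32, 34, 36, 38, 40]     # hypothetical stop-line departures (s)
sample_a = [5, 10, 18, 25, 40]        # last vehicle arrives on green, no delay
sample_b = [5, 10, 18, 33, 40]
print(max(queue_profile(sample_a, departures, times)),
      expected_max_queue([sample_a, sample_b], departures, times))
```

A vehicle whose NAT equals its departure time is never counted, which matches the definition of experiencing no delay.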
Submitted 24 July, 2024;
originally announced August 2024.
-
Fall Detection using Knowledge Distillation Based Long short-term memory for Offline Embedded and Low Power Devices
Authors:
Hannah Zhou,
Allison Chen,
Celine Buer,
Emily Chen,
Kayleen Tang,
Lauryn Gong,
Zhiqi Liu,
Jianbin Tang
Abstract:
This paper presents a cost-effective, low-power approach to unintentional fall detection using knowledge distillation-based LSTM (Long Short-Term Memory) models to significantly improve accuracy. With a primary focus on analyzing time-series data collected from various sensors, the solution offers real-time detection capabilities, ensuring prompt and reliable identification of falls. The authors investigate fall detection models based on different sensors, comparing their accuracy and performance. Furthermore, they employ knowledge distillation to enhance the models' precision, yielding accurate configurations that consume less power. As a result, the proposed solution offers a compelling avenue for developing energy-efficient fall detection systems in this critical domain.
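The abstract does not specify the distillation objective. A common choice is the Hinton et al. formulation sketched below. It mixes a temperature-softened KL term between teacher and student outputs with the usual cross-entropy on hard labels. The temperature, `alpha` and the two-class (fall / no fall) logits are assumptions, and the LSTM models themselves are omitted.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.7):
    """alpha * T^2 * KL(teacher_T || student_T)
    + (1 - alpha) * cross-entropy(student, hard label)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature**2 * kl + (1 - alpha) * ce


# Two classes: index 0 = fall, index 1 = no fall (hypothetical logits).
print(distillation_loss([1.0, 0.0], [2.0, 0.0], label=0))
```

The T² factor keeps the gradient scale of the soft-target term roughly comparable to the hard-label term as the temperature changes.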
Submitted 23 August, 2023;
originally announced August 2023.
-
Deep learning radiomics for assessment of gastroesophageal varices in people with compensated advanced chronic liver disease
Authors:
Lan Wang,
Ruiling He,
Lili Zhao,
Jia Wang,
Zhengzi Geng,
Tao Ren,
Guo Zhang,
Peng Zhang,
Kaiqiang Tang,
Chaofei Gao,
Fei Chen,
Liting Zhang,
Yonghe Zhou,
Xin Li,
Fanbin He,
Hui Huan,
Wenjuan Wang,
Yunxiao Liang,
Juan Tang,
Fang Ai,
Tingyu Wang,
Liyun Zheng,
Zhongwei Zhao,
Jiansong Ji,
Wei Liu
, et al. (22 additional authors not shown)
Abstract:
Objective: Bleeding from gastroesophageal varices (GEV) is a medical emergency associated with high mortality. We aim to construct an artificial intelligence-based model of two-dimensional shear wave elastography (2D-SWE) of the liver and spleen to precisely assess the risk of GEV and high-risk gastroesophageal varices (HRV).
Design: A prospective multicenter study was conducted in patients with compensated advanced chronic liver disease. A total of 305 patients were enrolled from 12 hospitals, and 265 patients were finally included, with 1136 liver stiffness measurement (LSM) images and 1042 spleen stiffness measurement (SSM) images generated by 2D-SWE. We leveraged deep learning methods to uncover associations between image features and patient risk, and thus constructed models to predict GEV and HRV.
Results: A multi-modality Deep Learning Risk Prediction model (DLRP) was constructed to assess GEV and HRV, based on LSM and SSM images and clinical information. Validation analysis revealed that the AUCs of DLRP were 0.91 for GEV (95% CI 0.90 to 0.93, p < 0.05) and 0.88 for HRV (95% CI 0.86 to 0.89, p < 0.01), which were significantly and robustly better than canonical risk indicators, including the values of LSM and SSM. Moreover, DLRP was better than models using individual parameters, including LSM and SSM images. In HRV prediction, the 2D-SWE images of SSM outperformed those of LSM (p < 0.01).
Conclusion: DLRP shows excellent performance in predicting GEV and HRV over canonical risk indicators LSM and SSM. Additionally, the 2D-SWE images of SSM provided more information for better accuracy in predicting HRV than the LSM.
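The AUC figures above can be read as a probability. It is the chance that a randomly chosen positive case receives a higher risk score than a randomly chosen negative one. A minimal pairwise implementation is sketched below with toy scores. It does not reproduce the study's confidence intervals or significance tests.

```python
def auc(scores, labels):
    """ROC AUC as the probability that a random positive case outscores a
    random negative case; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

This pairwise form is quadratic in the number of cases. For large cohorts, a rank-based (Mann-Whitney U) computation gives the same value more efficiently.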
Submitted 12 June, 2023;
originally announced June 2023.
-
MetaUE: Model-based Meta-learning for Underwater Image Enhancement
Authors:
Zhenwei Zhang,
Haorui Yan,
Ke Tang,
Yuping Duan
Abstract:
The challenges in recovering underwater images are the presence of diverse degradation factors and the lack of ground truth images. Although synthetic underwater image pairs can be used to overcome the problem of insufficient observed data, they may result in over-fitting and enhancement degradation. This paper proposes a model-based deep learning method for restoring clean images under various underwater scenarios, which exhibits good interpretability and generalization ability. More specifically, we build a multi-variable convolutional neural network model to estimate the clean image, background light, and transmission map. An efficient loss function is also designed to closely integrate the variables based on the underwater image model. A meta-learning strategy is used to obtain a pre-trained model on a synthetic underwater dataset, which contains different types of degradation to cover various underwater environments. The pre-trained model is then fine-tuned on real underwater datasets to obtain a reliable underwater image enhancement model, called MetaUE. Numerical experiments demonstrate that the pre-trained model has good generalization ability, allowing it to remove color degradation from underwater images with various color casts, such as blue, green, and yellow. Fine-tuning allows the model to adapt to different underwater datasets, and its enhancement results outperform those of state-of-the-art underwater image restoration methods. All our code and data are available at https://github.com/Duanlab123/MetaUE.
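The underwater image model linking the three estimated variables is commonly written per pixel as I = J * t + B * (1 - t). Here J is the clean image, t the transmission and B the background light. The sketch below assumes that simplified form. It synthesizes an observation, inverts it, and computes the kind of reconstruction-consistency error such a loss could include. The abstract does not specify MetaUE's actual loss terms. The function names and single-channel toy pixels are hypothetical; real images would use per-channel values.

```python
def synthesize(clean, transmission, background):
    """Simplified formation model per pixel: I = J * t + B * (1 - t)."""
    return [j * t + background * (1 - t) for j, t in zip(clean, transmission)]


def restore(observed, transmission, background, t_min=0.1):
    """Invert the model: J = (I - B * (1 - t)) / max(t, t_min);
    t_min guards against division by a near-zero transmission."""
    return [(i - background * (1 - t)) / max(t, t_min)
            for i, t in zip(observed, transmission)]


def model_consistency_loss(observed, clean, transmission, background):
    """Mean squared error between the observation and the image
    re-synthesized from the estimated J, t and B."""
    recon = synthesize(clean, transmission, background)
    return sum((a - b) ** 2 for a, b in zip(observed, recon)) / len(observed)


clean = [0.2, 0.5, 0.9]
t = [0.8, 0.5, 0.3]
B = 0.6
observed = synthesize(clean, t, B)
print(observed, restore(observed, t, B))
```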
Submitted 11 March, 2023;
originally announced March 2023.
-
Hyperuniform disordered parametric loudspeaker array
Authors:
Kun Tang,
Yuqi Wang,
Shaobo Wang,
Da Gao,
Haojie Li,
Xindong Liang,
Patrick Sebbah,
Yibin Li,
Jin Zhang,
Junhui Shi
Abstract:
A steerable parametric loudspeaker array is known for its directivity and narrow beam width. However, it often suffers from grating lobes due to periodic array distributions. Here we propose a hyperuniform disordered array configuration, which is random at short range while correlated at large scales, as a promising alternative distribution of acoustic antennas in phased arrays. Angle-resolved measurements reveal that the proposed array suppresses grating lobes and maintains a minimal radiation region in the vicinity of the main lobe for the primary frequency waves. These distinctive emission features help the secondary frequency wave cancel the grating lobes regardless of the frequencies of the primary waves. In addition, the hyperuniform disordered array is duplicatable, which facilitates extra-large array design without any additional computational effort.
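This sketch shows why periodic layouts produce grating lobes. It evaluates the normalized far-field array factor of a 1-D broadside line array with a 2-wavelength pitch. For that pitch, grating lobes appear at sin θ = ±0.5 and ±1. The result is compared with the same array after an arbitrary fixed position jitter. The jitter is an ad hoc perturbation used only for illustration. It is not the hyperuniform disordered configuration proposed by the authors, and parametric (secondary-frequency) generation is not modeled.

```python
import cmath
import math


def array_factor(positions, u):
    """Normalized far-field array factor |sum_n exp(j*2*pi*x_n*u)| / N for
    element positions x_n in wavelengths and direction u = sin(theta),
    with all elements driven in phase (broadside)."""
    total = sum(cmath.exp(2j * math.pi * x * u) for x in positions)
    return abs(total) / len(positions)


pitch = 2.0  # wavelengths; grating lobes at u = m / pitch = +/-0.5, +/-1
periodic = [pitch * n for n in range(8)]
offsets = [0.0, 0.25, -0.25, 0.1, -0.1, 0.3, -0.3, 0.2]  # arbitrary jitter, wavelengths
jittered = [x + d for x, d in zip(periodic, offsets)]
print(array_factor(periodic, 0.5), array_factor(jittered, 0.5))
```

The periodic array reaches full main-lobe level at u = 0.5. The jittered array stays at roughly 0.79 there while keeping its main lobe at u = 0.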
Submitted 13 April, 2023; v1 submitted 2 January, 2023;
originally announced January 2023.
-
Economic Potential for Hybrid Electric Vehicles in Urban Signal-free Intersections with Decentralized MPC
Authors:
Kai Tang,
Weijie Wang,
Xiao Pan,
Boli Chen,
Simos A. Evangelou
Abstract:
The development of electric and connected vehicles, as well as automated driving technologies, is key to the smart city, with convenient urban mobility and high energy economy performance. However, the global rise in electricity prices has renewed interest in connected and autonomous vehicles (CAVs) with hybrid electric powertrains rather than battery electric powertrains. This paper provides a decentralized coordination strategy for a group of CAVs with a series hybrid electric vehicle (sHEV) powertrain at signal-free intersections. The problem is formulated in convex form through suitable relaxation and approximation of the powertrain model and solved by decentralized model predictive control (MPC), which ensures a rapid search and a unique solution in real time. Numerical examples validate the effectiveness of the proposed method with respect to physical and safety constraints. Using petrol fuel and battery charging prices over the last year, the performance of the proposed approach is evaluated against the optimal results produced by two benchmark solutions, conventional vehicles (CVs) and battery electric vehicles (BEVs). The comparison shows that the travel cost of sHEVs approaches, and under some circumstances reaches, the same level as that of BEVs, which indicates the importance of hybridization, particularly under the current rising electricity prices.
Submitted 12 November, 2022;
originally announced November 2022.
-
Adversarial Attacks on ASR Systems: An Overview
Authors:
Xiao Zhang,
Hao Tan,
Xuan Huang,
Denghui Zhang,
Keke Tang,
Zhaoquan Gu
Abstract:
With the development of hardware and algorithms, ASR (Automatic Speech Recognition) systems have evolved considerably. As models become simpler and easier to develop and deploy, ASR systems are becoming part of everyday life. On the one hand, we often use ASR apps or APIs to generate subtitles and record meetings. On the other hand, smart speakers and self-driving cars rely on ASR systems to control AIoT devices. In the past few years, there have been many works on adversarial example attacks against ASR systems, in which adding a small perturbation to the waveform drastically changes the recognition result. In this paper, we describe the development of ASR systems, the different assumptions of attacks, and how to evaluate these attacks. Next, we introduce current works on adversarial example attacks under two attack assumptions: white-box attacks and black-box attacks. Unlike other surveys, we pay more attention to the layer of the ASR system at which the waveform is perturbed, the relationships between these attacks, and their implementation methods. We focus on the effectiveness of these works.
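This toy example illustrates the white-box idea that a small perturbation can flip a model's decision. It applies one fast gradient sign method (FGSM) step to a linear scorer, whose gradient with respect to the input is simply its weight vector. Real attacks on ASR target full acoustic models and sequence-level losses. The waveform, weights and `epsilon` here are hypothetical.

```python
def sign(x):
    return (x > 0) - (x < 0)


def score(x, w):
    """Toy linear scorer s(x) = w . x; a positive score means 'accept'."""
    return sum(xi * wi for xi, wi in zip(x, w))


def fgsm_linear(x, w, epsilon):
    """One FGSM step against s(x) = w . x. The gradient of s with respect
    to x is w, so x - epsilon * sign(w) lowers the score as much as possible
    within an L-infinity budget of epsilon."""
    return [xi - epsilon * sign(wi) for xi, wi in zip(x, w)]


waveform = [0.02, -0.01, 0.03, 0.00, -0.02]   # toy "audio" samples
weights = [1.0, -2.0, 0.5, 1.5, -1.0]         # toy linear model
x_adv = fgsm_linear(waveform, weights, epsilon=0.02)
print(score(waveform, weights), score(x_adv, weights))
```

Each sample moves by at most 0.02, yet the score drops by 0.02 times the sum of absolute weights, enough to flip its sign.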
Submitted 3 August, 2022;
originally announced August 2022.
-
Fast Multi-grid Methods for Minimizing Curvature Energy
Authors:
Zhenwei Zhang,
Ke Chen,
Ke Tang,
Yuping Duan
Abstract:
Geometric high-order regularization methods, such as mean curvature and Gaussian curvature, have been intensively studied over the last decades due to their ability to preserve geometric properties, including image edges, corners, and contrast. However, the trade-off between restoration quality and computational efficiency is an essential roadblock for high-order methods. In this paper, we propose fast multi-grid algorithms for minimizing both mean curvature and Gaussian curvature energy functionals without sacrificing accuracy for efficiency. Unlike existing approaches based on operator splitting and the Augmented Lagrangian method (ALM), no artificial parameters are introduced in our formulation, which guarantees the robustness of the proposed algorithm. Meanwhile, we adopt the domain decomposition method to promote parallel computing and use the fine-to-coarse structure to accelerate convergence. Numerical experiments on image denoising, CT, and MRI reconstruction problems demonstrate the superiority of our method in preserving geometric structures and fine details. The proposed method is also shown to be effective for large-scale image processing problems, recovering an image of size 1024 × 1024 within 40 s, while the ALM method requires around 200 s.
Submitted 11 March, 2023; v1 submitted 17 April, 2022;
originally announced April 2022.
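The fine-to-coarse acceleration mentioned above is the core idea of multigrid. The curvature solvers in the paper are nonlinear, two-dimensional, and use domain decomposition, so the sketch below is only a hedged illustration of the multigrid principle, not the authors' algorithm: a textbook geometric V-cycle (weighted Jacobi smoothing, full-weighting restriction, linear interpolation) for the 1D Poisson problem $-u'' = f$ with zero boundary values. The grid size, smoother weight and number of sweeps are assumptions chosen for the example.

```python
import numpy as np


def residual(u, f, h):
    """r = f - A u for the 1D operator A u = -u'' with zero Dirichlet boundaries."""
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2 * u[1:-1] - u[:-2] - u[2:]) / h**2
    return r


def smooth(u, f, h, sweeps=3, omega=2 / 3):
    """Weighted Jacobi sweeps: damp the high-frequency error on the current grid."""
    for _ in range(sweeps):
        u_new = u.copy()
        u_new[1:-1] = (1 - omega) * u[1:-1] + omega * 0.5 * (u[:-2] + u[2:] + h**2 * f[1:-1])
        u = u_new
    return u


def restrict(r):
    """Full-weighting restriction from 2m+1 fine points to m+1 coarse points."""
    rc = np.zeros((len(r) - 1) // 2 + 1)
    rc[1:-1] = 0.25 * r[1:-2:2] + 0.5 * r[2:-1:2] + 0.25 * r[3::2]
    return rc


def prolong(ec):
    """Linear interpolation from m+1 coarse points to 2m+1 fine points."""
    e = np.zeros(2 * (len(ec) - 1) + 1)
    e[::2] = ec
    e[1::2] = 0.5 * (ec[:-1] + ec[1:])
    return e


def v_cycle(u, f, h):
    if len(u) == 3:  # coarsest grid: one unknown, solve exactly
        u = u.copy()
        u[1] = 0.5 * (u[0] + u[2] + h**2 * f[1])
        return u
    u = smooth(u, f, h)                         # pre-smoothing
    rc = restrict(residual(u, f, h))            # fine-to-coarse transfer
    ec = v_cycle(np.zeros_like(rc), rc, 2 * h)  # coarse-grid correction
    u = u + prolong(ec)                         # coarse-to-fine transfer
    return smooth(u, f, h)                      # post-smoothing


n = 129  # 2**7 + 1 grid points on [0, 1]
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]
f = np.pi**2 * np.sin(np.pi * x)  # exact solution: u(x) = sin(pi x)
u = np.zeros(n)
for _ in range(10):
    u = v_cycle(u, f, h)
print("max error vs exact solution:", np.abs(u - np.sin(np.pi * x)).max())
```

The remaining error is dominated by the $O(h^2)$ discretization error; the coarse grids only remove the smooth error components that Jacobi sweeps alone reduce very slowly.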
-
Approximate Optimal Filter for Linear Gaussian Time-invariant Systems
Authors:
Kaiming Tang,
Shengbo Eben Li,
Yuming Yin,
Yang Guan,
Jingliang Duan,
Wenhan Cao,
Jie Li
Abstract:
State estimation is critical to control systems, especially when the states cannot be measured directly. This paper presents an approximate optimal filter that uses a policy iteration technique to obtain the steady-state gain in linear Gaussian time-invariant systems. This design transforms the optimal filtering problem with minimum mean square error into an optimal control problem, called the Approximate Optimal Filtering (AOF) problem. The equivalence holds under certain conditions on the initial state distributions and policy formats, in which the system state is the estimation error, the control input is the filter gain, and the control objective function is the accumulated estimation error. We present a policy iteration algorithm to solve the AOF problem in steady state. Finally, the approximate filter is evaluated on a classic vehicle state estimation problem. The results show that the policy converges to the steady-state Kalman gain, with an accuracy within 2%.
Submitted 9 March, 2021;
originally announced March 2021.
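The idea of treating the filter gain as a control policy can be sketched for a scalar system. This is not the paper's AOF algorithm; it is an assumed textbook-style policy iteration for the one-step predictor `xhat[k+1] = a*xhat[k] + L*(y[k] - c*xhat[k])`. Policy evaluation computes the steady-state error variance for a fixed gain `L`, and policy improvement picks the gain that minimizes the next-step variance. The plant and noise values are hypothetical.

```python
def evaluate(L, a, c, q, r):
    """Policy evaluation: steady-state variance P of the prediction error
    e[k+1] = (a - L*c) e[k] + w[k] - L v[k], with Var(w) = q, Var(v) = r."""
    closed = a - L * c
    if abs(closed) >= 1.0:
        raise ValueError("gain L must make the error dynamics stable")
    return (q + L**2 * r) / (1.0 - closed**2)


def improve(P, a, c, r):
    """Policy improvement: the gain that minimizes (a - L*c)**2 * P + L**2 * r."""
    return a * c * P / (c**2 * P + r)


def policy_iteration(a, c, q, r, L0, iters=50, tol=1e-12):
    L = L0
    P = evaluate(L, a, c, q, r)
    for _ in range(iters):
        L_next = improve(P, a, c, r)
        P = evaluate(L_next, a, c, q, r)
        converged = abs(L_next - L) < tol
        L = L_next
        if converged:
            break
    return L, P


a, c, q, r = 1.2, 1.0, 1.0, 1.0  # hypothetical unstable scalar plant
L, P = policy_iteration(a, c, q, r, L0=a / c)  # L0 = a/c makes a - L0*c = 0 (stable)
riccati_residual = a**2 * P + q - (a * c * P) ** 2 / (c**2 * P + r) - P
print(f"gain L = {L:.6f}, error variance P = {P:.6f}, Riccati residual = {riccati_residual:.1e}")
```

At convergence, `P` satisfies the discrete algebraic Riccati equation of the predictor, which is why the script prints its residual. This mirrors the abstract's claim that the policy converges to the steady-state Kalman gain, but only for this simple scalar case.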
-
Denoising convolutional neural networks for photoacoustic microscopy
Authors:
Xianlin Song,
Kanggao Tang,
Jianshuang Wei,
Lingfang Song
Abstract:
Photoacoustic imaging is an imaging technology developed in recent years that combines the high resolution and rich contrast of optical imaging with the high penetration depth of acoustic imaging. It has been widely used in biomedical fields such as brain imaging and tumor detection. The signal-to-noise ratio (SNR) of photoacoustic images is generally low because of limited laser pulse energy, electromagnetic interference from the external environment, and system noise. To address the low SNR of photoacoustic images, we use a feedforward denoising convolutional neural network to further process the acquired images, obtaining higher-SNR images and improving image quality. We use Python, with external libraries managed through Anaconda, and build the feedforward denoising convolutional neural network in PyCharm. We first processed and segmented a training set containing 400 images and then used it for network training. Finally, we tested the network on a series of cerebrovascular photoacoustic microscopy images. The results show that the peak signal-to-noise ratio (PSNR) of the images increases significantly after denoising. The experimental results verify that the feedforward denoising convolutional neural network can effectively improve the quality of photoacoustic microscopy images, providing a good foundation for subsequent biomedical research.
Submitted 29 September, 2020;
originally announced September 2020.
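The PSNR reported in the abstract is conventionally defined as $\mathrm{PSNR} = 10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$, where MAX is the largest possible pixel value. A minimal sketch, assuming 8-bit images (MAX = 255) flattened into lists; the paper does not state its exact PSNR settings, and the tiny "images" in the usage lines are made up.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value**2 / MSE)."""
    if not reference or len(reference) != len(test):
        raise ValueError("images must be non-empty and the same size")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value**2 / mse)


# Hypothetical 16-pixel "images" (flattened 8-bit values).
clean = [100] * 16
noisy = [100 + (5 if i % 2 else -5) for i in range(16)]     # MSE = 25
denoised = [100 + (1 if i % 2 else -1) for i in range(16)]  # MSE = 1
print(f"noisy:    {psnr(clean, noisy):.2f} dB")
print(f"denoised: {psnr(clean, denoised):.2f} dB")
```

Because PSNR is logarithmic in the MSE, reducing the MSE by a factor of 25 raises PSNR by about 14 dB regardless of the image content.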
-
Reinforcement Solver for H-infinity Filter with Bounded Noise
Authors:
Jie Li,
Shengbo Eben Li,
Kaiming Tang,
Yao Lv,
Wenhan Cao
Abstract:
The H-infinity filter has been widely applied in engineering, but coping with bounded noise is still an open and difficult problem. This paper considers the H-infinity filtering problem for linear systems with bounded process and measurement noise. The problem is first formulated as a zero-sum game in which the estimation-error dynamics are non-affine with respect to the filter gain and the measurement noise. A nonquadratic Hamilton-Jacobi-Isaacs (HJI) equation is then derived by employing a nonquadratic cost to characterize the bounded noise; this equation is extremely difficult to solve because of its non-affine and nonlinear properties. Next, a reinforcement learning algorithm based on gradient descent, which can handle the nonlinearity, is proposed to update the gain of the reinforcement filter, where the measurement noise is fixed to tackle the non-affine property and increase the convexity of the Hamiltonian. Two examples demonstrate the convergence and effectiveness of the proposed algorithm.
Submitted 3 August, 2020;
originally announced August 2020.
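The abstract does not give its nonquadratic cost. One form widely used for bounded signals in the constrained-control literature is $W(v) = 2\int_0^v \lambda \tanh^{-1}(s/\lambda)\,ds$ for a bound $|v| < \lambda$. Assuming a cost of this type, the sketch below evaluates it in closed form, cross-checks it numerically, and shows the property that makes it suited to bounded noise: the maximizer of $g v - W(v)$ is $\lambda\tanh(g/(2\lambda))$, which never leaves $(-\lambda, \lambda)$. Whether the paper uses exactly this $W$ is an assumption, and `lam` and the test points are hypothetical.

```python
import math


def bounded_cost(v, lam):
    """W(v) = 2 * integral_0^v lam * atanh(s / lam) ds in closed form, valid for |v| < lam."""
    if abs(v) >= lam:
        raise ValueError("v must lie strictly inside the bound (-lam, lam)")
    s = v / lam
    return 2 * lam * v * math.atanh(s) + lam**2 * math.log(1 - s**2)


def bounded_cost_numeric(v, lam, steps=10_000):
    """Midpoint-rule evaluation of the same integral, as a cross-check."""
    ds = v / steps
    return 2 * lam * ds * sum(math.atanh((k + 0.5) * ds / lam) for k in range(steps))


def maximizer(g, lam):
    """argmax_v of g * v - W(v): setting g = W'(v) = 2 * lam * atanh(v / lam) gives
    v = lam * tanh(g / (2 * lam)), mathematically inside (-lam, lam) for any finite g."""
    return lam * math.tanh(g / (2 * lam))


lam = 1.0
for v in (0.01, 0.5, 0.9):
    # Near zero W(v) behaves like v**2; near the bound it departs from the quadratic.
    print(v, v**2, bounded_cost(v, lam), bounded_cost_numeric(v, lam))
print([round(maximizer(g, lam), 4) for g in (0.5, 5.0, 50.0)])
```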
-
Prosody leaks into the memories of words
Authors:
Kevin Tang,
Jason A. Shaw
Abstract:
The average predictability (aka informativity) of a word in context has been shown to condition word duration (Seyfarth, 2014). All else being equal, words that tend to occur in more predictable environments are shorter than words that tend to occur in less predictable environments. One account of the informativity effect on duration is that the acoustic details of probabilistic reduction are stored as part of a word's mental representation. Other research has argued that predictability effects are tied to prosodic structure in integral ways. With the aim of assessing a potential prosodic basis for informativity effects in speech production, this study extends past work in two directions: it investigates informativity effects in another widely spoken language, Mandarin Chinese, and broadens the study beyond word duration to two additional acoustic dimensions known to index prosodic prominence, pitch and intensity. The acoustic information of content words was extracted from a large telephone conversation speech corpus with over 400,000 tokens and 6,000 word types spoken by 1,655 individuals, and analyzed for the effect of informativity using frequency statistics estimated from a 431-million-word subtitle corpus. Results indicated that words with low informativity have shorter durations, replicating the effect found in English. In addition, informativity had significant effects on maximum pitch and intensity, two phonetic dimensions related to prosodic prominence. Extending this interpretation, these results suggest that predictability is closely linked to prosodic prominence, and that the lexical representation of a word includes phonetic details associated with its average prosodic prominence in discourse. In other words, the lexicon absorbs prosodic influences on speech production.
Submitted 29 December, 2020; v1 submitted 29 May, 2020;
originally announced May 2020.
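The informativity measure cited above (Seyfarth, 2014) is the average negative log predictability of a word $w$ across its contexts $c$, weighted by how often $w$ occurs in each: $-\sum_c P(c \mid w)\log P(w \mid c)$. A minimal sketch on a made-up toy corpus, assuming the context is just the preceding word and using log base 2; the study's actual estimation from its subtitle corpus may define context differently.

```python
import math
from collections import Counter


def informativity(tokens):
    """Informativity of each word w: -sum_c P(c | w) * log2 P(w | c),
    taking the context c of a token to be the preceding token (an assumption)."""
    pair_counts = Counter(zip(tokens, tokens[1:]))  # (context, word) pairs
    context_counts = Counter(tokens[:-1])           # tokens that have a successor
    word_counts = Counter(tokens[1:])               # tokens that have a predecessor
    info = {}
    for (c, w), n in pair_counts.items():
        p_c_given_w = n / word_counts[w]
        p_w_given_c = n / context_counts[c]
        info[w] = info.get(w, 0.0) - p_c_given_w * math.log2(p_w_given_c)
    return info


corpus = "a b a b a c".split()  # hypothetical toy corpus
for word, value in sorted(informativity(corpus).items()):
    print(word, round(value, 3))
```

In this toy corpus "a" always follows "b", and "b" is always followed by "a", so "a" is fully predictable (informativity 0), while "c" appears after "a" only one time in three and is the most informative word.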
-
Error Model of Radio Fingerprint and PDR Fusion Indoor Localization
Authors:
Haojun Ai,
Kaifeng Tang,
Sheng Zhang,
Yuhong Yang
Abstract:
Multi-source fusion positioning is one of the technical frameworks for achieving sufficient indoor positioning accuracy. To evaluate the effect of multi-source fusion positioning, a fusion error model must be established. In this paper, we first use the least squares method to fuse radio fingerprint positioning with PDR (pedestrian dead reckoning) positioning, and then apply the law of variance propagation to calculate the error distribution of indoor multi-source localization methods. Based on this fusion error model, we developed an indoor positioning simulation system. Under given conditions, the system can suggest an improved layout of positioning sources and can evaluate the signal strength distribution and the error distribution.
Submitted 5 January, 2020;
originally announced January 2020.
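Least-squares fusion of two position sources, and the variance propagation that yields the fused error, can be sketched with inverse-variance weighting. This assumes each source gives an unbiased estimate per axis with known variance and that sources and axes are uncorrelated, which is a simplification; the paper's own error model may be more general. The example coordinates and variances are hypothetical.

```python
def fuse(est_a, var_a, est_b, var_b):
    """Weighted least-squares (inverse-variance) fusion of two independent,
    unbiased estimates of one coordinate. Returns (estimate, variance)."""
    w_a, w_b = 1.0 / var_a, 1.0 / var_b
    estimate = (w_a * est_a + w_b * est_b) / (w_a + w_b)
    # Variance propagation: Var(sum_i k_i x_i) = sum_i k_i^2 var_i with
    # k_i = w_i / (w_a + w_b), which simplifies to 1 / (w_a + w_b).
    variance = 1.0 / (w_a + w_b)
    return estimate, variance


def fuse_position(fp_xy, fp_var, pdr_xy, pdr_var):
    """Fuse fingerprint and PDR positions axis by axis (uncorrelated axes assumed)."""
    return [fuse(a, va, b, vb) for a, va, b, vb in zip(fp_xy, fp_var, pdr_xy, pdr_var)]


# Hypothetical inputs: fingerprint fix (m) with 4 m^2 variance per axis,
# PDR fix with 1 m^2 variance per axis.
fused = fuse_position((3.0, 5.0), (4.0, 4.0), (2.0, 5.5), (1.0, 1.0))
for axis, (est, var) in zip("xy", fused):
    print(f"{axis}: {est:.2f} m, variance {var:.2f} m^2")
```

The fused variance $1/(1/\sigma_a^2 + 1/\sigma_b^2)$ is never larger than either input variance; here it drops to 0.8 m² per axis from 4 m² and 1 m².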
-
Multiuser Video Streaming Rate Adaptation: A Physical Layer Resource-Aware Deep Reinforcement Learning Approach
Authors:
Kexin Tang,
Nuowen Kan,
Junni Zou,
Xiao Fu,
Mingyi Hong,
Hongkai Xiong
Abstract:
We consider a multi-user video streaming service optimization problem over a time-varying and mutually interfering multi-cell wireless network. The key research challenge is to appropriately adapt each user's video streaming rate according to the radio frequency environment (e.g., channel fading and interference level) and service demands (e.g., play requests), so that users' long-term video viewing experience is optimized. To address this challenge, we propose a novel two-level cross-layer optimization framework for multiuser adaptive video streaming over wireless networks. The key idea is to jointly design a physical layer optimization-based beamforming scheme (performed at the base stations) and an application layer Deep Reinforcement Learning (DRL)-based scheme (performed at the user terminals), so that a highly complex multi-user, cross-layer, time-varying video streaming problem can be decomposed into relatively simple problems and solved effectively. Our strategy represents a significant departure from existing schemes, which consider either short-term user experience optimization or only single-user point-to-point long-term optimization. Extensive simulations based on real datasets show that the proposed cross-layer design is effective and promising.
Submitted 1 February, 2019;
originally announced February 2019.
-
Controllability of networked MIMO systems
Authors:
Lin Wang,
Guanrong Chen,
Xiaofan Wang,
Wallace K. S. Tang
Abstract:
In this paper, we consider the state controllability of networked systems, where the network topology is directed and weighted and the nodes are higher-dimensional linear time-invariant (LTI) dynamical systems. We investigate how the network topology, the node-system dynamics, the external control inputs, and the inner interactions affect the controllability of a networked system, and show that for a general networked multi-input/multi-output (MIMO) system: 1) the controllability of the overall network is an integrated result of the aforementioned relevant factors, which cannot be decoupled into the controllability of individual node-systems and the properties solely determined by the network topology, quite different from the familiar notion of consensus or formation controllability; 2) if the network topology is uncontrollable by external inputs, then the networked system with identical nodes will be uncontrollable, even if it is structurally controllable; 3) with a controllable network topology, controllability and observability of the nodes together are necessary for the controllability of the networked systems under some mild conditions, but nevertheless they are not sufficient. For a networked system with single-input/single-output (SISO) LTI nodes, we present precise necessary and sufficient conditions for the controllability of a general network topology.
Submitted 30 October, 2015; v1 submitted 6 May, 2015;
originally announced May 2015.
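Finding 2) above can be checked on a small instance with the Kalman rank test. A sketch, assuming the common networked model $\dot x_i = A x_i + \sum_j w_{ij} H x_j + \delta_i B u_i$, which stacks into $A_{net} = I_N \otimes A + W \otimes H$ and $B_{net} = \Delta \otimes B$; the paper's exact formulation may differ, and the node matrices and three-node topology are hypothetical. Node 1 drives two identical followers, so the topology pair $(W, \delta)$ is uncontrollable and so is the networked system, even though each node pair $(A, B)$ is controllable.

```python
import numpy as np


def ctrb_rank(A, B):
    """Rank of the Kalman controllability matrix [B, AB, ..., A^(n-1) B]."""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return np.linalg.matrix_rank(np.hstack(blocks))


def networked(A, B, H, W, delta):
    """Assumed model: x_i' = A x_i + sum_j W[i, j] H x_j + delta_i B u_i."""
    N = W.shape[0]
    A_net = np.kron(np.eye(N), A) + np.kron(W, H)
    B_net = np.kron(np.diag(delta), B)  # zero columns for nodes without an input
    return A_net, B_net


# Hypothetical node system: a double integrator, controllable on its own.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
H = np.eye(2)  # inner interaction between coupled nodes

# Hypothetical topology: node 1 (the only controlled node) feeds nodes 2 and 3 identically.
W = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])
delta = np.array([1.0, 0.0, 0.0])

A_net, B_net = networked(A, B, H, W, delta)
print("node (A, B) rank:", ctrb_rank(A, B), "of 2")
print("topology (W, delta) rank:", ctrb_rank(W, delta.reshape(-1, 1)), "of 3")
print("networked rank:", ctrb_rank(A_net, B_net), "of 6")
```

The symmetry is what breaks controllability here: the difference $x_2 - x_3$ evolves autonomously and can never be steered by the input at node 1. A single example like this illustrates the finding but does not prove it.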