Search | arXiv e-print repository

arXiv:2503.06686 [pdf, other]

ImplicitCell: Resolution Cell Modeling of Joint Implicit Volume Reconstruction and Pose Refinement in Freehand 3D Ultrasound

Authors: Sheng Song, Yiting Chen, Duo Xu, Songhan Ge, Yunqian Huang, Junni Shi, Man Chen, Hongbo Chen, Rui Zheng

Abstract: Freehand 3D ultrasound enables volumetric imaging by tracking a conventional ultrasound probe during freehand scanning, offering enriched spatial information that improves clinical diagnosis. However, the quality of reconstructed volumes is often compromised by tracking system noise and irregular probe movements, leading to artifacts in the final reconstruction. To address these challenges, we propose ImplicitCell, a novel framework that integrates Implicit Neural Representation (INR) with an ultrasound resolution cell model for joint optimization of volume reconstruction and pose refinement. Three distinct datasets are used for comprehensive validation, including phantom, common carotid artery, and carotid atherosclerosis. Experimental results demonstrate that ImplicitCell significantly reduces reconstruction artifacts and improves volume quality compared to existing methods, particularly in challenging scenarios with noisy tracking data. These improvements enhance the clinical utility of freehand 3D ultrasound by providing more reliable and precise diagnostic information.

Submitted 9 March, 2025; originally announced March 2025.

arXiv:2503.03348 [pdf, other]

doi 10.1109/TCE.2025.3548520

Composite Nonlinear Trajectory Tracking Control of Co-Driving Vehicles Using Self-Triggered Adaptive Dynamic Programming

Authors: Chuan Hu, Sicheng Ge, Yingkui Shi, Weinan Gao, Wenfeng Guo, Xi Zhang

Abstract: This article presents a composite nonlinear feedback (CNF) control method using a self-triggered (ST) adaptive dynamic programming (ADP) algorithm in a human-machine shared steering framework. For the overall system dynamics, a two-degrees-of-freedom (2-DOF) vehicle model is established and a two-point preview driver model is adopted. A dynamic authority allocation strategy based on cooperation level is proposed to combine the steering input of the human driver and the automatic controller. To make further improvements in the controller design, three main contributions are put forward. Firstly, the CNF controller is designed for trajectory tracking control with refined transient performance. Besides, the self-triggered rule is applied such that the system updates at discrete times to save computing resources and increase efficiency. Moreover, by introducing the data-based ADP algorithm, the optimal control problem can be solved through iteration using system input and output information, reducing the need for accurate knowledge of system dynamics. The effectiveness of the proposed control method is validated through CarSim-Simulink co-simulations in diverse driving scenarios.

Submitted 5 March, 2025; originally announced March 2025.

Comments: Accepted by IEEE Transactions on Consumer Electronics (12 pages)

arXiv:2502.04837 [pdf]

Online Robot Motion Planning Methodology Guided by Group Social Proxemics Feature

Authors: Xuan Mu, Xiaorui Liu, Shuai Guo, Wenzheng Chi, Wei Wang, Shuzhi Sam Ge

Abstract: Nowadays, robots are expected to demonstrate human-like perception, reasoning, and behavior patterns in social or service applications. However, most existing motion planning methods are incompatible with this requirement. A potential reason is that existing navigation algorithms usually treat people as another kind of obstacle and hardly take social principles or awareness into consideration. In this paper, we attempt to model the proxemics of groups and blend it into the scenario perception and navigation of the robot. For this purpose, a group clustering method considering both social relevance and spatial confidence is introduced. It enables the robot to identify individuals and divide them into groups. Next, we propose defining the individual proxemics within a magnetic dipole model, and further establish the group proxemics and scenario map through vector-field superposition. On the basis of the group clustering and proxemics modeling, we present a method to obtain the optimal observation positions (OOPs) of a group. Once the OOPs grid and scenario map are established, a heuristic path is employed to generate a path that guides the robot cruising among the groups for interactive purposes. A series of experiments is conducted to validate the proposed methodology on a practical robot; the results demonstrate that our methodology achieves promising performance in group recognition accuracy and path-generation efficiency. This suggests that group awareness has evolved into an important module for making the robot behave socially in practical scenarios.

Submitted 7 February, 2025; originally announced February 2025.

Comments: 14 pages, 14 figures

arXiv:2501.00064 [pdf, other]

Lungmix: A Mixup-Based Strategy for Generalization in Respiratory Sound Classification

Authors: Shijia Ge, Weixiang Zhang, Shuzhao Xie, Baixu Yan, Zhi Wang

Abstract: Respiratory sound classification plays a pivotal role in diagnosing respiratory diseases. While deep learning models have shown success with various respiratory sound datasets, our experiments indicate that models trained on one dataset often fail to generalize effectively to others, mainly due to data collection and annotation inconsistencies. To address this limitation, we introduce Lungmix, a novel data augmentation technique inspired by Mixup. Lungmix generates augmented data by blending waveforms using loudness and random masks while interpolating labels based on their semantic meaning, helping the model learn more generalized representations. Comprehensive evaluations across three datasets, namely ICBHI, SPR, and HF, demonstrate that Lungmix significantly enhances model generalization to unseen data. In particular, Lungmix boosts the 4-class classification score by up to 3.55%, achieving performance comparable to models trained directly on the target dataset.

Submitted 29 December, 2024; originally announced January 2025.

Comments: 4 pages, 3 figures, conference paper

arXiv:2405.07478 [pdf, other]

Coded Event-triggered Control for Nonlinear Systems

Authors: Ruihang Ji, Shuzhi Sam Ge, Kai Zhao

Abstract: This paper studies a Coded Event-triggered Control (CEC) for a class of nonlinear systems under any initial condition. To reduce communication burden, the CEC is designed from the encoding-decoding viewpoint, by which only an $m$-length string is transmitted for each communication between CEC and actuator. If a more general Entry Capture Problem is encountered, such control design will be rather complicated yet challenging, where the performance constraints are satisfied some time after (rather than from the beginning of) system operation, rendering normally employed prescribed performance control invalid because it may not be defined in the initial interval. By introducing auxiliary functions, we develop a Self-adjustable Prescribed Performance (SPP) mechanism which can flexibly adjust the symmetric or asymmetric performance boundaries to accommodate different initial conditions, providing an effective solution for the underlying tracking problem. In this way, the resulting CEC can not only consume fewer communication resources but also regulate the tracking error under any initial condition into an allowable set before a given time in a bounded and customizable manner. Simulation results verify and clarify the theoretical findings.

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2312.12066 [pdf, other]

Automatic bony structure segmentation and curvature estimation on ultrasound cervical spine images -- a feasibility study

Authors: Songhan Ge, Haoyuan Tian, Wei Zhang, Rui Zheng

Abstract: The loss of cervical lordosis is a common degenerative disorder known to be associated with abnormal spinal alignment. In recent years, ultrasound (US) imaging has been widely applied in the assessment of spine deformity and has shown promising results. The objectives of this study are to automatically segment bony structures from the 3D US cervical spine image volume and to assess the cervical lordosis on the key sagittal frames. In this study, a portable ultrasound imaging system was applied to acquire the cervical spine image volume. The nnU-Net was trained to segment bony structures on the transverse images and validated by 5-fold cross-validation. The volume data were reconstructed from the segmented image series. An energy function indicating intensity levels and integrity of bony structures was designed to extract the proxy key sagittal frames on both left and right sides for the cervical curve measurement. The mean absolute difference (MAD), standard deviation (SD) and correlation between the spine curvatures of the left and right sides were calculated for quantitative evaluation of the proposed method. The DSC value of the nnU-Net model in segmenting ROI was 0.973. For the measurement of 22 lamina curve angles, the MAD, SD and correlation between the left and right sides of the cervical spine were 3.591, 3.432 degrees and 0.926, respectively. The results indicate that our method has a high accuracy and reliability in the automatic segmentation of the cervical spine and shows the potential of diagnosing the loss of cervical lordosis using the 3D ultrasound imaging technique.

Submitted 19 December, 2023; originally announced December 2023.

arXiv:2308.05005 [pdf, other]

Deep Learning Model Transfer in Forest Mapping using Multi-source Satellite SAR and Optical Images

Authors: Shaojia Ge, Oleg Antropov, Tuomas Häme, Ronald E. McRoberts, Jukka Miettinen

Abstract: Deep learning (DL) models are gaining popularity in forest variable prediction using Earth Observation images. However, in practical forest inventories, reference datasets are often represented by plot- or stand-level measurements, while high-quality representative wall-to-wall reference data for end-to-end training of DL models are rarely available. Transfer learning facilitates expansion of the use of deep learning models into areas with sub-optimal training data by allowing pretraining of the model in areas where high-quality teaching data are available. In this study, we perform a "model transfer" (or domain adaptation) of a pretrained DL model into a target area using plot-level measurements and compare performance versus other machine learning models. We use an earlier developed UNet based model (SeUNet) to demonstrate the approach on two distinct taiga sites with varying forest structure and composition. Multisource Earth Observation (EO) data are represented by a combination of Copernicus Sentinel-1 C-band SAR and Sentinel-2 multispectral images, JAXA ALOS-2 PALSAR-2 SAR mosaic and TanDEM-X bistatic interferometric radar data. The training study site is located in Finnish Lapland, while the target site is located in Southern Finland. By leveraging transfer learning, the prediction of SeUNet achieved root mean squared error (RMSE) of 2.70 m and R$^2$ of 0.882, considerably more accurate than traditional benchmark methods. We expect such forest-specific DL model transfer can be suitable also for other forest variables and other EO data sources that are sensitive to forest structure.

Submitted 9 August, 2023; originally announced August 2023.

arXiv:2308.03137 [pdf, other]

Digital Self-Interference Cancellation With Robust Multi-layered Total Least Mean Squares Adaptive Filters

Authors: Shiyu Song, Yanqun Tang, Xizhang Wei, Yu Zhou, Xianjie Lu, Zhengpeng Wang, Songhu Ge

Abstract: In simultaneous transmit and receive (STAR) wireless communications, digital self-interference (SI) cancellation is required before estimating the remote transmission (RT) channel. Considering the inherent connection between SI channel reconstruction and RT channel estimation, we propose a multi-layered M-estimate total least mean squares (m-MTLS) joint estimator to estimate both channels. In each layer, our proposed m-MTLS estimator first employs an M-estimate total least mean squares (MTLS) algorithm to eliminate residual SI from the received signal and give a new estimation of the RT channel. Then, it gives the final RT channel estimation based on the weighted sum of the estimation values obtained from each layer. Compared to the traditional minimum mean square error (MMSE) estimator and the single-layered MTLS estimator, the m-MTLS estimator is shown to have better normalized mean squared difference (NMSD) performance. Besides, the simulation results also show the robustness of the m-MTLS estimator even in scenarios where the local reference signal is contaminated with noise and the received signal is impacted by strong impulse noise.

Submitted 6 August, 2023; originally announced August 2023.

arXiv:2303.17210 [pdf, other]

DecentRAN: Decentralized Radio Access Network for 5.5G and beyond

Authors: Hao Xu, Xun Liu, Qinghai Zeng, Qiang Li, Shibin Ge, Guohua Zhou, Raymond Forbes

Abstract: Radio Access Network (RAN) faces challenges from privacy and flexible wide area and local area network access. RAN is limited from providing local service directly due to the centralized design of the cellular network and concerns of user privacy and data security. DecentRAN, or Decentralized Radio Access Network, offers an alternative perspective to cope with the emerging demands of 5G Non-public Network and the hybrid deployment of 5GS and Wi-Fi in the campus network. Starting from Public key as an Identity, independent mutual authentication between UE and RAN is made possible in a privacy-preserving manner. With the introduction of decentralized architecture and network functions using blockchain and smart contracts, DecentRAN has the ability to provide users with locally managed, end-to-end encrypted 5G NPN and the potential connectivity to Local Area Network via campus routers. Furthermore, the performance regarding throughput and latency is discussed, offering deployment guidance for DecentRAN.

Submitted 30 March, 2023; originally announced March 2023.

arXiv:2303.02456 [pdf, other]

Fixed-time Adaptive Neural Control for Physical Human-Robot Collaboration with Time-Varying Workspace Constraints

Authors: Yuzhu Sun, Mien Van, Stephen McIlvanna, Nguyen Minh Nhat, Sean McLoone, Dariusz Ceglarek, Shuzhi Sam Ge

Abstract: Physical human-robot collaboration (pHRC) requires both compliance and safety guarantees since robots coordinate with human actions in a shared workspace. This paper presents a novel fixed-time adaptive neural control methodology for handling time-varying workspace constraints that occur in physical human-robot collaboration while also guaranteeing compliance during intended force interactions. The proposed methodology combines the benefits of compliance control, time-varying integral barrier Lyapunov function (TVIBLF) and fixed-time techniques, which not only achieve compliance during physical contact with human operators but also guarantee time-varying workspace constraints and fast tracking error convergence without any restriction on the initial conditions. Furthermore, a neural adaptive control law is designed to compensate for the unknown dynamics and disturbances of the robot manipulator such that the proposed control framework is overall fixed-time converged and capable of online learning without any prior knowledge of robot dynamics and disturbances. The proposed approach is finally validated on a simulated two-link robot manipulator. Simulation results show that the proposed controller is superior in the sense of both tracking error and convergence time compared with the existing barrier Lyapunov functions based controllers, while simultaneously guaranteeing compliance and safety.

Submitted 26 April, 2023; v1 submitted 4 March, 2023; originally announced March 2023.

arXiv:2302.01537 [pdf, other]

Gradient and Variable Tracking with Multiple Local SGD for Decentralized Non-Convex Learning

Authors: Songyang Ge, Tsung-Hui Chang

Abstract: Stochastic distributed optimization methods that solve an optimization problem over a multi-agent network have played an important role in a variety of large-scale signal processing and machine learning applications. Among the existing methods, the gradient tracking (GT) method is found robust against the variance between agents' local data distribution, in contrast to the distributed stochastic gradient descent (SGD) methods which have a slowed convergence speed when the agents have heterogeneous data distributions. However, the GT method can be communication expensive due to the need for a large number of iterations for convergence. In this paper, we intend to reduce the communication cost of the GT method by integrating it with the local SGD technique. Specifically, we propose a new local stochastic GT (LSGT) algorithm where, within each communication round, the agents perform multiple SGD updates locally. Theoretically, we build the convergence conditions of the LSGT algorithm and show that it can have an improved convergence rate of $\mathcal{O}(1/\sqrt{ET})$, where $E$ is the number of local SGD updates and $T$ is the number of communication rounds. We further extend the LSGT algorithm to solve a more complex learning problem which has linearly coupled variables inside the objective function. Experiment results demonstrate that the proposed algorithms have significantly improved convergence speed even under heterogeneous data distribution.

Submitted 2 February, 2023; originally announced February 2023.

Comments: 46 pages, 6 figures

arXiv:2212.14747 [pdf, other]

VertMatch: A Semi-supervised Framework for Vertebral Structure Detection in 3D Ultrasound Volume

Authors: Hongye Zeng, Kang Zhou, Songhan Ge, Yuchong Gao, Jianhao Zhao, Shenghua Gao, Rui Zheng

Abstract: Three-dimensional (3D) ultrasound imaging technique has been applied for scoliosis assessment, but current assessment method only uses coronal projection image and cannot illustrate the 3D deformity and vertebra rotation. The vertebra detection is essential to reveal 3D spine information, but the detection task is challenging due to complex data and limited annotations. We propose VertMatch, a two-step framework to detect vertebral structures in 3D ultrasound volume by utilizing unlabeled data in semi-supervised manner. The first step is to detect the possible positions of structures on transverse slice globally, and then the local patches are cropped based on detected positions. The second step is to distinguish whether the patches contain real vertebral structures and screen the predicted positions from the first step. VertMatch develops three novel components for semi-supervised learning: for position detection in the first step, (1) anatomical prior is used to screen pseudo labels generated from confidence threshold method; (2) multi-slice consistency is used to utilize more unlabeled data by inputting multiple adjacent slices; (3) for patch identification in the second step, the categories are rebalanced in each batch to solve imbalance problem. Experimental results demonstrate that VertMatch can detect vertebra accurately in ultrasound volume and outperforms state-of-the-art methods. VertMatch is also validated in clinical application on forty ultrasound scans, and it can be a promising approach for 3D assessment of scoliosis.

Submitted 28 December, 2022; originally announced December 2022.

Comments: 15 pages, 8 figures

arXiv:2212.00246 [pdf, other]

doi 10.1109/LGRS.2023.3281526

A Novel Semisupervised Contrastive Regression Framework for Forest Inventory Mapping with Multisensor Satellite Data

Authors: Shaojia Ge, Hong Gu, Weimin Su, Anne Lönnqvist, Oleg Antropov

Abstract: Accurate mapping of forests is critical for forest management and carbon stocks monitoring. Deep learning is becoming more popular in Earth Observation (EO); however, the availability of reference data limits its potential in wide-area forest mapping. To overcome those limitations, here we introduce contrastive regression into EO based forest mapping and develop a novel semisupervised regression framework for wall-to-wall mapping of continuous forest variables. It combines supervised contrastive regression loss and semi-supervised Cross-Pseudo Regression loss. The framework is demonstrated over a boreal forest site using Copernicus Sentinel-1 and Sentinel-2 imagery for mapping forest tree height. Achieved prediction accuracies are substantially better than those obtained using vanilla UNet or traditional regression models, with relative RMSE of 15.1% on stand level. We expect that the developed framework can be used for modeling other forest variables and EO datasets.

Submitted 30 November, 2022; originally announced December 2022.

arXiv:2211.13229 [pdf, other]

DeltaNet:Conditional Medical Report Generation for COVID-19 Diagnosis

Authors: Xian Wu, Shuxin Yang, Zhaopeng Qiu, Shen Ge, Yangtian Yan, Xingwang Wu, Yefeng Zheng, S. Kevin Zhou, Li Xiao

Abstract: Fast screening and diagnosis are critical in COVID-19 patient treatment. In addition to the gold standard RT-PCR, radiological imaging like X-ray and CT also works as an important means in patient screening and follow-up. However, due to the excessive number of patients, writing reports becomes a heavy burden for radiologists. To reduce the workload of radiologists, we propose DeltaNet to generate medical reports automatically. Different from typical image captioning approaches that generate reports with an encoder and a decoder, DeltaNet applies a conditional generation process. In particular, given a medical image, DeltaNet employs three steps to generate a report: 1) first retrieving related medical reports, i.e., the historical reports from the same or similar patients; 2) then comparing retrieved images and current image to find the differences; 3) finally generating a new report to accommodate identified differences based on the conditional report. We evaluate DeltaNet on a COVID-19 dataset, where DeltaNet outperforms state-of-the-art approaches. Besides COVID-19, the proposed DeltaNet can be applied to other diseases as well. We validate its generalization capabilities on the public IU-Xray and MIMIC-CXR datasets for chest-related diseases. Code is available at https://github.com/LX-doctorAI1/DeltaNet.

Submitted 12 November, 2022; originally announced November 2022.

arXiv:2208.08607

Event-triggered Finite-time Control Using Inverse-optimal Implicit Lyapunov Function

Authors: Peng Wang, Shuzhi Sam Ge, Xiaobing Zhang

Abstract: This work deals with the event-triggered finite-time control for high-order systems based on an implicit Lyapunov function (ILF). With the construction of an inverse optimal problem, a novel expression of ILF is obtained. By designing the event-triggering mechanism elaborately, it is guaranteed that the trivial solution of the closed-loop system is globally finite-time stable and there exists no Zeno phenomenon. Extensions to the scenario with a multi-agent system are studied where a finite-time tracking control drives all the agents to reach a consensus. The obtained theoretical results are supported by numerical simulations.

Submitted 9 November, 2022; v1 submitted 17 August, 2022; originally announced August 2022.

Comments: To be revised and corrected

arXiv:2205.04674 [pdf, other]

Balanced control between performance and saturation for constrained nonlinear systems

Authors: Peng Wang, Haibin Wang, Shuzhi Sam Ge, Xiaobing Zhang

Abstract: This paper addresses the balanced control between performance and saturation for a class of constrained nonlinear systems, including two branches: balanced command filtered backstepping (BCFB) and balanced performance control (BPC). To balance the interconnection and conflict between performance and saturation constraints, we define a performance safety evaluation (PSE) function, which evaluates the system safety under the destabilizing effect variables (DEVs) like saturation quantity and filter errors; the cumulative effects of DEVs are then fully utilized and compensated for the performance recovery. Specifically, there exists some degree of tolerance for the DEVs in the safety region, and the compensation operation works when the evaluation of the system becomes dangerous. The advantages of the proposed methodology are illustrated in the numerical simulation.

Submitted 10 May, 2022; originally announced May 2022.

Comments: 9 pages, 7 figures

arXiv:2204.14272 [pdf, other]

End-to-end Spoken Conversational Question Answering: Task, Dataset and Model

Authors: Chenyu You, Nuo Chen, Fenglin Liu, Shen Ge, Xian Wu, Yuexian Zou

Abstract: In spoken question answering, the systems are designed to answer questions from contiguous text spans within the related speech transcripts. However, the most natural way that humans seek or test their knowledge is via human conversations. Therefore, we propose a new Spoken Conversational Question Answering task (SCQA), aiming at enabling the systems to model complex dialogue flows given the speech documents. In this task, our main objective is to build the system to deal with conversational questions based on the audio recordings, and to explore the plausibility of providing more cues from different modalities with systems in information gathering. To this end, instead of directly adopting automatically generated speech transcripts with highly noisy data, we propose a novel unified data distillation approach, DDNet, which effectively ingests cross-modal information to achieve fine-grained representations of the speech and language modalities. Moreover, we propose a simple and novel mechanism, termed Dual Attention, by encouraging better alignments between audio and text to ease the process of knowledge transfer. To evaluate the capacity of SCQA systems in a dialogue-style interaction, we assemble a Spoken Conversational Question Answering (Spoken-CoQA) dataset with more than 40k question-answer pairs from 4k conversations. The performance of the existing state-of-the-art methods significantly degrades on our dataset, hence demonstrating the necessity of cross-modal information integration. Our experimental results demonstrate that our proposed method achieves superior performance in spoken conversational question answering tasks.

Submitted 29 April, 2022; originally announced April 2022.

Comments: In Findings of NAACL 2022. arXiv admin note: substantial text overlap with arXiv:2010.08923

arXiv:2204.10513 [pdf]

MIPR:Automatic Annotation of Medical Images with Pixel Rearrangement

Authors: Pingping Dai, Haiming Zhu, Shuang Ge, Ruihan Zhang, Xiang Qian, Xi Li, Kehong Yuan

Abstract: Most of the state-of-the-art semantic segmentation reported in recent years is based on fully supervised deep learning in the medical domain. However, the high-quality annotated datasets require intense labor and domain knowledge, consuming enormous time and cost. Previous works that adopt semi-supervised and unsupervised learning are proposed to address the lack of annotated data through assisted training with unlabeled data and achieve good performance. Still, these methods cannot directly get the image annotation as doctors do. In this paper, inspired by self-training of semi-supervised learning, we propose a novel approach to solve the lack of annotated data from another angle, called medical image pixel rearrangement (MIPR for short). The MIPR combines image-editing and pseudo-label technology to obtain labeled data. As the number of iterations increases, the edited image is similar to the original image, and the labeled result is similar to the doctor annotation. Therefore, the MIPR is to get labeled pairs of data directly from large amounts of unlabeled data with pixel rearrangement, which is implemented with a designed conditional Generative Adversarial Network and a segmentation network. Experiments on the ISIC18 show that the effect of the data annotated by our method for the segmentation task is equal to or even better than that of doctors' annotations.

Submitted 22 April, 2022; originally announced April 2022.

arXiv:2203.10095 [pdf, other]

AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation

Authors: Di You, Fenglin Liu, Shen Ge, Xiaoxia Xie, Jing Zhang, Xian Wu

Abstract: Recently, medical report generation, which aims to automatically generate a long and coherent descriptive paragraph of a given medical image, has received growing research interest. Different from the general image captioning tasks, medical report generation is more challenging for data-driven neural models. This is mainly due to 1) the serious data bias: the normal visual regions dominate the dataset over the abnormal visual regions, and 2) the very long sequence. To alleviate the above two problems, we propose an AlignTransformer framework, which includes the Align Hierarchical Attention (AHA) and the Multi-Grained Transformer (MGT) modules: 1) AHA module first predicts the disease tags from the input image and then learns the multi-grained visual features by hierarchically aligning the visual regions and disease tags. The acquired disease-grounded visual features can better represent the abnormal regions of the input image, which could alleviate the data bias problem; 2) MGT module effectively uses the multi-grained features and Transformer framework to generate the long medical report. The experiments on the public IU-Xray and MIMIC-CXR datasets show that the AlignTransformer can achieve results competitive with state-of-the-art methods on the two datasets. Moreover, the human evaluation conducted by professional radiologists further proves the effectiveness of our approach.

Submitted 18 March, 2022; originally announced March 2022.

Comments: Accepted by MICCAI 2021 (the 24th International Conference on Medical Image Computing and Computer Assisted Intervention)

arXiv:2112.15011 [pdf, other]

Radiology Report Generation with a Learned Knowledge Base and Multi-modal Alignment

Authors: Shuxin Yang, Xian Wu, Shen Ge, S. Kevin Zhou, Li Xiao

Abstract: In clinics, a radiology report is crucial for guiding a patient's treatment. However, writing radiology reports is a heavy burden for radiologists. To this end, we present an automatic, multi-modal approach for report generation from a chest x-ray. Our approach, motivated by the observation that the descriptions in radiology reports are highly correlated with specific information of the x-ray images, features two distinct modules: (i) Learned knowledge base: To absorb the knowledge embedded in the radiology reports, we build a knowledge base that can automatically distil and restore medical knowledge from textual embedding without manual labour; (ii) Multi-modal alignment: to promote the semantic alignment among reports, disease labels, and images, we explicitly utilize textual embedding to guide the learning of the visual feature space. We evaluate the performance of the proposed model using metrics from both natural language generation and clinic efficacy on the public IU-Xray and MIMIC-CXR datasets. Our ablation study shows that each module contributes to improving the quality of generated reports. Furthermore, with the assistance of both modules, our approach outperforms state-of-the-art methods over almost all the metrics.

Submitted 1 June, 2022; v1 submitted 30 December, 2021; originally announced December 2021.

arXiv:2112.15009 [pdf, ps, other]

doi 10.1016/j.media.2022.102510

Knowledge Matters: Radiology Report Generation with General and Specific Knowledge

Authors: Shuxin Yang, Xian Wu, Shen Ge, Shaohua Kevin Zhou, Li Xiao

Abstract: Automatic radiology report generation is critical in clinics, as it can relieve experienced radiologists from the heavy workload and remind inexperienced radiologists of misdiagnoses or missed diagnoses. Existing approaches mainly formulate radiology report generation as an image captioning task and adopt the encoder-decoder framework. However, in the medical domain, such pure data-driven approaches suffer from the following problems: 1) visual and textual bias problem; 2) lack of expert knowledge. In this paper, we propose a knowledge-enhanced radiology report generation approach that introduces two types of medical knowledge: 1) General knowledge, which is input independent and provides the broad knowledge for report generation; 2) Specific knowledge, which is input dependent and provides the fine-grained knowledge for report generation. To fully utilize both the general and specific knowledge, we also propose a knowledge-enhanced multi-head attention mechanism. By merging the visual features of the radiology image with general knowledge and specific knowledge, the proposed model can improve the quality of generated reports. Experimental results on two publicly available datasets IU-Xray and MIMIC-CXR show that the proposed knowledge enhanced approach outperforms state-of-the-art image captioning based methods. Ablation studies also demonstrate that both general and specific knowledge can help to improve the performance of radiology report generation.

Submitted 6 November, 2022; v1 submitted 30 December, 2021; originally announced December 2021.

Comments: Medical Image Analysis

arXiv:2107.13431 [pdf]

AI assisted method for efficiently generating breast ultrasound screening reports

Authors: Shuang Ge, Qiongyu Ye, Wenquan Xie, Desheng Sun, Huabin Zhang, Xiaobo Zhou, Kehong Yuan

Abstract: Background: Ultrasound is one of the preferred choices for early screening of dense breast cancer. Clinically, doctors have to manually write the screening report which is time-consuming and laborious, and it is easy to miss and miswrite. Aim: We proposed a new pipeline to automatically generate AI breast ultrasound screening reports based on ultrasound images, aiming to assist doctors in improving the efficiency of clinical screening and reducing repetitive report writing. Methods: AI was used to efficiently generate personalized breast ultrasound screening preliminary reports, especially for benign and normal cases which account for the majority. Based on the preliminary AI report, doctors then make simple adjustments or corrections to quickly generate the final report. The approach has been trained and tested using a database of 4809 breast tumor instances. Results: Experimental results indicate that this pipeline improves doctors' work efficiency by up to 90%, which greatly reduces repetitive work. Conclusion: Personalized report generation is more widely recognized by doctors in clinical practice compared with non-intelligent reports based on fixed templates or containing options to fill in the blanks.

Submitted 22 May, 2022; v1 submitted 28 July, 2021; originally announced July 2021.

arXiv:2105.03847 [pdf]

Automatic segmentation of vertebral features on ultrasound spine images using Stacked Hourglass Network

Authors: Hong-Ye Zeng, Song-Han Ge, Yu-Chong Gao, De-Sen Zhou, Kang Zhou, Xu-Ming He, Edmond Lou, Rui Zheng

Abstract: Objective: The spinous process angle (SPA) is one of the essential parameters to denote three-dimensional (3-D) deformity of the spine. We propose an automatic segmentation method based on Stacked Hourglass Network (SHN) to detect the spinous processes (SP) on ultrasound (US) spine images and to measure the SPAs of clinical scoliotic subjects. Methods: The network was trained to detect vertebral SP and laminae as five landmarks on 1200 ultrasound transverse images and validated on 100 images. All the processed transverse images with highlighted SP and laminae were reconstructed into a 3D image volume, and the SPAs were measured on the projected coronal images. The trained network was tested on 400 images by calculating the percentage of correct keypoints (PCK); and the SPA measurements were evaluated on 50 scoliotic subjects by comparing the results from US images and radiographs. Results: The trained network achieved a high average PCK (86.8%) on the test datasets; in particular, the PCK of SP detection was 90.3%. The SPAs measured from US and radiographic methods showed good correlation (r>0.85), and the mean absolute difference (MAD) between the two modalities was 3.3°, which was less than the clinical acceptance error (5°). Conclusion: The vertebral features can be accurately segmented on US spine images using SHN, and the measurement results of SPA from US data were comparable to the gold standard from radiography.

Submitted 23 May, 2021; v1 submitted 9 May, 2021; originally announced May 2021.

Comments: 9 pages, 5 figures

arXiv:2103.05378 [pdf, other]

Decentralized Non-Convex Learning with Linearly Coupled Constraints

Authors: Jiawei Zhang, Songyang Ge, Tsung-Hui Chang, Zhi-Quan Luo

Abstract: Motivated by the need for decentralized learning, this paper aims at designing a distributed algorithm for solving nonconvex problems with general linear constraints over a multi-agent network. In the considered problem, each agent owns some local information and a local variable for jointly minimizing a cost function, but local variables are coupled by linear constraints. Most of the existing methods for such problems are only applicable for convex problems or problems with specific linear constraints. A distributed algorithm for such problems with general linear constraints and under the nonconvex setting is still lacking. In this paper, to tackle this problem, we propose a new algorithm, called "proximal dual consensus" (PDC) algorithm, which combines a proximal technique and a dual consensus method. We build the theoretical convergence conditions and show that the proposed PDC algorithm can converge to an $ε$-Karush-Kuhn-Tucker solution within $\mathcal{O}(1/ε)$ iterations. For computation reduction, the PDC algorithm can choose to perform cheap gradient descent per iteration while preserving the same order of $\mathcal{O}(1/ε)$ iteration complexity. Numerical results are presented to demonstrate the good performance of the proposed algorithms for solving a regression problem and a classification problem over a network where agents have only partial observations of data features.

Submitted 22 June, 2022; v1 submitted 9 March, 2021; originally announced March 2021.

arXiv:2012.15432 [pdf]

SharpGAN: Receptive Field Block Net for Dynamic Scene Deblurring

Authors: Hui Feng, Jundong Guo, Sam Shuzhi Ge

Abstract: When sailing at sea, the smart ship will inevitably produce swaying motion due to the action of wind, wave and current, which makes the image collected by the visual sensor exhibit motion blur. This will have an adverse effect on the object detection algorithm based on the vision sensor, thereby affecting the navigation safety of the smart ship. In order to remove the motion blur in the images during the navigation of the smart ship, we propose SharpGAN, a new image deblurring method based on the generative adversarial network. First of all, the Receptive Field Block Net (RFBNet) is introduced to the deblurring network to strengthen the network's ability to extract the features of blurred images. Secondly, we propose a feature loss that combines different levels of image features to guide the network to perform higher-quality deblurring and improve the feature similarity between the restored images and the sharp image. Finally, we propose to use the lightweight RFB-s module to improve the real-time performance of the deblurring network. Compared with the existing deblurring methods on large-scale real sea image datasets and large-scale deblurring datasets, the proposed method not only has better deblurring performance in visual perception and quantitative criteria, but also has higher deblurring efficiency.

Submitted 30 December, 2020; originally announced December 2020.

Comments: 15 pages, 6 figures

ACM Class: I.2.10

arXiv:2006.07907 [pdf, other]

Trajectory Generation by Chance Constrained Nonlinear MPC with Probabilistic Prediction

Authors: Xiaoxue Zhang, Jun Ma, Zilong Cheng, Sunan Huang, Shuzhi Sam Ge, Tong Heng Lee

Abstract: Continued great efforts have been dedicated towards high-quality trajectory generation based on optimization methods; however, most of them do not suitably and effectively consider the situation with moving obstacles; and more particularly, the future position of these moving obstacles in the presence of uncertainty within some possible prescribed prediction horizon. To cater to this rather major shortcoming, this work shows how a variational Bayesian Gaussian mixture model (vBGMM) framework can be employed to predict the future trajectory of moving obstacles; and then with this methodology, a trajectory generation framework is proposed which will efficiently and effectively address trajectory generation in the presence of moving obstacles, while also incorporating the presence of uncertainty within a prediction horizon. In this work, the full predictive conditional probability density function (PDF) with mean and covariance is obtained, and thus a future trajectory with uncertainty is formulated as a collision region represented by a confidence ellipsoid. To avoid the collision region, chance constraints are imposed to restrict the collision probability, and subsequently a nonlinear MPC problem is constructed with these chance constraints. It is shown that the proposed approach is able to predict the future position of the moving obstacles effectively; and thus based on the environmental information of the probabilistic prediction, it is also shown that the timing of collision avoidance can be earlier than the method without prediction. The tracking error and distance to obstacles of the trajectory with prediction are smaller compared with the method without prediction.

Submitted 4 August, 2020; v1 submitted 14 June, 2020; originally announced June 2020.

Comments: 13 pages, 13 figures

arXiv:2005.11501 [pdf, other]

doi 10.1109/TAI.2021.3074106

Adaptive Feedforward Neural Network Control with an Optimized Hidden Node Distribution

Authors: Qiong Liu, Dongyu Li, Shuzhi Sam Ge, Zhong Ouyang

Abstract: Composite adaptive radial basis function neural network (RBFNN) control with a lattice distribution of hidden nodes has three inherent demerits: 1) the approximation domain of adaptive RBFNNs is difficult to be determined a priori; 2) only a partial persistence of excitation (PE) condition can be guaranteed; and 3) in general, the required number of hidden nodes of RBFNNs is enormous. This paper proposes an adaptive feedforward RBFNN controller with an optimized distribution of hidden nodes to suitably address the above demerits. The distribution of the hidden nodes calculated by a K-means algorithm is optimally distributed along the desired state trajectory. The adaptive RBFNN satisfies the PE condition for the periodic reference trajectory. The weights of all hidden nodes will converge to the optimal values. This proposed method considerably reduces the number of hidden nodes, while achieving a better approximation ability. The proposed control scheme shares a similar rationality to that of the classical PID control in two special cases, which can thus be seen as an enhanced PID scheme with a better approximation ability. For the controller implemented by digital devices, the proposed method, for a manipulator with unknown dynamics, potentially achieves better control performance than model-based schemes with accurate dynamics. Simulation results demonstrate the effectiveness of the proposed scheme. This result provides a deeper insight into the coordination of the adaptive neural network control and the deterministic learning theory.

Submitted 22 April, 2021; v1 submitted 23 May, 2020; originally announced May 2020.

Comments: 12 pages, 7 figures. This paper is submitted to "IEEE Transactions on Artificial Intelligence"

arXiv:2005.05083 [pdf, other]

A Federated Learning Framework for Healthcare IoT devices

Authors: Binhang Yuan, Song Ge, Wenhui Xing

Abstract: The Internet of Things (IoT) revolution has shown potential to give rise to many medical applications with access to large volumes of healthcare data collected by IoT devices. However, the increasing demand for healthcare data privacy and security makes each IoT device an isolated island of data. Further, the limited computation and communication capacity of wearable healthcare devices restricts the application of vanilla federated learning. To this end, we propose an advanced federated learning framework to train deep neural networks, where the network is partitioned and allocated to IoT devices and a centralized server. Then most of the training computation is handled by the powerful server. The sparsification of activations and gradients significantly reduces the communication overhead. Empirical studies have suggested that the proposed framework guarantees a low accuracy loss, while only requiring 0.2% of the synchronization traffic in vanilla federated learning.

Submitted 7 May, 2020; originally announced May 2020.

arXiv:1912.11221 [pdf, ps, other]

FDD Massive MIMO Uplink and Downlink Channel Reciprocity Properties: Full or Partial Reciprocity?

Authors: Zhimeng Zhong, Li Fan, Shibin Ge

Abstract: One challenge for FDD massive MIMO communication systems is how to obtain the downlink channel state information (CSI) at the base station. Except for traditional codebook feedback through uplink pilot transmission, some channel reciprocity properties can be utilized through uplink channel estimation and channel parameter estimation algorithms. In this paper, the uplink and downlink channel reciprocity properties are analyzed. It is theoretically proved that not all multipath parameters for FDD downlink and uplink channels are equivalent. Therefore, the so-called full reciprocity property does not hold while the partial reciprocity property holds. Moreover, the channel measurement campaign is conducted to verify our theoretical analysis. Finally, in order to support the partial reciprocity property, the revision for the standardization 5G channel model is proposed as well. With the contribution of this paper, the FDD massive MIMO system transmission scheme design could be led to the right direction.

Submitted 30 December, 2019; v1 submitted 24 December, 2019; originally announced December 2019.

arXiv:1909.13265 [pdf, ps, other]

Adaptive Control for Marine Vessels Against Harsh Environmental Variation

Authors: Fangwen Tu, Shuzhi Sam Ge, Yoo Sang Choo, Chang Chieh Hang

Abstract: In this paper, robust control with sea state observer and dynamic thrust allocation is proposed for the Dynamic Positioning (DP) of an accommodation vessel in the presence of unknown hydrodynamic force variation and the input time delay. In order to overcome the huge force variation due to the adjoining Floating Production Storage and Offloading (FPSO) and accommodation vessel, a novel sea state observer is designed. The sea observer can effectively monitor the variation of the drift wave-induced force on the vessel and activate the Neural Network (NN) compensator in the controller when large wave force is identified. Moreover, the wind drag coefficients can be adaptively approximated in the sea observer so that a feedforward control can be achieved. Based on this, a robust constrained control is developed to guarantee a safe operation. The time delay inside the control input is also considered. Dynamic thrust allocation module is presented to distribute the generalized control input among azimuth thrusters. Under the proposed sea observer and control, the boundedness of all the closed-loop signals is demonstrated via rigorous Lyapunov analysis. A set of simulation studies are conducted to verify the effectiveness of the proposed control scheme.

Submitted 29 September, 2019; originally announced September 2019.

arXiv:1908.07590 [pdf, other]

From Text to Sound: A Preliminary Study on Retrieving Sound Effects to Radio Stories

Authors: Songwei Ge, Curtis Xuan, Ruihua Song, Chao Zou, Wei Liu, Jin Zhou

Abstract: Sound effects play an essential role in producing high-quality radio stories but require enormous labor cost to add. In this paper, we address the problem of automatically adding sound effects to radio stories with a retrieval-based model. However, directly implementing a tag-based retrieval model leads to high false positives due to the ambiguity of story contents. To solve this problem, we introduce a retrieval-based framework hybridized with a semantic inference model which helps to achieve robust retrieval results. Our model relies on fine-designed features extracted from the context of candidate triggers. We collect two story dubbing datasets through crowdsourcing to analyze the setting of adding sound effects and to train and test our proposed methods. We further discuss the importance of each feature and introduce several heuristic rules for the trade-off between precision and recall. Together with the text-to-speech technology, our results reveal a promising automatic pipeline on producing high-quality radio stories.

Submitted 20 August, 2019; originally announced August 2019.

Comments: In the Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019)

arXiv:1907.06690 [pdf]

doi 10.1109/COMPSAC.2019.10205

A Scalable Framework for Multilevel Streaming Data Analytics using Deep Learning

Authors: Shihao Ge, Haruna Isah, Farhana Zulkernine, Shahzad Khan

Abstract: The rapid growth of data in velocity, volume, value, variety, and veracity has enabled exciting new opportunities and presented big challenges for businesses of all types. Recently, there has been considerable interest in developing systems for processing continuous data streams with the increasing need for real-time analytics for decision support in the business, healthcare, manufacturing, and security. The analytics of streaming data usually relies on the output of offline analytics on static or archived data. However, businesses and organizations, like our industry partner Gnowit, strive to provide their customers with real-time market information and continuously look for a unified analytics framework that can integrate both streaming and offline analytics in a seamless fashion to extract knowledge from large volumes of hybrid streaming data. We present our study on designing a multilevel streaming text data analytics framework by comparing leading edge scalable open-source, distributed, and in-memory technologies. We demonstrate the functionality of the framework for a use case of multilevel text analytics using deep learning for language understanding and sentiment analysis including data indexing and query processing. Our framework combines Spark streaming for real-time text processing, the Long Short Term Memory (LSTM) deep learning model for higher level sentiment analysis, and other tools for SQL-based analytical processing to provide a scalable solution for multilevel streaming text analytics.

Submitted 15 July, 2019; originally announced July 2019.

Showing 1–32 of 32 results for author: Ge, S