-
Learning-based safety lifting monitoring system for cranes on construction sites
Authors:
Hao Chen,
Yu Hin Ng,
Ching-Wei Chang,
Haobo Liang,
Yanke Wang
Abstract:
Lifting on construction sites, as a frequent operation, works still with safety risks, especially for modular integrated construction (MiC) lifting due to its large weight and size, probably leading to accidents, causing damage to the modules, or more critically, posing safety hazards to on-site workers. Aiming to reduce the safety risks in lifting scenarios, we design an automated safe lifting mo…
▽ More
Lifting on construction sites, as a frequent operation, works still with safety risks, especially for modular integrated construction (MiC) lifting due to its large weight and size, probably leading to accidents, causing damage to the modules, or more critically, posing safety hazards to on-site workers. Aiming to reduce the safety risks in lifting scenarios, we design an automated safe lifting monitoring algorithm pipeline based on learning-based methods, and deploy it on construction sites. This work is potentially to increase the safety and efficiency of MiC lifting process via automation technologies. A dataset is created consisting of 1007 image-point cloud pairs (37 MiC liftings). Advanced object detection models are trained for automated two-dimensional (2D) detection of MiCs and humans. Fusing the 2D detection results with the point cloud information allows accurate determination of the three-dimensional (3D) positions of MiCs and humans. The system is designed to automatically trigger alarms that notify individuals in the MiC lifting danger zone, while providing the crane operator with real-time lifting information and early warnings. The monitoring process minimizes the human intervention and no or less signal men are required on real sites assisted by our system. A quantitative analysis is conducted to evaluate the effectiveness of the algorithmic pipeline. The pipeline shows promising results in MiC and human perception with the mean distance error of 1.5640 m and 0.7824 m respectively. Furthermore, the developed system successfully executes safety risk monitoring and alarm functionalities during the MiC lifting process with limited manual work on real construction sites.
△ Less
Submitted 25 June, 2025;
originally announced June 2025.
-
Bird's-eye view safety monitoring for the construction top under the tower crane
Authors:
Yanke Wang,
Yu Hin Ng,
Haobo Liang,
Ching-Wei Chang,
Hao Chen
Abstract:
The tower crane is involving more automated and intelligent operation procedure, and importantly, the application of automation technologies to the safety issues is imperative ahead of the utilization of any other advances. Among diverse risk management tasks on site, it is essential to protect the human workers on the workspace between the tower crane and constructed building top area (constructi…
▽ More
The tower crane is involving more automated and intelligent operation procedure, and importantly, the application of automation technologies to the safety issues is imperative ahead of the utilization of any other advances. Among diverse risk management tasks on site, it is essential to protect the human workers on the workspace between the tower crane and constructed building top area (construction top) from the bird's-eye view, especially with Modular Integrated Construction (MiC) lifted. Also, the camera and Light Detection And Ranging (LiDAR) can capture abundant 3D information on site, which is however yet made the best use. Considering the safety protection for humans and tower cranes, we present an AI-based fully automated safety monitoring system for tower crane lifting from the bird's-eye view, surveilling to shield the human workers on the construction top and avoid cranes' collision by alarming the crane operator. The system achieved a 3D data fusion for localization of humans and MiCs by integrating the captured information from camera and LiDAR. The state-of-the-art methods were explored and implemented into our proposed software pipeline coupled with the hardware and display systems. Furthermore, we conducted an analysis of the components in the pipeline to verify the accuracy and effectiveness of the involved methods. The display and visualization on the real site proved that our system can serve as a valuable safety monitoring toolkit on site.
△ Less
Submitted 22 June, 2025;
originally announced June 2025.
-
RAW Image Reconstruction from RGB on Smartphones. NTIRE 2025 Challenge Report
Authors:
Marcos V. Conde,
Radu Timofte,
Radu Berdan,
Beril Besbinar,
Daisuke Iso,
Pengzhou Ji,
Xiong Dun,
Zeying Fan,
Chen Wu,
Zhansheng Wang,
Pengbo Zhang,
Jiazi Huang,
Qinglin Liu,
Wei Yu,
Shengping Zhang,
Xiangyang Ji,
Kyungsik Kim,
Minkyung Kim,
Hwalmin Lee,
Hekun Ma,
Huan Zheng,
Yanyan Wei,
Zhao Zhang,
Jing Fang,
Meilin Gao
, et al. (8 additional authors not shown)
Abstract:
Numerous low-level vision tasks operate in the RAW domain due to its linear properties, bit depth, and sensor designs. Despite this, RAW image datasets are scarce and more expensive to collect than the already large and public sRGB datasets. For this reason, many approaches try to generate realistic RAW images using sensor information and sRGB images. This paper covers the second challenge on RAW…
▽ More
Numerous low-level vision tasks operate in the RAW domain due to its linear properties, bit depth, and sensor designs. Despite this, RAW image datasets are scarce and more expensive to collect than the already large and public sRGB datasets. For this reason, many approaches try to generate realistic RAW images using sensor information and sRGB images. This paper covers the second challenge on RAW Reconstruction from sRGB (Reverse ISP). We aim to recover RAW sensor images from smartphones given the corresponding sRGB images without metadata and, by doing this, ``reverse" the ISP transformation. Over 150 participants joined this NTIRE 2025 challenge and submitted efficient models. The proposed methods and benchmark establish the state-of-the-art for generating realistic RAW data.
△ Less
Submitted 2 June, 2025;
originally announced June 2025.
-
Coevolution of Actions and Opinions in Networks of Coordinating and Anti-Coordinating Agents
Authors:
Hong Liang,
Mengbin Ye,
Lorenzo Zino,
Weiguo Xia
Abstract:
In this paper, we investigate the dynamics of coordinating and anti-coordinating agents in a coevolutionary model for actions and opinions. In the model, the individuals of a population interact on a two-layer network, sharing their opinions and observing others' action, while revising their own opinions and actions according to a game-theoretic mechanism, grounded in the social psychology literat…
▽ More
In this paper, we investigate the dynamics of coordinating and anti-coordinating agents in a coevolutionary model for actions and opinions. In the model, the individuals of a population interact on a two-layer network, sharing their opinions and observing others' action, while revising their own opinions and actions according to a game-theoretic mechanism, grounded in the social psychology literature. First, we consider the scenario of coordinating agents, where convergence to a Nash equilibrium (NE) is guaranteed. We identify conditions for reaching consensus configurations and establish regions of attraction for these equilibria. Second, we study networks of anti-coordinating agents. In this second scenario, we prove that all trajectories converge to a NE by leveraging potential game theory. Then, we establish analytical conditions on the network structure and model parameters to guarantee the existence of consensus and polarized equilibria, characterizing their regions of attraction.
△ Less
Submitted 5 May, 2025;
originally announced May 2025.
-
State-Aware Perturbation Optimization for Robust Deep Reinforcement Learning
Authors:
Zongyuan Zhang,
Tianyang Duan,
Zheng Lin,
Dong Huang,
Zihan Fang,
Zekai Sun,
Ling Xiong,
Hongbin Liang,
Heming Cui,
Yong Cui
Abstract:
Recently, deep reinforcement learning (DRL) has emerged as a promising approach for robotic control. However, the deployment of DRL in real-world robots is hindered by its sensitivity to environmental perturbations. While existing whitebox adversarial attacks rely on local gradient information and apply uniform perturbations across all states to evaluate DRL robustness, they fail to account for te…
▽ More
Recently, deep reinforcement learning (DRL) has emerged as a promising approach for robotic control. However, the deployment of DRL in real-world robots is hindered by its sensitivity to environmental perturbations. While existing whitebox adversarial attacks rely on local gradient information and apply uniform perturbations across all states to evaluate DRL robustness, they fail to account for temporal dynamics and state-specific vulnerabilities. To combat the above challenge, we first conduct a theoretical analysis of white-box attacks in DRL by establishing the adversarial victim-dynamics Markov decision process (AVD-MDP), to derive the necessary and sufficient conditions for a successful attack. Based on this, we propose a selective state-aware reinforcement adversarial attack method, named STAR, to optimize perturbation stealthiness and state visitation dispersion. STAR first employs a soft mask-based state-targeting mechanism to minimize redundant perturbations, enhancing stealthiness and attack effectiveness. Then, it incorporates an information-theoretic optimization objective to maximize mutual information between perturbations, environmental states, and victim actions, ensuring a dispersed state-visitation distribution that steers the victim agent into vulnerable states for maximum return reduction. Extensive experiments demonstrate that STAR outperforms state-of-the-art benchmarks.
△ Less
Submitted 26 March, 2025;
originally announced March 2025.
-
A Baseline Method for Removing Invisible Image Watermarks using Deep Image Prior
Authors:
Hengyue Liang,
Taihui Li,
Ju Sun
Abstract:
Image watermarks have been considered a promising technique to help detect AI-generated content, which can be used to protect copyright or prevent fake image abuse. In this work, we present a black-box method for removing invisible image watermarks, without the need of any dataset of watermarked images or any knowledge about the watermark system. Our approach is simple to implement: given a single…
▽ More
Image watermarks have been considered a promising technique to help detect AI-generated content, which can be used to protect copyright or prevent fake image abuse. In this work, we present a black-box method for removing invisible image watermarks, without the need of any dataset of watermarked images or any knowledge about the watermark system. Our approach is simple to implement: given a single watermarked image, we regress it by deep image prior (DIP). We show that from the intermediate steps of DIP one can reliably find an evasion image that can remove invisible watermarks while preserving high image quality. Due to its unique working mechanism and practical effectiveness, we advocate including DIP as a baseline invasion method for benchmarking the robustness of watermarking systems. Finally, by showing the limited ability of DIP and other existing black-box methods in evading training-based visible watermarks, we discuss the positive implications on the practical use of training-based visible watermarks to prevent misinformation abuse.
△ Less
Submitted 2 July, 2025; v1 submitted 19 February, 2025;
originally announced February 2025.
-
MTLM: Incorporating Bidirectional Text Information to Enhance Language Model Training in Speech Recognition Systems
Authors:
Qingliang Meng,
Pengju Ren,
Tian Li,
Changsong Dai,
Huizhi Liang
Abstract:
Automatic speech recognition (ASR) systems normally consist of an acoustic model (AM) and a language model (LM). The acoustic model estimates the probability distribution of text given the input speech, while the language model calibrates this distribution toward a specific knowledge domain to produce the final transcription. Traditional ASR-specific LMs are typically trained in a unidirectional (…
▽ More
Automatic speech recognition (ASR) systems normally consist of an acoustic model (AM) and a language model (LM). The acoustic model estimates the probability distribution of text given the input speech, while the language model calibrates this distribution toward a specific knowledge domain to produce the final transcription. Traditional ASR-specific LMs are typically trained in a unidirectional (left-to-right) manner to align with autoregressive decoding. However, this restricts the model from leveraging the right-side context during training, limiting its representational capacity. In this work, we propose MTLM, a novel training paradigm that unifies unidirectional and bidirectional manners through 3 training objectives: ULM, BMLM, and UMLM. This approach enhances the LM's ability to capture richer linguistic patterns from both left and right contexts while preserving compatibility with standard ASR autoregressive decoding methods. As a result, the MTLM model not only enhances the ASR system's performance but also support multiple decoding strategies, including shallow fusion, unidirectional/bidirectional n-best rescoring. Experiments on the LibriSpeech dataset show that MTLM consistently outperforms unidirectional training across multiple decoding strategies, highlighting its effectiveness and flexibility in ASR applications.
△ Less
Submitted 14 June, 2025; v1 submitted 14 February, 2025;
originally announced February 2025.
-
Baichuan-Omni-1.5 Technical Report
Authors:
Yadong Li,
Jun Liu,
Tao Zhang,
Tao Zhang,
Song Chen,
Tianpeng Li,
Zehuan Li,
Lijun Liu,
Lingfeng Ming,
Guosheng Dong,
Da Pan,
Chong Li,
Yuanbo Fang,
Dongdong Kuang,
Mingrui Wang,
Chenglin Zhu,
Youwei Zhang,
Hongyu Guo,
Fengyu Zhang,
Yuran Wang,
Bowen Ding,
Wei Song,
Xu Li,
Yuqi Huo,
Zheng Liang
, et al. (68 additional authors not shown)
Abstract:
We introduce Baichuan-Omni-1.5, an omni-modal model that not only has omni-modal understanding capabilities but also provides end-to-end audio generation capabilities. To achieve fluent and high-quality interaction across modalities without compromising the capabilities of any modality, we prioritized optimizing three key aspects. First, we establish a comprehensive data cleaning and synthesis pip…
▽ More
We introduce Baichuan-Omni-1.5, an omni-modal model that not only has omni-modal understanding capabilities but also provides end-to-end audio generation capabilities. To achieve fluent and high-quality interaction across modalities without compromising the capabilities of any modality, we prioritized optimizing three key aspects. First, we establish a comprehensive data cleaning and synthesis pipeline for multimodal data, obtaining about 500B high-quality data (text, audio, and vision). Second, an audio-tokenizer (Baichuan-Audio-Tokenizer) has been designed to capture both semantic and acoustic information from audio, enabling seamless integration and enhanced compatibility with MLLM. Lastly, we designed a multi-stage training strategy that progressively integrates multimodal alignment and multitask fine-tuning, ensuring effective synergy across all modalities. Baichuan-Omni-1.5 leads contemporary models (including GPT4o-mini and MiniCPM-o 2.6) in terms of comprehensive omni-modal capabilities. Notably, it achieves results comparable to leading models such as Qwen2-VL-72B across various multimodal medical benchmarks.
△ Less
Submitted 25 January, 2025;
originally announced January 2025.
-
Look Inside for More: Internal Spatial Modality Perception for 3D Anomaly Detection
Authors:
Hanzhe Liang,
Guoyang Xie,
Chengbin Hou,
Bingshu Wang,
Can Gao,
Jinbao Wang
Abstract:
3D anomaly detection has recently become a significant focus in computer vision. Several advanced methods have achieved satisfying anomaly detection performance. However, they typically concentrate on the external structure of 3D samples and struggle to leverage the internal information embedded within samples. Inspired by the basic intuition of why not look inside for more, we introduce a straigh…
▽ More
3D anomaly detection has recently become a significant focus in computer vision. Several advanced methods have achieved satisfying anomaly detection performance. However, they typically concentrate on the external structure of 3D samples and struggle to leverage the internal information embedded within samples. Inspired by the basic intuition of why not look inside for more, we introduce a straightforward method named Internal Spatial Modality Perception~(ISMP) to explore the feature representation from internal views fully. Specifically, our proposed ISMP consists of a critical perception module, Spatial Insight Engine~(SIE), which abstracts complex internal information of point clouds into essential global features. Besides, to better align structural information with point data, we propose an enhanced key point feature extraction module for amplifying spatial structure feature representation. Simultaneously, a novel feature filtering module is incorporated to reduce noise and redundant features for further aligning precise spatial structure. Extensive experiments validate the effectiveness of our proposed method, achieving object-level and pixel-level AUROC improvements of 3.2\% and 13.1\%, respectively, on the Real3D-AD benchmarks. Note that the strong generalization ability of SIE has been theoretically proven and is verified in both classification and segmentation tasks.
△ Less
Submitted 10 March, 2025; v1 submitted 17 December, 2024;
originally announced December 2024.
-
NeRF-NQA: No-Reference Quality Assessment for Scenes Generated by NeRF and Neural View Synthesis Methods
Authors:
Qiang Qu,
Hanxue Liang,
Xiaoming Chen,
Yuk Ying Chung,
Yiran Shen
Abstract:
Neural View Synthesis (NVS) has demonstrated efficacy in generating high-fidelity dense viewpoint videos using a image set with sparse views. However, existing quality assessment methods like PSNR, SSIM, and LPIPS are not tailored for the scenes with dense viewpoints synthesized by NVS and NeRF variants, thus, they often fall short in capturing the perceptual quality, including spatial and angular…
▽ More
Neural View Synthesis (NVS) has demonstrated efficacy in generating high-fidelity dense viewpoint videos using a image set with sparse views. However, existing quality assessment methods like PSNR, SSIM, and LPIPS are not tailored for the scenes with dense viewpoints synthesized by NVS and NeRF variants, thus, they often fall short in capturing the perceptual quality, including spatial and angular aspects of NVS-synthesized scenes. Furthermore, the lack of dense ground truth views makes the full reference quality assessment on NVS-synthesized scenes challenging. For instance, datasets such as LLFF provide only sparse images, insufficient for complete full-reference assessments. To address the issues above, we propose NeRF-NQA, the first no-reference quality assessment method for densely-observed scenes synthesized from the NVS and NeRF variants. NeRF-NQA employs a joint quality assessment strategy, integrating both viewwise and pointwise approaches, to evaluate the quality of NVS-generated scenes. The viewwise approach assesses the spatial quality of each individual synthesized view and the overall inter-views consistency, while the pointwise approach focuses on the angular qualities of scene surface points and their compound inter-point quality. Extensive evaluations are conducted to compare NeRF-NQA with 23 mainstream visual quality assessment methods (from fields of image, video, and light-field assessment). The results demonstrate NeRF-NQA outperforms the existing assessment methods significantly and it shows substantial superiority on assessing NVS-synthesized scenes without references. An implementation of this paper are available at https://github.com/VincentQQu/NeRF-NQA.
△ Less
Submitted 10 December, 2024;
originally announced December 2024.
-
Enhancing Diagnostic Accuracy in Rare and Common Fundus Diseases with a Knowledge-Rich Vision-Language Model
Authors:
Meng Wang,
Tian Lin,
Aidi Lin,
Kai Yu,
Yuanyuan Peng,
Lianyu Wang,
Cheng Chen,
Ke Zou,
Huiyu Liang,
Man Chen,
Xue Yao,
Meiqin Zhang,
Binwei Huang,
Chaoxin Zheng,
Peixin Zhang,
Wei Chen,
Yilong Luo,
Yifan Chen,
Honghe Xia,
Tingkun Shi,
Qi Zhang,
Jinming Guo,
Xiaolin Chen,
Jingcheng Wang,
Yih Chung Tham
, et al. (24 additional authors not shown)
Abstract:
Previous foundation models for fundus images were pre-trained with limited disease categories and knowledge base. Here we introduce a knowledge-rich vision-language model (RetiZero) that leverages knowledge from more than 400 fundus diseases. For RetiZero's pretraining, we compiled 341,896 fundus images paired with texts, sourced from public datasets, ophthalmic literature, and online resources, e…
▽ More
Previous foundation models for fundus images were pre-trained with limited disease categories and knowledge base. Here we introduce a knowledge-rich vision-language model (RetiZero) that leverages knowledge from more than 400 fundus diseases. For RetiZero's pretraining, we compiled 341,896 fundus images paired with texts, sourced from public datasets, ophthalmic literature, and online resources, encompassing a diverse range of diseases across multiple ethnicities and countries. RetiZero exhibits remarkable performance in several downstream tasks, including zero-shot disease recognition, image-to-image retrieval, AI-assisted clinical diagnosis,few-shot fine-tuning, and internal- and cross-domain disease identification. In zero-shot scenarios, RetiZero achieves Top-5 accuracies of 0.843 for 15 diseases and 0.756 for 52 diseases. For image retrieval, it achieves Top-5 scores of 0.950 and 0.886 for the same sets, respectively. AI-assisted clinical diagnosis results show that RetiZero's Top-3 zero-shot performance surpasses the average of 19 ophthalmologists from Singapore, China, and the United States. RetiZero substantially enhances clinicians' accuracy in diagnosing fundus diseases, in particularly rare ones. These findings underscore the value of integrating the RetiZero into clinical settings, where various fundus diseases are encountered.
△ Less
Submitted 10 April, 2025; v1 submitted 13 June, 2024;
originally announced June 2024.
-
Hybrid Spatial-spectral Neural Network for Hyperspectral Image Denoising
Authors:
Hao Liang,
Chengjie,
Kun Li,
Xin Tian
Abstract:
Hyperspectral image (HSI) denoising is an essential procedure for HSI applications. Unfortunately, the existing Transformer-based methods mainly focus on non-local modeling, neglecting the importance of locality in image denoising. Moreover, deep learning methods employ complex spectral learning mechanisms, thus introducing large computation costs.
To address these problems, we propose a hybrid…
▽ More
Hyperspectral image (HSI) denoising is an essential procedure for HSI applications. Unfortunately, the existing Transformer-based methods mainly focus on non-local modeling, neglecting the importance of locality in image denoising. Moreover, deep learning methods employ complex spectral learning mechanisms, thus introducing large computation costs.
To address these problems, we propose a hybrid spatial-spectral denoising network (HSSD), in which we design a novel hybrid dual-path network inspired by CNN and Transformer characteristics, leading to capturing both local and non-local spatial details while suppressing noise efficiently. Furthermore, to reduce computational complexity, we adopt a simple but effective decoupling strategy that disentangles the learning of space and spectral channels, where multilayer perception with few parameters is utilized to learn the global correlations among spectra. The synthetic and real experiments demonstrate that our proposed method outperforms state-of-the-art methods on spatial and spectral reconstruction. The code and details are available on https://github.com/HLImg/HSSD.
△ Less
Submitted 1 August, 2024; v1 submitted 12 June, 2024;
originally announced June 2024.
-
Statistical QoS Provisioning Architecture for 6G Satellite-Terrestrial Integrated Networks
Authors:
Jingqing Wang,
Wenchi Cheng,
Wei Zhang,
Hui Liang
Abstract:
The emergence of massive ultra-reliable and low latency communications (mURLLC) as a category of time/reliability-sensitive service over 6G networks has received considerable research attention, which has presented unprecedented challenges. As one of the key enablers for 6G, satellite-terrestrial integrated networks (STIN) have been developed to offer more expansive connectivity and comprehensive…
▽ More
The emergence of massive ultra-reliable and low latency communications (mURLLC) as a category of time/reliability-sensitive service over 6G networks has received considerable research attention, which has presented unprecedented challenges. As one of the key enablers for 6G, satellite-terrestrial integrated networks (STIN) have been developed to offer more expansive connectivity and comprehensive 3D coverage in space-aerial-terrestrial domains for supporting 6G mission-critical mURLLC applications while fulfilling diverse and rigorous quality of service (QoS) requirements. In the context of these mURLLC-driven satellite services, data freshness assumes paramount importance, as outdated data may engender unpredictable or catastrophic outcomes. To effectively measure data freshness in satellite-terrestrial integrated communications, age of information (AoI) has recently surfaced as a new dimension of QoS metric to support time-sensitive applications. It is crucial to design new analytical models that ensure stringent and diverse QoS metrics bounded by different key parameters, including AoI, delay, and reliability, over 6G satellite-terrestrial integrated networks. However, due to the complicated and dynamic nature of satellite-terrestrial integrated network environments, the research on efficiently defining new statistical QoS schemes while taking into account varying degrees of freedom has still been in their infancy. To remedy these deficiencies, in this paper we develop statistical QoS provisioning schemes over 6G satellite-terrestrial integrated networks in the finite blocklength regime. Particularly, we firstly introduce and review key technologies for supporting mURLLC. Secondly, we formulate a number of novel fundamental statistical-QoS metrics in the finite blocklength regime. Finally, we conduct a set of simulations to evaluate our developed statistical QoS schemes.
△ Less
Submitted 7 June, 2024;
originally announced June 2024.
-
Personalized Heart Disease Detection via ECG Digital Twin Generation
Authors:
Yaojun Hu,
Jintai Chen,
Lianting Hu,
Dantong Li,
Jiahuan Yan,
Haochao Ying,
Huiying Liang,
Jian Wu
Abstract:
Heart diseases rank among the leading causes of global mortality, demonstrating a crucial need for early diagnosis and intervention. Most traditional electrocardiogram (ECG) based automated diagnosis methods are trained at population level, neglecting the customization of personalized ECGs to enhance individual healthcare management. A potential solution to address this limitation is to employ dig…
▽ More
Heart diseases rank among the leading causes of global mortality, demonstrating a crucial need for early diagnosis and intervention. Most traditional electrocardiogram (ECG) based automated diagnosis methods are trained at population level, neglecting the customization of personalized ECGs to enhance individual healthcare management. A potential solution to address this limitation is to employ digital twins to simulate symptoms of diseases in real patients. In this paper, we present an innovative prospective learning approach for personalized heart disease detection, which generates digital twins of healthy individuals' anomalous ECGs and enhances the model sensitivity to the personalized symptoms. In our approach, a vector quantized feature separator is proposed to locate and isolate the disease symptom and normal segments in ECG signals with ECG report guidance. Thus, the ECG digital twins can simulate specific heart diseases used to train a personalized heart disease detection model. Experiments demonstrate that our approach not only excels in generating high-fidelity ECG signals but also improves personalized heart disease detection. Moreover, our approach ensures robust privacy protection, safeguarding patient data in model development.
△ Less
Submitted 11 May, 2024; v1 submitted 17 April, 2024;
originally announced April 2024.
-
Efficient Dynamic-NeRF Based Volumetric Video Coding with Rate Distortion Optimization
Authors:
Zhiyu Zhang,
Guo Lu,
Huanxiong Liang,
Anni Tang,
Qiang Hu,
Li Song
Abstract:
Volumetric videos, benefiting from immersive 3D realism and interactivity, hold vast potential for various applications, while the tremendous data volume poses significant challenges for compression. Recently, NeRF has demonstrated remarkable potential in volumetric video compression thanks to its simple representation and powerful 3D modeling capabilities, where a notable work is ReRF. However, R…
▽ More
Volumetric videos, benefiting from immersive 3D realism and interactivity, hold vast potential for various applications, while the tremendous data volume poses significant challenges for compression. Recently, NeRF has demonstrated remarkable potential in volumetric video compression thanks to its simple representation and powerful 3D modeling capabilities, where a notable work is ReRF. However, ReRF separates the modeling from compression process, resulting in suboptimal compression efficiency. In contrast, in this paper, we propose a volumetric video compression method based on dynamic NeRF in a more compact manner. Specifically, we decompose the NeRF representation into the coefficient fields and the basis fields, incrementally updating the basis fields in the temporal domain to achieve dynamic modeling. Additionally, we perform end-to-end joint optimization on the modeling and compression process to further improve the compression efficiency. Extensive experiments demonstrate that our method achieves higher compression efficiency compared to ReRF on various datasets.
△ Less
Submitted 7 November, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
-
Arithmetic Average Density Fusion -- Part IV: Distributed Heterogeneous Fusion of RFS and LRFS Filters via Variational Approximation
Authors:
Tiancheng Li,
Haozhe Liang,
Guchong Li,
Jesús GarcÃa Herrero,
Quan Pan
Abstract:
This paper, the fourth part of a series of papers on the arithmetic average (AA) density fusion approach and its application for target tracking, addresses the intricate challenge of distributed heterogeneous multisensor multitarget tracking, where each inter-connected sensor operates a probability hypothesis density (PHD) filter, a multiple Bernoulli (MB) filter or a labeled MB (LMB) filter and t…
▽ More
This paper, the fourth part of a series of papers on the arithmetic average (AA) density fusion approach and its application for target tracking, addresses the intricate challenge of distributed heterogeneous multisensor multitarget tracking, where each inter-connected sensor operates a probability hypothesis density (PHD) filter, a multiple Bernoulli (MB) filter or a labeled MB (LMB) filter and they cooperate with each other via information fusion. Earlier papers in this series have proven that the proper AA fusion of these filters is all exactly built on averaging their respective unlabeled/labeled PHDs. Based on this finding, two PHD-AA fusion approaches are proposed via variational minimization of the upper bound of the Kullback-Leibler divergence between the local and multi-filter averaged PHDs subject to cardinality consensus based on the Gaussian mixture implementation, enabling heterogeneous filter cooperation. One focuses solely on fitting the weights of the local Gaussian components (L-GCs), while the other simultaneously fits all the parameters of the L-GCs at each sensor, both seeking average consensus on the unlabeled PHD, irrespective of the specific posterior form of the local filters. For the distributed peer-to-peer communication, both the classic consensus and flooding paradigms have been investigated. Simulations have demonstrated the effectiveness and flexibility of the proposed approaches in both homogeneous and heterogeneous scenarios.
△ Less
Submitted 30 January, 2024;
originally announced February 2024.
-
Enhancing dysarthria speech feature representation with empirical mode decomposition and Walsh-Hadamard transform
Authors:
Ting Zhu,
Shufei Duan,
Camille Dingam,
Huizhi Liang,
Wei Zhang
Abstract:
Dysarthria speech contains the pathological characteristics of vocal tract and vocal fold, but so far, they have not yet been included in traditional acoustic feature sets. Moreover, the nonlinearity and non-stationarity of speech have been ignored. In this paper, we propose a feature enhancement algorithm for dysarthria speech called WHFEMD. It combines empirical mode decomposition (EMD) and fast…
▽ More
Dysarthria speech contains the pathological characteristics of vocal tract and vocal fold, but so far, they have not yet been included in traditional acoustic feature sets. Moreover, the nonlinearity and non-stationarity of speech have been ignored. In this paper, we propose a feature enhancement algorithm for dysarthria speech called WHFEMD. It combines empirical mode decomposition (EMD) and fast Walsh-Hadamard transform (FWHT) to enhance features. With the proposed algorithm, the fast Fourier transform of the dysarthria speech is first performed and then followed by EMD to get intrinsic mode functions (IMFs). After that, FWHT is used to output new coefficients and to extract statistical features based on IMFs, power spectral density, and enhanced gammatone frequency cepstral coefficients. To evaluate the proposed approach, we conducted experiments on two public pathological speech databases including UA Speech and TORGO. The results show that our algorithm performed better than traditional features in classification. We achieved improvements of 13.8% (UA Speech) and 3.84% (TORGO), respectively. Furthermore, the incorporation of an imbalanced classification algorithm to address data imbalance has resulted in a 12.18% increase in recognition accuracy. This algorithm effectively addresses the challenges of the imbalanced dataset and non-linearity in dysarthric speech and simultaneously provides a robust representation of the local pathological features of the vocal folds and tracts.
△ Less
Submitted 30 December, 2023;
originally announced January 2024.
-
Semantic Importance-Aware Based for Multi-User Communication Over MIMO Fading Channels
Authors:
Haotai Liang,
Zhicheng Bao,
Wannian An,
Chen Dong,
Xiaodong Xu
Abstract:
Semantic communication, as a novel communication paradigm, has attracted the interest of many scholars, with multi-user, multi-input multi-output (MIMO) scenarios being one of the critical contexts. This paper presents a semantic importance-aware based communication system (SIA-SC) over MIMO Rayleigh fading channels. Combining the semantic symbols' inequality and the equivalent subchannels of MIMO…
▽ More
Semantic communication, as a novel communication paradigm, has attracted the interest of many scholars, with multi-user, multi-input multi-output (MIMO) scenarios being one of the critical contexts. This paper presents a semantic importance-aware based communication system (SIA-SC) over MIMO Rayleigh fading channels. Combining the semantic symbols' inequality and the equivalent subchannels of MIMO channels based on Singular Value Decomposition (SVD) maximizes the end-to-end semantic performance through the new layer mapping method. For multi-user scenarios, a method of semantic interference cancellation is proposed. Furthermore, a new metric, namely semantic information distortion (SID), is established to unify the expressions of semantic performance, which is affected by channel bandwidth ratio (CBR) and signal-to-noise ratio (SNR). With the help of the proposed metric, we derived performance expressions and Semantic Outage Probability (SOP) of SIA-SC for Single-User Single-Input Single-Output (SU-SISO), Single-User MIMO (SU-MIMO), Multi-Users SISO (MU-MIMO) and Multi-Users MIMO (MU-MIMO) scenarios. Numerical experiments show that SIA-SC can significantly improve semantic performance across various scenarios.
△ Less
Submitted 26 December, 2023;
originally announced December 2023.
-
Semantic Synchronization for Enhanced Reliability in Communication Systems
Authors:
Xiaoyi Liu,
Haotai Liang,
Chen Dong,
Xiaodong Xu
Abstract:
As a new communication paradigm, semantic communication has received widespread attention in communication fields. However, since the decoding of semantic signals relies on contextual knowledge, misalignment between the starting position of the semantic signal and the AI-based semantic decoder would prevent source signal recovery and reconstruction. To achieve more precise semantic communication,…
▽ More
As a new communication paradigm, semantic communication has received widespread attention in communication fields. However, since the decoding of semantic signals relies on contextual knowledge, misalignment between the starting position of the semantic signal and the AI-based semantic decoder would prevent source signal recovery and reconstruction. To achieve more precise semantic communication, this study proposes an image-based semantic synchronization method leveraging intrinsic semantic features of image content. Specifically, a shared synchronized image (SyncImg) is encoded into a synchronization vector header at the transmitter and sent to the receiver. The receiver adopts a sliding window semantic decoder combined with classification and template matching methods to locate the synchronization point. Experimental results demonstrate that compared with traditional methods, the proposed method achieves a lower miss detected ratio (MDR) and root-mean-square error (RMSE) under low signal-to-noise ratios, realizing accurate synchronization of semantic signals across different devices.
△ Less
Submitted 2 December, 2023;
originally announced December 2023.
-
Design, construction and evaluation of emotional multimodal pathological speech database
Authors:
Ting Zhu,
Shufei Duan,
Huizhi Liang,
Wei Zhang
Abstract:
The lack of an available emotion pathology database is one of the key obstacles in studying the emotion expression status of patients with dysarthria. The first Chinese multimodal emotional pathological speech database containing multi-perspective information is constructed in this paper. It includes 29 controls and 39 patients with different degrees of motor dysarthria, expressing happy, sad, ang…
▽ More
The lack of an available emotion pathology database is one of the key obstacles in studying the emotion expression status of patients with dysarthria. The first Chinese multimodal emotional pathological speech database containing multi-perspective information is constructed in this paper. It includes 29 controls and 39 patients with different degrees of motor dysarthria, expressing happy, sad, angry and neutral emotions. All emotional speech was labeled for intelligibility, types and discrete dimensional emotions by developed WeChat mini-program. The subjective analysis justifies from emotion discrimination accuracy, speech intelligibility, valence-arousal spatial distribution, and correlation between SCL-90 and disease severity. The automatic recognition tested on speech and glottal data, with average accuracy of 78% for controls and 60% for patients in audio, while 51% for controls and 38% for patients in glottal data, indicating an influence of the disease on emotional expression.
△ Less
Submitted 14 December, 2023;
originally announced December 2023.
-
DeepOPF-U: A Unified Deep Neural Network to Solve AC Optimal Power Flow in Multiple Networks
Authors:
Heng Liang,
Changhong Zhao
Abstract:
The traditional machine learning models to solve optimal power flow (OPF) are mostly trained for a given power network and lack generalizability to today's power networks with varying topologies and growing plug-and-play distributed energy resources (DERs). In this paper, we propose DeepOPF-U, which uses one unified deep neural network (DNN) to solve alternating-current (AC) OPF problems in differ…
▽ More
The traditional machine learning models to solve optimal power flow (OPF) are mostly trained for a given power network and lack generalizability to today's power networks with varying topologies and growing plug-and-play distributed energy resources (DERs). In this paper, we propose DeepOPF-U, which uses one unified deep neural network (DNN) to solve alternating-current (AC) OPF problems in different power networks, including a set of power networks that is successively expanding. Specifically, we design elastic input and output layers for the vectors of given loads and OPF solutions with varying lengths in different networks. The proposed method, using a single unified DNN, can deal with different and growing numbers of buses, lines, loads, and DERs. Simulations of IEEE 57/118/300-bus test systems and a network growing from 73 to 118 buses verify the improved performance of DeepOPF-U compared to existing DNN-based solution methods.
△ Less
Submitted 22 September, 2023;
originally announced September 2023.
-
SFUSNet: A Spatial-Frequency domain-based Multi-branch Network for diagnosis of Cervical Lymph Node Lesions in Ultrasound Images
Authors:
Yubiao Yue,
Jun Xue,
Haihua Liang,
Bingchun Luo,
Zhenzhang Li
Abstract:
Booming deep learning has substantially improved the diagnosis for diverse lesions in ultrasound images, but a conspicuous research gap concerning cervical lymph node lesions still remains. The objective of this work is to diagnose cervical lymph node lesions in ultrasound images by leveraging a deep learning model. To this end, we first collected 3392 cervical ultrasound images containing normal…
▽ More
Booming deep learning has substantially improved the diagnosis for diverse lesions in ultrasound images, but a conspicuous research gap concerning cervical lymph node lesions still remains. The objective of this work is to diagnose cervical lymph node lesions in ultrasound images by leveraging a deep learning model. To this end, we first collected 3392 cervical ultrasound images containing normal lymph nodes, benign lymph node lesions, malignant primary lymph node lesions, and malignant metastatic lymph node lesions. Given that ultrasound images are generated by the reflection and scattering of sound waves across varied bodily tissues, we proposed the Conv-FFT Block. It integrates convolutional operations with the fast Fourier transform to more astutely model the images. Building upon this foundation, we designed a novel architecture, named SFUSNet. SFUSNet not only discerns variances in ultrasound images from the spatial domain but also adeptly captures micro-structural alterations across various lesions in the frequency domain. To ascertain the potential of SFUSNet, we benchmarked it against 12 popular architectures through five-fold cross-validation. The results show that SFUSNet is the state-of-the-art model and can achieve 92.89% accuracy. Moreover, its average precision, average sensitivity and average specificity for four types of lesions achieve 90.46%, 89.95% and 97.49%, respectively.
△ Less
Submitted 4 October, 2023; v1 submitted 31 August, 2023;
originally announced August 2023.
-
U-SEANNet: A Simple, Efficient and Applied U-Shaped Network for Diagnosis of Nasal Diseases on Nasal Endoscopic Images
Authors:
Yubiao Yue,
Jun Xue,
Chao Wang,
Haihua Liang,
Zhenzhang Li
Abstract:
Numerous studies have affirmed that deep learning models can facilitate early diagnosis of lesions in endoscopic images. However, the lack of available datasets stymies advancements in research on nasal endoscopy, and existing models fail to strike a good trade-off between model diagnosis performance, model complexity and parameters size, rendering them unsuitable for real-world application. To br…
▽ More
Numerous studies have affirmed that deep learning models can facilitate early diagnosis of lesions in endoscopic images. However, the lack of available datasets stymies advancements in research on nasal endoscopy, and existing models fail to strike a good trade-off between model diagnosis performance, model complexity and parameters size, rendering them unsuitable for real-world application. To bridge these gaps, we created the first large-scale nasal endoscopy dataset, named 7-NasalEID, comprising 11,352 images that contain six common nasal diseases and normal samples. Subsequently, we proposed U-SEANNet, an innovative U-shaped architecture, underpinned by depth-wise separable convolution. Moreover, to enhance its capacity for detecting nuanced discrepancies in input images, U-SEANNet employs the Global-Local Channel Feature Fusion module, enabling it to utilize salient channel features from both global and local contexts. To demonstrate U-SEANNet's potential, we benchmarked U-SEANNet against seventeen modern architectures through five-fold cross-validation. The experimental results show that U-SEANNet achieves a commendable accuracy of 93.58%. Notably, U-SEANNet's parameters size and GFLOPs are only 0.78M and 0.21, respectively. Our findings suggest U-SEANNet is the state-of-the-art model for nasal diseases diagnosis in endoscopic images.
△ Less
Submitted 11 February, 2024; v1 submitted 27 August, 2023;
originally announced August 2023.
-
DiVa: An Iterative Framework to Harvest More Diverse and Valid Labels from User Comments for Music
Authors:
Hongru Liang,
Jingyao Liu,
Yuanxin Xiang,
Jiachen Du,
Lanjun Zhou,
Shushen Pan,
Wenqiang Lei
Abstract:
Towards sufficient music searching, it is vital to form a complete set of labels for each song. However, current solutions fail to resolve it as they cannot produce diverse enough mappings to make up for the information missed by the gold labels. Based on the observation that such missing information may already be presented in user comments, we propose to study the automated music labeling in an…
▽ More
Towards sufficient music searching, it is vital to form a complete set of labels for each song. However, current solutions fail to resolve it as they cannot produce diverse enough mappings to make up for the information missed by the gold labels. Based on the observation that such missing information may already be presented in user comments, we propose to study the automated music labeling in an essential but under-explored setting, where the model is required to harvest more diverse and valid labels from the users' comments given limited gold labels. To this end, we design an iterative framework (DiVa) to harvest more $\underline{\text{Di}}$verse and $\underline{\text{Va}}$lid labels from user comments for music. The framework makes a classifier able to form complete sets of labels for songs via pseudo-labels inferred from pre-trained classifiers and a novel joint score function. The experiment on a densely annotated testing set reveals the superiority of the Diva over state-of-the-art solutions in producing more diverse labels missed by the gold labels. We hope our work can inspire future research on automated music labeling.
△ Less
Submitted 9 August, 2023;
originally announced August 2023.
-
Learning an Interpretable End-to-End Network for Real-Time Acoustic Beamforming
Authors:
Hao Liang,
Guanxing Zhou,
Xiaotong Tu,
Andreas Jakobsson,
Xinghao Ding,
Yue Huang
Abstract:
Recently, many forms of audio industrial applications, such as sound monitoring and source localization, have begun exploiting smart multi-modal devices equipped with a microphone array. Regrettably, model-based methods are often difficult to employ for such devices due to their high computational complexity, as well as the difficulty of appropriately selecting the user-determined parameters. As a…
▽ More
Recently, many forms of audio industrial applications, such as sound monitoring and source localization, have begun exploiting smart multi-modal devices equipped with a microphone array. Regrettably, model-based methods are often difficult to employ for such devices due to their high computational complexity, as well as the difficulty of appropriately selecting the user-determined parameters. As an alternative, one may use deep network-based methods, but these are often difficult to generalize, nor can they generate the desired beamforming map directly. In this paper, a computationally efficient acoustic beamforming algorithm is proposed, which may be unrolled to form a model-based deep learning network for real-time imaging, here termed the DAMAS-FISTA-Net. By exploiting the natural structure of an acoustic beamformer, the proposed network inherits the physical knowledge of the acoustic system, and thus learns the underlying physical properties of the propagation. As a result, all the network parameters may be learned end-to-end, guided by a model-based prior using back-propagation. Notably, the proposed network enables an excellent interpretability and the ability of being able to process the raw data directly. Extensive numerical experiments using both simulated and real-world data illustrate the preferable performance of the DAMAS-FISTA-Net as compared to alternative approaches.
△ Less
Submitted 19 June, 2023;
originally announced June 2023.
-
X-ray Recognition: Patient identification from X-rays using a contrastive objective
Authors:
Hao Liang,
Kevin Ni,
Guha Balakrishnan
Abstract:
Recent research demonstrates that deep learning models are capable of precisely extracting bio-information (e.g. race, gender and age) from patients' Chest X-Rays (CXRs). In this paper, we further show that deep learning models are also surprisingly accurate at recognition, i.e., distinguishing CXRs belonging to the same patient from those belonging to different patients. These findings suggest po…
▽ More
Recent research demonstrates that deep learning models are capable of precisely extracting bio-information (e.g. race, gender and age) from patients' Chest X-Rays (CXRs). In this paper, we further show that deep learning models are also surprisingly accurate at recognition, i.e., distinguishing CXRs belonging to the same patient from those belonging to different patients. These findings suggest potential privacy considerations that the medical imaging community should consider with the proliferation of large public CXR databases.
△ Less
Submitted 28 April, 2023;
originally announced May 2023.
-
Visualizing chest X-ray dataset biases using GANs
Authors:
Hao Liang,
Kevin Ni,
Guha Balakrishnan
Abstract:
Recent work demonstrates that images from various chest X-ray datasets contain visual features that are strongly correlated with protected demographic attributes like race and gender. This finding raises issues of fairness, since some of these factors may be used by downstream algorithms for clinical predictions. In this work, we propose a framework, using generative adversarial networks (GANs), t…
▽ More
Recent work demonstrates that images from various chest X-ray datasets contain visual features that are strongly correlated with protected demographic attributes like race and gender. This finding raises issues of fairness, since some of these factors may be used by downstream algorithms for clinical predictions. In this work, we propose a framework, using generative adversarial networks (GANs), to visualize what features are most different between X-rays belonging to two demographic subgroups.
△ Less
Submitted 5 September, 2023; v1 submitted 28 April, 2023;
originally announced May 2023.
-
Perceptual Quality Assessment of NeRF and Neural View Synthesis Methods for Front-Facing Views
Authors:
Hanxue Liang,
Tianhao Wu,
Param Hanji,
Francesco Banterle,
Hongyun Gao,
Rafal Mantiuk,
Cengiz Oztireli
Abstract:
Neural view synthesis (NVS) is one of the most successful techniques for synthesizing free viewpoint videos, capable of achieving high fidelity from only a sparse set of captured images. This success has led to many variants of the techniques, each evaluated on a set of test views typically using image quality metrics such as PSNR, SSIM, or LPIPS. There has been a lack of research on how NVS metho…
▽ More
Neural view synthesis (NVS) is one of the most successful techniques for synthesizing free viewpoint videos, capable of achieving high fidelity from only a sparse set of captured images. This success has led to many variants of the techniques, each evaluated on a set of test views typically using image quality metrics such as PSNR, SSIM, or LPIPS. There has been a lack of research on how NVS methods perform with respect to perceived video quality. We present the first study on perceptual evaluation of NVS and NeRF variants. For this study, we collected two datasets of scenes captured in a controlled lab environment as well as in-the-wild. In contrast to existing datasets, these scenes come with reference video sequences, allowing us to test for temporal artifacts and subtle distortions that are easily overlooked when viewing only static images. We measured the quality of videos synthesized by several NVS methods in a well-controlled perceptual quality assessment experiment as well as with many existing state-of-the-art image/video quality metrics. We present a detailed analysis of the results and recommendations for dataset and metric selection for NVS evaluation.
△ Less
Submitted 24 October, 2023; v1 submitted 24 March, 2023;
originally announced March 2023.
-
ByteCover3: Accurate Cover Song Identification on Short Queries
Authors:
Xingjian Du,
Zijie Wang,
Xia Liang,
Huidong Liang,
Bilei Zhu,
Zejun Ma
Abstract:
Deep learning based methods have become a paradigm for cover song identification (CSI) in recent years, where the ByteCover systems have achieved state-of-the-art results on all the mainstream datasets of CSI. However, with the burgeon of short videos, many real-world applications require matching short music excerpts to full-length music tracks in the database, which is still under-explored and w…
▽ More
Deep learning based methods have become a paradigm for cover song identification (CSI) in recent years, where the ByteCover systems have achieved state-of-the-art results on all the mainstream datasets of CSI. However, with the burgeon of short videos, many real-world applications require matching short music excerpts to full-length music tracks in the database, which is still under-explored and waiting for an industrial-level solution. In this paper, we upgrade the previous ByteCover systems to ByteCover3 that utilizes local features to further improve the identification performance of short music queries. ByteCover3 is designed with a local alignment loss (LAL) module and a two-stage feature retrieval pipeline, allowing the system to perform CSI in a more precise and efficient way. We evaluated ByteCover3 on multiple datasets with different benchmark settings, where ByteCover3 beat all the compared methods including its previous versions.
△ Less
Submitted 21 March, 2023;
originally announced March 2023.
-
Non-Orthogonal Multiple Access Enhanced Multi-User Semantic Communication
Authors:
Weizhi Li,
Haotai Liang,
Chen Dong,
Xiaodong Xu,
Ping Zhang,
Kaijun Liu
Abstract:
Semantic communication serves as a novel paradigm and attracts the broad interest of researchers. One critical aspect of it is the multi-user semantic communication theory, which can further promote its application to the practical network environment. While most existing works focused on the design of end-to-end single-user semantic transmission, a novel non-orthogonal multiple access (NOMA)-base…
▽ More
Semantic communication serves as a novel paradigm and attracts the broad interest of researchers. One critical aspect of it is the multi-user semantic communication theory, which can further promote its application to the practical network environment. While most existing works focused on the design of end-to-end single-user semantic transmission, a novel non-orthogonal multiple access (NOMA)-based multi-user semantic communication system named NOMASC is proposed in this paper. The proposed system can support semantic tranmission of multiple users with diverse modalities of source information. To avoid high demand for hardware, an asymmetric quantizer is employed at the end of the semantic encoder for discretizing the continuous full-resolution semantic feature. In addition, a neural network model is proposed for mapping the discrete feature into self-learned symbols and accomplishing intelligent multi-user detection (MUD) at the receiver. Simulation results demonstrate that the proposed system holds good performance in non-orthogonal transmission of multiple user signals and outperforms the other methods, especially at low-to-medium SNRs. Moreover, it has high robustness under various simulation settings and mismatched test scenarios.
△ Less
Submitted 20 November, 2023; v1 submitted 12 March, 2023;
originally announced March 2023.
-
A Field-Theoretic View of Unlabeled Sensing
Authors:
Hao Liang,
Jingyu Lu,
Manolis C. Tsakiris,
Lihong Zhi
Abstract:
Unlabeled sensing is the problem of solving a linear system of equations, where the right-hand-side vector is known only up to a permutation. In this work, we study fields of rational functions related to symmetric polynomials and their images under a linear projection of the variables; as a consequence, we establish that the solution to an n-dimensional unlabeled sensing problem with generic data…
▽ More
Unlabeled sensing is the problem of solving a linear system of equations, where the right-hand-side vector is known only up to a permutation. In this work, we study fields of rational functions related to symmetric polynomials and their images under a linear projection of the variables; as a consequence, we establish that the solution to an n-dimensional unlabeled sensing problem with generic data can be obtained as the unique solution to a system of n + 1 polynomial equations of degrees 1, 2, . . . , n + 1 in n unknowns. Besides the new theoretical insights, this development offers the potential for scaling up algebraic unlabeled sensing algorithms.
△ Less
Submitted 4 November, 2024; v1 submitted 2 March, 2023;
originally announced March 2023.
-
A Specific Task-oriented Semantic Image Communication System for substation patrol inspection
Authors:
Senran Fan,
Haotai Liang,
Chen Dong,
Xiaodong Xu,
Geng Liu
Abstract:
Intelligent inspection robots are widely used in substation patrol inspection, which can help check potential safety hazards by patrolling the substation and sending back scene images. However, when patrolling some marginal areas with weak signal, the scene images cannot be sucessfully transmissted to be used for hidden danger elimination, which greatly reduces the quality of robots'daily work. To…
▽ More
Intelligent inspection robots are widely used in substation patrol inspection, which can help check potential safety hazards by patrolling the substation and sending back scene images. However, when patrolling some marginal areas with weak signal, the scene images cannot be sucessfully transmissted to be used for hidden danger elimination, which greatly reduces the quality of robots'daily work. To solve such problem, a Specific Task-oriented Semantic Communication System for Imag-STSCI is designed, which involves the semantic features extraction, transmission, restoration and enhancement to get clearer images sent by intelligent robots under weak signals. Inspired by that only some specific details of the image are needed in such substation patrol inspection task, we proposed a new paradigm of semantic enhancement in such specific task to ensure the clarity of key semantic information when facing a lower bit rate or a low signal-to-noise ratio situation. Across the reality-based simulation, experiments show our STSCI can generally surpass traditional image-compression-based and channel-codingbased or other semantic communication system in the substation patrol inspection task with a lower bit rate even under a low signal-to-noise ratio situation.
△ Less
Submitted 13 April, 2024; v1 submitted 9 January, 2023;
originally announced January 2023.
-
Cooperative Guidance Strategy for Active Defense Spacecraft with Imperfect Information via Deep Reinforcement Learning
Authors:
Li Zhi,
Haizhao Liang,
Jinze Wu,
Jianying Wang,
Yu Zheng
Abstract:
In this paper, an adaptive cooperative guidance strategy for the active protection of a target spacecraft trying to evade an interceptor was developed. The target spacecraft performs evasive maneuvers, launching an active defense vehicle to divert the interceptor. Instead of classical strategies, which are based on optimal control or differential game theory, the problem was solved by using the de…
▽ More
In this paper, an adaptive cooperative guidance strategy for the active protection of a target spacecraft trying to evade an interceptor was developed. The target spacecraft performs evasive maneuvers, launching an active defense vehicle to divert the interceptor. Instead of classical strategies, which are based on optimal control or differential game theory, the problem was solved by using the deep reinforcement learning method, and imperfect information was assumed for the interceptor maneuverability. To address the sparse reward problem, a universal reward design method and an increasingly difficult training approach were presented utilizing the shaping technique. Guidance law, reward function, and training approach were demonstrated through the learning process and Monte Carlo simulations. The application of the non-sparse reward function and increasingly difficult training approach accelerated the model convergence, alleviating the overfitting problem. Considering a standard optimal guidance law as a benchmark, the effectiveness, and the advantages, that guarantee the target spacecraft's escape and win rates in a multi-agent game, of the proposed guidance strategy were validated by the simulation results. The trained agent's adaptiveness to the interceptor maneuverability was superior to the optimal guidance law. Moreover, compared to the standard optimal guidance law, the proposed guidance strategy performed better with less prior knowledge.
△ Less
Submitted 6 December, 2022;
originally announced December 2022.
-
Aircraft Ground Taxiing Deduction and Conflict Early Warning Method Based on Control Command Information
Authors:
Jingchang Zhuge,
Huiyuan Liang,
Yiming Zhang,
Shichao Li,
Xinyu Yang,
Jun Wu
Abstract:
Aircraft taxiing conflict is a threat to the safety of airport operations, mainly due to the human error in control command infor-mation. In order to solve the problem, The aircraft taxiing deduction and conflict early warning method based on control order information is proposed. This method does not need additional equipment and operating costs, and is completely based on his-torical data and co…
▽ More
Aircraft taxiing conflict is a threat to the safety of airport operations, mainly due to the human error in control command infor-mation. In order to solve the problem, The aircraft taxiing deduction and conflict early warning method based on control order information is proposed. This method does not need additional equipment and operating costs, and is completely based on his-torical data and control command information. When the aircraft taxiing command is given, the future route information will be deduced, and the probability of conflict with other taxiing aircraft will be calculated to achieve conflict detection and early warning of different levels. The method is validated by the aircraft taxi data from real airports. The results show that the method can effectively predict the aircraft taxiing process, and can provide early warning of possible conflicts. Due to the advantages of low cost and high accuracy, this method has the potential to be applied to airport operation decision support system.
△ Less
Submitted 4 November, 2022;
originally announced November 2022.
-
Optimization for Robustness Evaluation beyond $\ell_p$ Metrics
Authors:
Hengyue Liang,
Buyun Liang,
Ying Cui,
Tim Mitchell,
Ju Sun
Abstract:
Empirical evaluation of deep learning models against adversarial attacks entails solving nontrivial constrained optimization problems. Popular algorithms for solving these constrained problems rely on projected gradient descent (PGD) and require careful tuning of multiple hyperparameters. Moreover, PGD can only handle $\ell_1$, $\ell_2$, and $\ell_\infty$ attack models due to the use of analytical…
▽ More
Empirical evaluation of deep learning models against adversarial attacks entails solving nontrivial constrained optimization problems. Popular algorithms for solving these constrained problems rely on projected gradient descent (PGD) and require careful tuning of multiple hyperparameters. Moreover, PGD can only handle $\ell_1$, $\ell_2$, and $\ell_\infty$ attack models due to the use of analytical projectors. In this paper, we introduce a novel algorithmic framework that blends a general-purpose constrained-optimization solver PyGRANSO, With Constraint-Folding (PWCF), to add reliability and generality to robustness evaluation. PWCF 1) finds good-quality solutions without the need of delicate hyperparameter tuning, and 2) can handle general attack models, e.g., general $\ell_p$ ($p \geq 0$) and perceptual attacks, which are inaccessible to PGD-based algorithms.
△ Less
Submitted 13 November, 2022; v1 submitted 2 October, 2022;
originally announced October 2022.
-
Acoustic-Net: A Novel Neural Network for Sound Localization and Quantification
Authors:
Guanxing Zhou,
Hao Liang,
Xinghao Ding,
Yue Huang,
Xiaotong Tu,
Saqlain Abbas
Abstract:
Acoustic source localization has been applied in different fields, such as aeronautics and ocean science, generally using multiple microphones array data to reconstruct the source location. However, the model-based beamforming methods fail to achieve the high-resolution of conventional beamforming maps. Deep neural networks are also appropriate to locate the sound source, but in general, these met…
▽ More
Acoustic source localization has been applied in different fields, such as aeronautics and ocean science, generally using multiple microphones array data to reconstruct the source location. However, the model-based beamforming methods fail to achieve the high-resolution of conventional beamforming maps. Deep neural networks are also appropriate to locate the sound source, but in general, these methods with complex network structures are hard to be recognized by hardware. In this paper, a novel neural network, termed the Acoustic-Net, is proposed to locate and quantify the sound source simply using the original signals. The experiments demonstrate that the proposed method significantly improves the accuracy of sound source prediction and the computing speed, which may generalize well to real data. The code and trained models are available at https://github.com/JoaquinChou/Acoustic-Net.
△ Less
Submitted 31 March, 2022;
originally announced March 2022.
-
RareGAN: Generating Samples for Rare Classes
Authors:
Zinan Lin,
Hao Liang,
Giulia Fanti,
Vyas Sekar
Abstract:
We study the problem of learning generative adversarial networks (GANs) for a rare class of an unlabeled dataset subject to a labeling budget. This problem is motivated from practical applications in domains including security (e.g., synthesizing packets for DNS amplification attacks), systems and networking (e.g., synthesizing workloads that trigger high resource usage), and machine learning (e.g…
▽ More
We study the problem of learning generative adversarial networks (GANs) for a rare class of an unlabeled dataset subject to a labeling budget. This problem is motivated from practical applications in domains including security (e.g., synthesizing packets for DNS amplification attacks), systems and networking (e.g., synthesizing workloads that trigger high resource usage), and machine learning (e.g., generating images from a rare class). Existing approaches are unsuitable, either requiring fully-labeled datasets or sacrificing the fidelity of the rare class for that of the common classes. We propose RareGAN, a novel synthesis of three key ideas: (1) extending conditional GANs to use labelled and unlabelled data for better generalization; (2) an active learning approach that requests the most useful labels; and (3) a weighted loss function to favor learning the rare class. We show that RareGAN achieves a better fidelity-diversity tradeoff on the rare class than prior work across different applications, budgets, rare class fractions, GAN losses, and architectures.
△ Less
Submitted 20 March, 2022;
originally announced March 2022.
-
False Data Injection Attack on Electric Vehicle-Assisted Voltage Regulation
Authors:
Yuan Liu,
Omid Ardakanian,
Ioanis Nikolaidis,
Hao Liang
Abstract:
With the large scale penetration of electric vehicles (EVs) and the advent of bidirectional chargers, EV aggregators will become a major player in the voltage regulation market. This paper proposes a novel false data injection attack (FDIA) against the voltage regulation capacity estimation of EV charging stations, the process that underpins voltage regulation in distribution system. The proposed…
▽ More
With the large scale penetration of electric vehicles (EVs) and the advent of bidirectional chargers, EV aggregators will become a major player in the voltage regulation market. This paper proposes a novel false data injection attack (FDIA) against the voltage regulation capacity estimation of EV charging stations, the process that underpins voltage regulation in distribution system. The proposed FDIA takes into account the uncertainty in EV mobility and network conditions. The attack vector with the largest expected adverse impact is the solution of a stochastic optimization problem subject to a constraint that ensures it can bypass bad data detection. We show that this attack vector can be determined by solving a sequence of convex quadratically constrained linear programs. The case studies examined in a co-simulation platform, based on two standard test feeders, reveal the vulnerability of the voltage regulation capacity estimation.
△ Less
Submitted 9 March, 2022;
originally announced March 2022.
-
Innovative semantic communication system
Authors:
Chen Dong,
Haotai Liang,
Xiaodong Xu,
Shujun Han,
Bizhu Wang,
Ping Zhang
Abstract:
Traditional communication systems focus on the transmission process, and the context-dependent meaning has been ignored. The fact that 5G system has approached Shannon limit and the increasing amount of data will cause communication bottleneck, such as the increased delay problems. Inspired by the ability of artificial intelligence to understand semantics, we propose a new communication paradigm,…
▽ More
Traditional communication systems focus on the transmission process, and the context-dependent meaning has been ignored. The fact that 5G system has approached Shannon limit and the increasing amount of data will cause communication bottleneck, such as the increased delay problems. Inspired by the ability of artificial intelligence to understand semantics, we propose a new communication paradigm, which integrates artificial intelligence and communication, the semantic communication system. Semantic communication is at the second level of communication based on Shannon and Weaver\cite{6197583}, which retains the semantic features of the transmitted information and recovers the signal at the receiver, thus compressing the communication traffic without losing important information. Different from other semantic communication systems, the proposed system not only transmits semantic information but also transmits semantic decoder. In addition, a general semantic metrics is proposed to measure the quality of semantic communication system. In particular, the semantic communication system for image, namely AESC-I, is designed to verify the feasibility of the new paradigm. Simulations are conducted on our system with the additive white Gaussian noise (AWGN) and the multipath fading channel using MNIST and Cifar10 datasets. The experimental results show that DeepSC-I can effectively extract semantic information and reconstruct images at a relatively low SNR.
△ Less
Submitted 19 February, 2022;
originally announced February 2022.
-
Early Stopping for Deep Image Prior
Authors:
Hengkang Wang,
Taihui Li,
Zhong Zhuang,
Tiancong Chen,
Hengyue Liang,
Ju Sun
Abstract:
Deep image prior (DIP) and its variants have showed remarkable potential for solving inverse problems in computer vision, without any extra training data. Practical DIP models are often substantially overparameterized. During the fitting process, these models learn mostly the desired visual content first, and then pick up the potential modeling and observational noise, i.e., overfitting. Thus, the…
▽ More
Deep image prior (DIP) and its variants have showed remarkable potential for solving inverse problems in computer vision, without any extra training data. Practical DIP models are often substantially overparameterized. During the fitting process, these models learn mostly the desired visual content first, and then pick up the potential modeling and observational noise, i.e., overfitting. Thus, the practicality of DIP often depends critically on good early stopping (ES) that captures the transition period. In this regard, the majority of DIP works for vision tasks only demonstrates the potential of the models -- reporting the peak performance against the ground truth, but provides no clue about how to operationally obtain near-peak performance without access to the groundtruth. In this paper, we set to break this practicality barrier of DIP, and propose an efficient ES strategy, which consistently detects near-peak performance across several vision tasks and DIP variants. Based on a simple measure of dispersion of consecutive DIP reconstructions, our ES method not only outpaces the existing ones -- which only work in very narrow domains, but also remains effective when combined with a number of methods that try to mitigate the overfitting. The code is available at https://github.com/sun-umn/Early_Stopping_for_DIP.
△ Less
Submitted 11 December, 2023; v1 submitted 11 December, 2021;
originally announced December 2021.
-
Economic MPC-based planning for marine vehicles: Tuning safety and energy efficiency
Authors:
Haojiao Liang,
Huiping Li,
Jian Gao,
Rongxin Cui,
Demin Xu
Abstract:
Energy efficiency and safety are two critical objectives for marine vehicles operating in environments with obstacles, and they generally conflict with each other. In this paper, we propose a novel online motion planning method of marine vehicles which can make trade-offs between the two design objectives based on the framework of economic model predictive control (EMPC). Firstly, the feasible tra…
▽ More
Energy efficiency and safety are two critical objectives for marine vehicles operating in environments with obstacles, and they generally conflict with each other. In this paper, we propose a novel online motion planning method of marine vehicles which can make trade-offs between the two design objectives based on the framework of economic model predictive control (EMPC). Firstly, the feasible trajectory with the most safety margin is designed and utilized as tracking reference. Secondly, the EMPC-based receding horizon motion planning algorithm is designed, in which the practical consumed energy and safety measure (i.e., the distance between the planning trajectory and the reference) are considered. Experimental results verify the effectiveness and feasibility of the proposed method.
△ Less
Submitted 10 December, 2021;
originally announced December 2021.
-
Self-Validation: Early Stopping for Single-Instance Deep Generative Priors
Authors:
Taihui Li,
Zhong Zhuang,
Hengyue Liang,
Le Peng,
Hengkang Wang,
Ju Sun
Abstract:
Recent works have shown the surprising effectiveness of deep generative models in solving numerous image reconstruction (IR) tasks, even without training data. We call these models, such as deep image prior and deep decoder, collectively as single-instance deep generative priors (SIDGPs). The successes, however, often hinge on appropriate early stopping (ES), which by far has largely been handled…
▽ More
Recent works have shown the surprising effectiveness of deep generative models in solving numerous image reconstruction (IR) tasks, even without training data. We call these models, such as deep image prior and deep decoder, collectively as single-instance deep generative priors (SIDGPs). The successes, however, often hinge on appropriate early stopping (ES), which by far has largely been handled in an ad-hoc manner. In this paper, we propose the first principled method for ES when applying SIDGPs to IR, taking advantage of the typical bell trend of the reconstruction quality. In particular, our method is based on collaborative training and self-validation: the primal reconstruction process is monitored by a deep autoencoder, which is trained online with the historic reconstructed images and used to validate the reconstruction quality constantly. Experimentally, on several IR problems and different SIDGPs, our self-validation method is able to reliably detect near-peak performance and signal good ES points. Our code is available at https://sun-umn.github.io/Self-Validation/.
△ Less
Submitted 23 October, 2021;
originally announced October 2021.
-
Controlled AutoEncoders to Generate Faces from Voices
Authors:
Hao Liang,
Lulan Yu,
Guikang Xu,
Bhiksha Raj,
Rita Singh
Abstract:
Multiple studies in the past have shown that there is a strong correlation between human vocal characteristics and facial features. However, existing approaches generate faces simply from voice, without exploring the set of features that contribute to these observed correlations. A computational methodology to explore this can be devised by rephrasing the question to: "how much would a target face…
▽ More
Multiple studies in the past have shown that there is a strong correlation between human vocal characteristics and facial features. However, existing approaches generate faces simply from voice, without exploring the set of features that contribute to these observed correlations. A computational methodology to explore this can be devised by rephrasing the question to: "how much would a target face have to change in order to be perceived as the originator of a source voice?" With this in perspective, we propose a framework to morph a target face in response to a given voice in a way that facial features are implicitly guided by learned voice-face correlation in this paper. Our framework includes a guided autoencoder that converts one face to another, controlled by a unique model-conditioning component called a gating controller which modifies the reconstructed face based on input voice recordings. We evaluate the framework on VoxCelab and VGGFace datasets through human subjects and face retrieval. Various experiments demonstrate the effectiveness of our proposed model.
△ Less
Submitted 16 July, 2021;
originally announced July 2021.
-
High-Throughput Precision Phenotyping of Left Ventricular Hypertrophy with Cardiovascular Deep Learning
Authors:
Grant Duffy,
Paul P Cheng,
Neal Yuan,
Bryan He,
Alan C. Kwan,
Matthew J. Shun-Shin,
Kevin M. Alexander,
Joseph Ebinger,
Matthew P. Lungren,
Florian Rader,
David H. Liang,
Ingela Schnittger,
Euan A. Ashley,
James Y. Zou,
Jignesh Patel,
Ronald Witteles,
Susan Cheng,
David Ouyang
Abstract:
Left ventricular hypertrophy (LVH) results from chronic remodeling caused by a broad range of systemic and cardiovascular disease including hypertension, aortic stenosis, hypertrophic cardiomyopathy, and cardiac amyloidosis. Early detection and characterization of LVH can significantly impact patient care but is limited by under-recognition of hypertrophy, measurement error and variability, and di…
▽ More
Left ventricular hypertrophy (LVH) results from chronic remodeling caused by a broad range of systemic and cardiovascular disease including hypertension, aortic stenosis, hypertrophic cardiomyopathy, and cardiac amyloidosis. Early detection and characterization of LVH can significantly impact patient care but is limited by under-recognition of hypertrophy, measurement error and variability, and difficulty differentiating etiologies of LVH. To overcome this challenge, we present EchoNet-LVH - a deep learning workflow that automatically quantifies ventricular hypertrophy with precision equal to human experts and predicts etiology of LVH. Trained on 28,201 echocardiogram videos, our model accurately measures intraventricular wall thickness (mean absolute error [MAE] 1.4mm, 95% CI 1.2-1.5mm), left ventricular diameter (MAE 2.4mm, 95% CI 2.2-2.6mm), and posterior wall thickness (MAE 1.2mm, 95% CI 1.1-1.3mm) and classifies cardiac amyloidosis (area under the curve of 0.83) and hypertrophic cardiomyopathy (AUC 0.98) from other etiologies of LVH. In external datasets from independent domestic and international healthcare systems, EchoNet-LVH accurately quantified ventricular parameters (R2 of 0.96 and 0.90 respectively) and detected cardiac amyloidosis (AUC 0.79) and hypertrophic cardiomyopathy (AUC 0.89) on the domestic external validation site. Leveraging measurements across multiple heart beats, our model can more accurately identify subtle changes in LV geometry and its causal etiologies. Compared to human experts, EchoNet-LVH is fully automated, allowing for reproducible, precise measurements, and lays the foundation for precision diagnosis of cardiac hypertrophy. As a resource to promote further innovation, we also make publicly available a large dataset of 23,212 annotated echocardiogram videos.
△ Less
Submitted 23 June, 2021;
originally announced June 2021.
-
Rethinking Transfer Learning for Medical Image Classification
Authors:
Le Peng,
Hengyue Liang,
Gaoxiang Luo,
Taihui Li,
Ju Sun
Abstract:
Transfer learning (TL) from pretrained deep models is a standard practice in modern medical image classification (MIC). However, what levels of features to be reused are problem-dependent, and uniformly finetuning all layers of pretrained models may be suboptimal. This insight has partly motivated the recent differential TL strategies, such as TransFusion (TF) and layer-wise finetuning (LWFT), whi…
▽ More
Transfer learning (TL) from pretrained deep models is a standard practice in modern medical image classification (MIC). However, what levels of features to be reused are problem-dependent, and uniformly finetuning all layers of pretrained models may be suboptimal. This insight has partly motivated the recent differential TL strategies, such as TransFusion (TF) and layer-wise finetuning (LWFT), which treat the layers in the pretrained models differentially. In this paper, we add one more strategy into this family, called TruncatedTL, which reuses and finetunes appropriate bottom layers and directly discards the remaining layers. This yields not only superior MIC performance but also compact models for efficient inference, compared to other differential TL methods. Our code is available at: https://github.com/sun-umn/TTL
△ Less
Submitted 26 May, 2024; v1 submitted 9 June, 2021;
originally announced June 2021.
-
End-to-end Uncertainty-based Mitigation of Adversarial Attacks to Automated Lane Centering
Authors:
Ruochen Jiao,
Hengyi Liang,
Takami Sato,
Junjie Shen,
Qi Alfred Chen,
Qi Zhu
Abstract:
In the development of advanced driver-assistance systems (ADAS) and autonomous vehicles, machine learning techniques that are based on deep neural networks (DNNs) have been widely used for vehicle perception. These techniques offer significant improvement on average perception accuracy over traditional methods, however, have been shown to be susceptible to adversarial attacks, where small perturba…
▽ More
In the development of advanced driver-assistance systems (ADAS) and autonomous vehicles, machine learning techniques that are based on deep neural networks (DNNs) have been widely used for vehicle perception. These techniques offer significant improvement on average perception accuracy over traditional methods, however, have been shown to be susceptible to adversarial attacks, where small perturbations in the input may cause significant errors in the perception results and lead to system failure. Most prior works addressing such adversarial attacks focus only on the sensing and perception modules. In this work, we propose an end-to-end approach that addresses the impact of adversarial attacks throughout perception, planning, and control modules. In particular, we choose a target ADAS application, the automated lane centering system in OpenPilot, quantify the perception uncertainty under adversarial attacks, and design a robust planning and control module accordingly based on the uncertainty analysis. We evaluate our proposed approach using both the public dataset and production-grade autonomous driving simulator. The experiment results demonstrate that our approach can effectively mitigate the impact of adversarial attacks and can achieve 55% to 90% improvement over the original OpenPilot.
△ Less
Submitted 27 February, 2021;
originally announced March 2021.
-
Exploration of Whether Skylight Polarization Patterns Contain Three-dimensional Attitude Information
Authors:
Huaju Liang,
Hongyang Bai,
Tong Zhou
Abstract:
Our previous work has demonstrated that Rayleigh model, which is widely used in polarized skylight navigation to describe skylight polarization patterns, does not contain three-dimensional (3D) attitude information [1]. However, it is still necessary to further explore whether the skylight polarization patterns contain 3D attitude information. So, in this paper, a social spider optimization (SSO)…
▽ More
Our previous work has demonstrated that Rayleigh model, which is widely used in polarized skylight navigation to describe skylight polarization patterns, does not contain three-dimensional (3D) attitude information [1]. However, it is still necessary to further explore whether the skylight polarization patterns contain 3D attitude information. So, in this paper, a social spider optimization (SSO) method is proposed to estimate three Euler angles, which considers the difference of each pixel among polarization images based on template matching (TM) to make full use of the captured polarization information. In addition, to explore this problem, we not only use angle of polarization (AOP) and degree of polarization (DOP) information, but also the light intensity (LI) information. So, a sky model is established, which combines Berry model and Hosek model to fully describe AOP, DOP, and LI information in the sky, and considers the influence of four neutral points, ground albedo, atmospheric turbidity, and wavelength. The results of simulation show that the SSO algorithm can estimate 3D attitude and the established sky model contains 3D attitude information. However, when there are measurement noise or model error, the accuracy of 3D attitude estimation drops significantly. Especially in field experiment, it is very difficult to estimate 3D attitude. Finally, the results are discussed in detail.
△ Less
Submitted 30 November, 2020;
originally announced December 2020.
-
PiRhDy: Learning Pitch-, Rhythm-, and Dynamics-aware Embeddings for Symbolic Music
Authors:
Hongru Liang,
Wenqiang Lei,
Paul Yaozhu Chan,
Zhenglu Yang,
Maosong Sun,
Tat-Seng Chua
Abstract:
Definitive embeddings remain a fundamental challenge of computational musicology for symbolic music in deep learning today. Analogous to natural language, music can be modeled as a sequence of tokens. This motivates the majority of existing solutions to explore the utilization of word embedding models to build music embeddings. However, music differs from natural languages in two key aspects: (1)…
▽ More
Definitive embeddings remain a fundamental challenge of computational musicology for symbolic music in deep learning today. Analogous to natural language, music can be modeled as a sequence of tokens. This motivates the majority of existing solutions to explore the utilization of word embedding models to build music embeddings. However, music differs from natural languages in two key aspects: (1) musical token is multi-faceted -- it comprises of pitch, rhythm and dynamics information; and (2) musical context is two-dimensional -- each musical token is dependent on both melodic and harmonic contexts. In this work, we provide a comprehensive solution by proposing a novel framework named PiRhDy that integrates pitch, rhythm, and dynamics information seamlessly. PiRhDy adopts a hierarchical strategy which can be decomposed into two steps: (1) token (i.e., note event) modeling, which separately represents pitch, rhythm, and dynamics and integrates them into a single token embedding; and (2) context modeling, which utilizes melodic and harmonic knowledge to train the token embedding. A thorough study was made on each component and sub-strategy of PiRhDy. We further validate our embeddings in three downstream tasks -- melody completion, accompaniment suggestion, and genre classification. Results indicate a significant advancement of the neural approach towards symbolic music as well as PiRhDy's potential as a pretrained tool for a broad range of symbolic music applications.
△ Less
Submitted 15 October, 2020;
originally announced October 2020.
-
Leveraging Weakly-hard Constraints for Improving System Fault Tolerance with Functional and Timing Guarantees
Authors:
Hengyi Liang,
Zhilu Wang,
Ruochen Jiao,
Qi Zhu
Abstract:
Many safety-critical real-time systems operate under harsh environment and are subject to soft errors caused by transient or intermittent faults. It is critical and yet often very challenging to apply fault tolerance techniques in these systems, due to their resource limitations and stringent constraints on timing and functionality. In this work, we leverage the concept of weakly-hard constraints,…
▽ More
Many safety-critical real-time systems operate under harsh environment and are subject to soft errors caused by transient or intermittent faults. It is critical and yet often very challenging to apply fault tolerance techniques in these systems, due to their resource limitations and stringent constraints on timing and functionality. In this work, we leverage the concept of weakly-hard constraints, which allows task deadline misses in a bounded manner, to improve system's capability to accommodate fault tolerance techniques while ensuring timing and functional correctness. In particular, we 1) quantitatively measure control cost under different deadline hit/miss scenarios and identify weak-hard constraints that guarantee control stability, 2) employ typical worst-case analysis (TWCA) to bound the number of deadline misses and approximate system control cost, 3) develop an event-based simulation method to check the task execution pattern and evaluate system control cost for any given solution and 4) develop a meta-heuristic algorithm that consists of heuristic methods and a simulated annealing procedure to explore the design space. Our experiments on an industrial case study and a set of synthetic examples demonstrate the effectiveness of our approach.
△ Less
Submitted 14 August, 2020;
originally announced August 2020.
-
Stain Style Transfer of Histopathology Images Via Structure-Preserved Generative Learning
Authors:
Hanwen Liang,
Konstantinos N. Plataniotis,
Xingyu Li
Abstract:
Computational histopathology image diagnosis becomes increasingly popular and important, where images are segmented or classified for disease diagnosis by computers. While pathologists do not struggle with color variations in slides, computational solutions usually suffer from this critical issue. To address the issue of color variations in histopathology images, this study proposes two stain styl…
▽ More
Computational histopathology image diagnosis becomes increasingly popular and important, where images are segmented or classified for disease diagnosis by computers. While pathologists do not struggle with color variations in slides, computational solutions usually suffer from this critical issue. To address the issue of color variations in histopathology images, this study proposes two stain style transfer models, SSIM-GAN and DSCSI-GAN, based on the generative adversarial networks. By cooperating structural preservation metrics and feedback of an auxiliary diagnosis net in learning, medical-relevant information presented by image texture, structure, and chroma-contrast features is preserved in color-normalized images. Particularly, the smart treat of chromatic image content in our DSCSI-GAN model helps to achieve noticeable normalization improvement in image regions where stains mix due to histological substances co-localization. Extensive experimentation on public histopathology image sets indicates that our methods outperform prior arts in terms of generating more stain-consistent images, better preserving histological information in images, and obtaining significantly higher learning efficiency. Our python implementation is published on https://github.com/hanwen0529/DSCSI-GAN.
△ Less
Submitted 24 July, 2020;
originally announced July 2020.