-
Breathing and Semantic Pause Detection and Exertion-Level Classification in Post-Exercise Speech
Authors:
Yuyu Wang,
Wuyue Xia,
Huaxiu Yao,
Jingping Nie
Abstract:
Post-exercise speech contains rich physiological and linguistic cues, often marked by semantic pauses, breathing pauses, and combined breathing-semantic pauses. Detecting these events enables assessment of recovery rate, lung function, and exertion-related abnormalities. However, existing works on identifying and distinguishing different types of pauses in this context are limited. In this work, b…
▽ More
Post-exercise speech contains rich physiological and linguistic cues, often marked by semantic pauses, breathing pauses, and combined breathing-semantic pauses. Detecting these events enables assessment of recovery rate, lung function, and exertion-related abnormalities. However, existing works on identifying and distinguishing different types of pauses in this context are limited. In this work, building on a recently released dataset with synchronized audio and respiration signals, we provide systematic annotations of pause types. Using these annotations, we systematically conduct exploratory breathing and semantic pause detection and exertion-level classification across deep learning models (GRU, 1D CNN-LSTM, AlexNet, VGG16), acoustic features (MFCC, MFB), and layer-stratified Wav2Vec2 representations. We evaluate three setups-single feature, feature fusion, and a two-stage detection-classification cascade-under both classification and regression formulations. Results show per-type detection accuracy up to 89$\%$ for semantic, 55$\%$ for breathing, 86$\%$ for combined pauses, and 73$\%$overall, while exertion-level classification achieves 90.5$\%$ accuracy, outperformin prior work.
△ Less
Submitted 18 September, 2025;
originally announced September 2025.
-
InWaveSR: Topography-Aware Super-Resolution Network for Internal Solitary Waves
Authors:
Xinjie Wang,
Zhongrui Li,
Peng Han,
Chunxin Yuan,
Jiexin Xu,
Zhiqiang Wei,
Jie Nie
Abstract:
The effective utilization of observational data is frequently hindered by insufficient resolution. To address this problem, we present a new spatio-temporal super-resolution (STSR) model, called InWaveSR. It is built on a deep learning framework with physical restrictions and can efficiently generate high-resolution data from low-resolution input, especially for data featuring internal solitary wa…
▽ More
The effective utilization of observational data is frequently hindered by insufficient resolution. To address this problem, we present a new spatio-temporal super-resolution (STSR) model, called InWaveSR. It is built on a deep learning framework with physical restrictions and can efficiently generate high-resolution data from low-resolution input, especially for data featuring internal solitary waves (ISWs). To increase generality and interpretation, the model InWaveSR uses the primitive Navier-Stokes equations as the constraint, ensuring that the output results are physically consistent. In addition, the proposed model incorporates an HF-ResBlock component that combines the attention mechanism and the Fast Fourier Transform (FFT) method to improve the performance of the model in capturing high-frequency characteristics. Simultaneously, in order to enhance the adaptability of the model to complicated bottom topography, an edge sampling and numerical pre-processing method are carried out to optimize the training process. On evaluations using the in-situ observational ISW data, the proposed InWaveSR achieved a peak signal-to-noise ratio (PSNR) score of 36.2, higher than those of the traditional interpolation method and the previous neural network. This highlights its significant superiority over traditional methods, demonstrating its excellent performance and reliability in high-resolution ISW reconstruction.
△ Less
Submitted 3 September, 2025;
originally announced September 2025.
-
HybridRAG-based LLM Agents for Low-Carbon Optimization in Low-Altitude Economy Networks
Authors:
Jinbo Wen,
Cheng Su,
Jiawen Kang,
Jiangtian Nie,
Yang Zhang,
Jianhang Tang,
Dusit Niyato,
Chau Yuen
Abstract:
Low-Altitude Economy Networks (LAENets) are emerging as a promising paradigm to support various low-altitude services through integrated air-ground infrastructure. To satisfy low-latency and high-computation demands, the integration of Unmanned Aerial Vehicles (UAVs) with Mobile Edge Computing (MEC) systems plays a vital role, which offloads computing tasks from terminal devices to nearby UAVs, en…
▽ More
Low-Altitude Economy Networks (LAENets) are emerging as a promising paradigm to support various low-altitude services through integrated air-ground infrastructure. To satisfy low-latency and high-computation demands, the integration of Unmanned Aerial Vehicles (UAVs) with Mobile Edge Computing (MEC) systems plays a vital role, which offloads computing tasks from terminal devices to nearby UAVs, enabling flexible and resilient service provisions for ground users. To promote the development of LAENets, it is significant to achieve low-carbon multi-UAV-assisted MEC networks. However, several challenges hinder this implementation, including the complexity of multi-dimensional UAV modeling and the difficulty of multi-objective coupled optimization. To this end, this paper proposes a novel Retrieval Augmented Generation (RAG)-based Large Language Model (LLM) agent framework for model formulation. Specifically, we develop HybridRAG by combining KeywordRAG, VectorRAG, and GraphRAG, empowering LLM agents to efficiently retrieve structural information from expert databases and generate more accurate optimization problems compared with traditional RAG-based LLM agents. After customizing carbon emission optimization problems for multi-UAV-assisted MEC networks, we propose a Double Regularization Diffusion-enhanced Soft Actor-Critic (R\textsuperscript{2}DSAC) algorithm to solve the formulated multi-objective optimization problem. The R\textsuperscript{2}DSAC algorithm incorporates diffusion entropy regularization and action entropy regularization to improve the performance of the diffusion policy. Furthermore, we dynamically mask unimportant neurons in the actor network to reduce the carbon emissions associated with model training. Simulation results demonstrate the effectiveness and reliability of the proposed HybridRAG-based LLM agent framework and the R\textsuperscript{2}DSAC algorithm.
△ Less
Submitted 18 June, 2025;
originally announced June 2025.
-
Foundation Model Hidden Representations for Heart Rate Estimation from Auscultation
Authors:
Jingping Nie,
Dung T. Tran,
Karan Thakkar,
Vasudha Kowtha,
Jon Huang,
Carlos Avendano,
Erdrin Azemi,
Vikramjit Mitra
Abstract:
Auscultation, particularly heart sound, is a non-invasive technique that provides essential vital sign information. Recently, self-supervised acoustic representation foundation models (FMs) have been proposed to offer insights into acoustics-based vital signs. However, there has been little exploration of the extent to which auscultation is encoded in these pre-trained FM representations. In this…
▽ More
Auscultation, particularly heart sound, is a non-invasive technique that provides essential vital sign information. Recently, self-supervised acoustic representation foundation models (FMs) have been proposed to offer insights into acoustics-based vital signs. However, there has been little exploration of the extent to which auscultation is encoded in these pre-trained FM representations. In this work, using a publicly available phonocardiogram (PCG) dataset and a heart rate (HR) estimation model, we conduct a layer-wise investigation of six acoustic representation FMs: HuBERT, wav2vec2, wavLM, Whisper, Contrastive Language-Audio Pretraining (CLAP), and an in-house CLAP model. Additionally, we implement the baseline method from Nie et al., 2024 (which relies on acoustic features) and show that overall, representation vectors from pre-trained foundation models (FMs) offer comparable performance to the baseline. Notably, HR estimation using the representations from the audio encoder of the in-house CLAP model outperforms the results obtained from the baseline, achieving a lower mean absolute error (MAE) across various train/validation/test splits despite the domain mismatch.
△ Less
Submitted 29 May, 2025; v1 submitted 27 May, 2025;
originally announced May 2025.
-
A Novel Scene Coupling Semantic Mask Network for Remote Sensing Image Segmentation
Authors:
Xiaowen Ma,
Rongrong Lian,
Zhenkai Wu,
Renxiang Guan,
Tingfeng Hong,
Mengjiao Zhao,
Mengting Ma,
Jiangtao Nie,
Zhenhong Du,
Siyang Song,
Wei Zhang
Abstract:
As a common method in the field of computer vision, spatial attention mechanism has been widely used in semantic segmentation of remote sensing images due to its outstanding long-range dependency modeling capability. However, remote sensing images are usually characterized by complex backgrounds and large intra-class variance that would degrade their analysis performance. While vanilla spatial att…
▽ More
As a common method in the field of computer vision, spatial attention mechanism has been widely used in semantic segmentation of remote sensing images due to its outstanding long-range dependency modeling capability. However, remote sensing images are usually characterized by complex backgrounds and large intra-class variance that would degrade their analysis performance. While vanilla spatial attention mechanisms are based on dense affine operations, they tend to introduce a large amount of background contextual information and lack of consideration for intrinsic spatial correlation. To deal with such limitations, this paper proposes a novel scene-Coupling semantic mask network, which reconstructs the vanilla attention with scene coupling and local global semantic masks strategies. Specifically, scene coupling module decomposes scene information into global representations and object distributions, which are then embedded in the attention affinity processes. This Strategy effectively utilizes the intrinsic spatial correlation between features so that improve the process of attention modeling. Meanwhile, local global semantic masks module indirectly correlate pixels with the global semantic masks by using the local semantic mask as an intermediate sensory element, which reduces the background contextual interference and mitigates the effect of intra-class variance. By combining the above two strategies, we propose the model SCSM, which not only can efficiently segment various geospatial objects in complex scenarios, but also possesses inter-clean and elegant mathematical representations. Experimental results on four benchmark datasets demonstrate the the effectiveness of the above two strategies for improving the attention modeling of remote sensing images. The dataset and code are available at https://github.com/xwmaxwma/rssegmentation
△ Less
Submitted 21 January, 2025;
originally announced January 2025.
-
EigenSR: Eigenimage-Bridged Pre-Trained RGB Learners for Single Hyperspectral Image Super-Resolution
Authors:
Xi Su,
Xiangfei Shen,
Mingyang Wan,
Jing Nie,
Lihui Chen,
Haijun Liu,
Xichuan Zhou
Abstract:
Single hyperspectral image super-resolution (single-HSI-SR) aims to improve the resolution of a single input low-resolution HSI. Due to the bottleneck of data scarcity, the development of single-HSI-SR lags far behind that of RGB natural images. In recent years, research on RGB SR has shown that models pre-trained on large-scale benchmark datasets can greatly improve performance on unseen data, wh…
▽ More
Single hyperspectral image super-resolution (single-HSI-SR) aims to improve the resolution of a single input low-resolution HSI. Due to the bottleneck of data scarcity, the development of single-HSI-SR lags far behind that of RGB natural images. In recent years, research on RGB SR has shown that models pre-trained on large-scale benchmark datasets can greatly improve performance on unseen data, which may stand as a remedy for HSI. But how can we transfer the pre-trained RGB model to HSI, to overcome the data-scarcity bottleneck? Because of the significant difference in the channels between the pre-trained RGB model and the HSI, the model cannot focus on the correlation along the spectral dimension, thus limiting its ability to utilize on HSI. Inspired by the HSI spatial-spectral decoupling, we propose a new framework that first fine-tunes the pre-trained model with the spatial components (known as eigenimages), and then infers on unseen HSI using an iterative spectral regularization (ISR) to maintain the spectral correlation. The advantages of our method lie in: 1) we effectively inject the spatial texture processing capabilities of the pre-trained RGB model into HSI while keeping spectral fidelity, 2) learning in the spectral-decorrelated domain can improve the generalizability to spectral-agnostic data, and 3) our inference in the eigenimage domain naturally exploits the spectral low-rank property of HSI, thereby reducing the complexity. This work bridges the gap between pre-trained RGB models and HSI via eigenimages, addressing the issue of limited HSI training data, hence the name EigenSR. Extensive experiments show that EigenSR outperforms the state-of-the-art (SOTA) methods in both spatial and spectral metrics.
△ Less
Submitted 30 December, 2024; v1 submitted 6 September, 2024;
originally announced September 2024.
-
Model-driven Heart Rate Estimation and Heart Murmur Detection based on Phonocardiogram
Authors:
Jingping Nie,
Ran Liu,
Behrooz Mahasseni,
Erdrin Azemi,
Vikramjit Mitra
Abstract:
Acoustic signals are crucial for health monitoring, particularly heart sounds which provide essential data like heart rate and detect cardiac anomalies such as murmurs. This study utilizes a publicly available phonocardiogram (PCG) dataset to estimate heart rate using model-driven methods and extends the best-performing model to a multi-task learning (MTL) framework for simultaneous heart rate est…
▽ More
Acoustic signals are crucial for health monitoring, particularly heart sounds which provide essential data like heart rate and detect cardiac anomalies such as murmurs. This study utilizes a publicly available phonocardiogram (PCG) dataset to estimate heart rate using model-driven methods and extends the best-performing model to a multi-task learning (MTL) framework for simultaneous heart rate estimation and murmur detection. Heart rate estimates are derived using a sliding window technique on heart sound snippets, analyzed with a combination of acoustic features (Mel spectrogram, cepstral coefficients, power spectral density, root mean square energy). Our findings indicate that a 2D convolutional neural network (\textbf{\texttt{2dCNN}}) is most effective for heart rate estimation, achieving a mean absolute error (MAE) of 1.312 bpm. We systematically investigate the impact of different feature combinations and find that utilizing all four features yields the best results. The MTL model (\textbf{\texttt{2dCNN-MTL}}) achieves accuracy over 95% in murmur detection, surpassing existing models, while maintaining an MAE of 1.636 bpm in heart rate estimation, satisfying the requirements stated by Association for the Advancement of Medical Instrumentation (AAMI).
△ Less
Submitted 25 July, 2024;
originally announced July 2024.
-
A multi-stage semi-supervised learning for ankle fracture classification on CT images
Authors:
Hongzhi Liu,
Guicheng Li,
Jiacheng Nie,
Hui Tang,
Chunfeng Yang,
Qianjin Feng,
Hailin Xu,
Yang Chen
Abstract:
Because of the complicated mechanism of ankle injury, it is very difficult to diagnose ankle fracture in clinic. In order to simplify the process of fracture diagnosis, an automatic diagnosis model of ankle fracture was proposed. Firstly, a tibia-fibula segmentation network is proposed for the joint tibiofibular region of the ankle joint, and the corresponding segmentation dataset is established o…
▽ More
Because of the complicated mechanism of ankle injury, it is very difficult to diagnose ankle fracture in clinic. In order to simplify the process of fracture diagnosis, an automatic diagnosis model of ankle fracture was proposed. Firstly, a tibia-fibula segmentation network is proposed for the joint tibiofibular region of the ankle joint, and the corresponding segmentation dataset is established on the basis of fracture data. Secondly, the image registration method is used to register the bone segmentation mask with the normal bone mask. Finally, a semi-supervised classifier is constructed to make full use of a large number of unlabeled data to classify ankle fractures. Experiments show that the proposed method can segment fractures with fracture lines accurately and has better performance than the general method. At the same time, this method is superior to classification network in several indexes.
△ Less
Submitted 29 March, 2024;
originally announced March 2024.
-
Frequency-Aware Masked Autoencoders for Multimodal Pretraining on Biosignals
Authors:
Ran Liu,
Ellen L. Zippi,
Hadi Pouransari,
Chris Sandino,
Jingping Nie,
Hanlin Goh,
Erdrin Azemi,
Ali Moin
Abstract:
Leveraging multimodal information from biosignals is vital for building a comprehensive representation of people's physical and mental states. However, multimodal biosignals often exhibit substantial distributional shifts between pretraining and inference datasets, stemming from changes in task specification or variations in modality compositions. To achieve effective pretraining in the presence o…
▽ More
Leveraging multimodal information from biosignals is vital for building a comprehensive representation of people's physical and mental states. However, multimodal biosignals often exhibit substantial distributional shifts between pretraining and inference datasets, stemming from changes in task specification or variations in modality compositions. To achieve effective pretraining in the presence of potential distributional shifts, we propose a frequency-aware masked autoencoder ($\texttt{bio}$FAME) that learns to parameterize the representation of biosignals in the frequency space. $\texttt{bio}$FAME incorporates a frequency-aware transformer, which leverages a fixed-size Fourier-based operator for global token mixing, independent of the length and sampling rate of inputs. To maintain the frequency components within each input channel, we further employ a frequency-maintain pretraining strategy that performs masked autoencoding in the latent space. The resulting architecture effectively utilizes multimodal information during pretraining, and can be seamlessly adapted to diverse tasks and modalities at test time, regardless of input size and order. We evaluated our approach on a diverse set of transfer experiments on unimodal time series, achieving an average of $\uparrow$5.5% improvement in classification accuracy over the previous state-of-the-art. Furthermore, we demonstrated that our architecture is robust in modality mismatch scenarios, including unpredicted modality dropout or substitution, proving its practical utility in real-world applications. Code is available at https://github.com/apple/ml-famae .
△ Less
Submitted 18 April, 2024; v1 submitted 11 September, 2023;
originally announced September 2023.
-
Causal Disentanglement Hidden Markov Model for Fault Diagnosis
Authors:
Rihao Chang,
Yongtao Ma,
Weizhi Nie,
Jie Nie,
An-an Liu
Abstract:
In modern industries, fault diagnosis has been widely applied with the goal of realizing predictive maintenance. The key issue for the fault diagnosis system is to extract representative characteristics of the fault signal and then accurately predict the fault type. In this paper, we propose a Causal Disentanglement Hidden Markov model (CDHM) to learn the causality in the bearing fault mechanism a…
▽ More
In modern industries, fault diagnosis has been widely applied with the goal of realizing predictive maintenance. The key issue for the fault diagnosis system is to extract representative characteristics of the fault signal and then accurately predict the fault type. In this paper, we propose a Causal Disentanglement Hidden Markov model (CDHM) to learn the causality in the bearing fault mechanism and thus, capture their characteristics to achieve a more robust representation. Specifically, we make full use of the time-series data and progressively disentangle the vibration signal into fault-relevant and fault-irrelevant factors. The ELBO is reformulated to optimize the learning of the causal disentanglement Markov model. Moreover, to expand the scope of the application, we adopt unsupervised domain adaptation to transfer the learned disentangled representations to other working environments. Experiments were conducted on the CWRU dataset and IMS dataset. Relevant results validate the superiority of the proposed method.
△ Less
Submitted 6 August, 2023;
originally announced August 2023.
-
Self-supervised Context-aware Style Representation for Expressive Speech Synthesis
Authors:
Yihan Wu,
Xi Wang,
Shaofei Zhang,
Lei He,
Ruihua Song,
Jian-Yun Nie
Abstract:
Expressive speech synthesis, like audiobook synthesis, is still challenging for style representation learning and prediction. Deriving from reference audio or predicting style tags from text requires a huge amount of labeled data, which is costly to acquire and difficult to define and annotate accurately. In this paper, we propose a novel framework for learning style representation from abundant p…
▽ More
Expressive speech synthesis, like audiobook synthesis, is still challenging for style representation learning and prediction. Deriving from reference audio or predicting style tags from text requires a huge amount of labeled data, which is costly to acquire and difficult to define and annotate accurately. In this paper, we propose a novel framework for learning style representation from abundant plain text in a self-supervised manner. It leverages an emotion lexicon and uses contrastive learning and deep clustering. We further integrate the style representation as a conditioned embedding in a multi-style Transformer TTS. Comparing with multi-style TTS by predicting style tags trained on the same dataset but with human annotations, our method achieves improved results according to subjective evaluations on both in-domain and out-of-domain test sets in audiobook speech. Moreover, with implicit context-aware style representation, the emotion transition of synthesized audio in a long paragraph appears more natural. The audio samples are available on the demo web.
△ Less
Submitted 25 June, 2022;
originally announced June 2022.
-
Unsupervised Alternating Optimization for Blind Hyperspectral Imagery Super-resolution
Authors:
Jiangtao Nie,
Lei Zhang,
Wei Wei,
Zhiqiang Lang,
Yanning Zhang
Abstract:
Despite the great success of deep model on Hyperspectral imagery (HSI) super-resolution(SR) for simulated data, most of them function unsatisfactory when applied to the real data, especially for unsupervised HSI SR methods. One of the main reason comes from the fact that the predefined degeneration models (e.g. blur in spatial domain) utilized by most HSI SR methods often exist great discrepancy w…
▽ More
Despite the great success of deep model on Hyperspectral imagery (HSI) super-resolution(SR) for simulated data, most of them function unsatisfactory when applied to the real data, especially for unsupervised HSI SR methods. One of the main reason comes from the fact that the predefined degeneration models (e.g. blur in spatial domain) utilized by most HSI SR methods often exist great discrepancy with the real one, which results in these deep models overfit and ultimately degrade their performance on real data. To well mitigate such a problem, we explore the unsupervised blind HSI SR method. Specifically, we investigate how to effectively obtain the degeneration models in spatial and spectral domain, respectively, and makes them can well compatible with the fusion based SR reconstruction model. To this end, we first propose an alternating optimization based deep framework to estimate the degeneration models and reconstruct the latent image, with which the degeneration models estimation and HSI reconstruction can mutually promotes each other. Then, a meta-learning based mechanism is further proposed to pre-train the network, which can effectively improve the speed and generalization ability adapting to different complex degeneration. Experiments on three benchmark HSI SR datasets report an excellent superiority of the proposed method on handling blind HSI fusion problem over other competing methods.
△ Less
Submitted 3 December, 2020;
originally announced December 2020.
-
AIM 2020 Challenge on Efficient Super-Resolution: Methods and Results
Authors:
Kai Zhang,
Martin Danelljan,
Yawei Li,
Radu Timofte,
Jie Liu,
Jie Tang,
Gangshan Wu,
Yu Zhu,
Xiangyu He,
Wenjie Xu,
Chenghua Li,
Cong Leng,
Jian Cheng,
Guangyang Wu,
Wenyi Wang,
Xiaohong Liu,
Hengyuan Zhao,
Xiangtao Kong,
Jingwen He,
Yu Qiao,
Chao Dong,
Xiaotong Luo,
Liang Chen,
Jiangtao Zhang,
Maitreya Suin
, et al. (60 additional authors not shown)
Abstract:
This paper reviews the AIM 2020 challenge on efficient single image super-resolution with focus on the proposed solutions and results. The challenge task was to super-resolve an input image with a magnification factor x4 based on a set of prior examples of low and corresponding high resolution images. The goal is to devise a network that reduces one or several aspects such as runtime, parameter co…
▽ More
This paper reviews the AIM 2020 challenge on efficient single image super-resolution with focus on the proposed solutions and results. The challenge task was to super-resolve an input image with a magnification factor x4 based on a set of prior examples of low and corresponding high resolution images. The goal is to devise a network that reduces one or several aspects such as runtime, parameter count, FLOPs, activations, and memory consumption while at least maintaining PSNR of MSRResNet. The track had 150 registered participants, and 25 teams submitted the final results. They gauge the state-of-the-art in efficient single image super-resolution.
△ Less
Submitted 15 September, 2020;
originally announced September 2020.
-
Federated Learning in the Sky: Aerial-Ground Air Quality Sensing Framework with UAV Swarms
Authors:
Yi Liu,
Jiangtian Nie,
Xuandi Li,
Syed Hassan Ahmed,
Wei Yang Bryan Lim,
Chunyan Miao
Abstract:
Due to air quality significantly affects human health, it is becoming increasingly important to accurately and timely predict the Air Quality Index (AQI). To this end, this paper proposes a new federated learning-based aerial-ground air quality sensing framework for fine-grained 3D air quality monitoring and forecasting. Specifically, in the air, this framework leverages a light-weight Dense-Mobil…
▽ More
Due to air quality significantly affects human health, it is becoming increasingly important to accurately and timely predict the Air Quality Index (AQI). To this end, this paper proposes a new federated learning-based aerial-ground air quality sensing framework for fine-grained 3D air quality monitoring and forecasting. Specifically, in the air, this framework leverages a light-weight Dense-MobileNet model to achieve energy-efficient end-to-end learning from haze features of haze images taken by Unmanned Aerial Vehicles (UAVs) for predicting AQI scale distribution. Furthermore, the Federated Learning Framework not only allows various organizations or institutions to collaboratively learn a well-trained global model to monitor AQI without compromising privacy, but also expands the scope of UAV swarms monitoring. For ground sensing systems, we propose a Graph Convolutional neural network-based Long Short-Term Memory (GC-LSTM) model to achieve accurate, real-time and future AQI inference. The GC-LSTM model utilizes the topological structure of the ground monitoring station to capture the spatio-temporal correlation of historical observation data, which helps the aerial-ground sensing system to achieve accurate AQI inference. Through extensive case studies on a real-world dataset, numerical results show that the proposed framework can achieve accurate and energy-efficient AQI sensing without compromising the privacy of raw data.
△ Less
Submitted 23 July, 2020;
originally announced July 2020.
-
AIM 2019 Challenge on Constrained Super-Resolution: Methods and Results
Authors:
Kai Zhang,
Shuhang Gu,
Radu Timofte,
Zheng Hui,
Xiumei Wang,
Xinbo Gao,
Dongliang Xiong,
Shuai Liu,
Ruipeng Gang,
Nan Nan,
Chenghua Li,
Xueyi Zou,
Ning Kang,
Zhan Wang,
Hang Xu,
Chaofeng Wang,
Zheng Li,
Linlin Wang,
Jun Shi,
Wenyu Sun,
Zhiqiang Lang,
Jiangtao Nie,
Wei Wei,
Lei Zhang,
Yazhe Niu
, et al. (4 additional authors not shown)
Abstract:
This paper reviews the AIM 2019 challenge on constrained example-based single image super-resolution with focus on proposed solutions and results. The challenge had 3 tracks. Taking the three main aspects (i.e., number of parameters, inference/running time, fidelity (PSNR)) of MSRResNet as the baseline, Track 1 aims to reduce the amount of parameters while being constrained to maintain or improve…
▽ More
This paper reviews the AIM 2019 challenge on constrained example-based single image super-resolution with focus on proposed solutions and results. The challenge had 3 tracks. Taking the three main aspects (i.e., number of parameters, inference/running time, fidelity (PSNR)) of MSRResNet as the baseline, Track 1 aims to reduce the amount of parameters while being constrained to maintain or improve the running time and the PSNR result, Tracks 2 and 3 aim to optimize running time and PSNR result with constrain of the other two aspects, respectively. Each track had an average of 64 registered participants, and 12 teams submitted the final results. They gauge the state-of-the-art in single image super-resolution.
△ Less
Submitted 4 November, 2019;
originally announced November 2019.