-
G-SEED: A Spatio-temporal Encoding Framework for Forest and Grassland Data Based on GeoSOT
Authors:
Xuan Ouyang,
Xinwen Yu,
Yan Chen,
Guang Deng,
Xuanxin Liu
Abstract:
In recent years, the rapid development of remote sensing, Unmanned Aerial Vehicles, and IoT technologies has led to an explosive growth in spatio-temporal forest and grassland data, which are increasingly multimodal, heterogeneous, and subject to continuous updates. However, existing Geographic Information Systems (GIS)-based systems struggle to integrate and manage such large-scale and diverse data sources. To address these challenges, this paper proposes G-SEED (GeoSOT-based Scalable Encoding and Extraction for Forest and Grassland Spatio-temporal Data), a unified encoding and management framework based on the hierarchical GeoSOT (Geographical coordinate global Subdivision grid with One dimension integer on $2^n$ tree) grid system. G-SEED integrates spatial, temporal, and type information into a composite code, enabling consistent encoding of both structured and unstructured data, including remote sensing imagery, vector maps, sensor records, documents, and multimedia content. The framework incorporates adaptive grid-level selection, center-cell-based indexing, and full-coverage grid arrays to optimize spatial querying and compression. Through extensive experiments on a real-world dataset from Shennongjia National Park (China), G-SEED demonstrates superior performance in spatial precision control, cross-source consistency, query efficiency, and compression compared to mainstream methods such as Geohash and H3. This study provides a scalable and reusable paradigm for the unified organization of forest and grassland big data, supporting dynamic monitoring and intelligent decision-making in these domains.
Submitted 22 June, 2025;
originally announced June 2025.
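The idea of a composite code that joins spatial, temporal, and type information can be sketched as follows. This is a minimal illustration only: it uses a generic quadtree over longitude and latitude as a stand-in for the GeoSOT grid, and the function names `quadtree_cell` and `composite_code`, the field order, the separator, and the type tag `"RS"` are all hypothetical choices that the abstract does not specify.

```python
from datetime import datetime, timezone


def quadtree_cell(lon: float, lat: float, level: int) -> str:
    """Return a base-4 quadtree code for the cell containing (lon, lat).

    Generic quadtree over lon [-180, 180) and lat [-90, 90). This is NOT the
    GeoSOT subdivision, only a stand-in that shares its prefix property:
    a coarser cell code is a prefix of every finer code inside it.
    """
    west, east, south, north = -180.0, 180.0, -90.0, 90.0
    digits = []
    for _ in range(level):
        mid_lon = (west + east) / 2
        mid_lat = (south + north) / 2
        digit = 0
        if lon >= mid_lon:      # eastern half sets bit 0
            digit |= 1
            west = mid_lon
        else:
            east = mid_lon
        if lat >= mid_lat:      # northern half sets bit 1
            digit |= 2
            south = mid_lat
        else:
            north = mid_lat
        digits.append(str(digit))
    return "".join(digits)


def composite_code(lon: float, lat: float, level: int,
                   when: datetime, data_type: str) -> str:
    """Join level, cell, UTC timestamp, and type tag (hypothetical layout)."""
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%S")
    return f"{level:02d}-{quadtree_cell(lon, lat, level)}-{stamp}-{data_type}"


# Illustrative coordinates and type tag; not taken from the paper's dataset.
code = composite_code(110.5, 31.5, 12,
                      datetime(2025, 6, 22, tzinfo=timezone.utc), "RS")
```

Because coarser cells are prefixes of finer ones, a prefix match on the spatial field is enough to select everything inside a larger cell, which is the property that makes hierarchical grid codes convenient for range queries.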
-
AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models
Authors:
Kai Li,
Can Shen,
Yile Liu,
Jirui Han,
Kelong Zheng,
Xuechao Zou,
Zhe Wang,
Shun Zhang,
Xingjian Du,
Hanjun Luo,
Yingbin Jin,
Xinxin Xing,
Ziyang Ma,
Yue Liu,
Yifan Zhang,
Junfeng Fang,
Kun Wang,
Yibo Yan,
Gelei Deng,
Haoyang Li,
Yiming Li,
Xiaobin Zhuang,
Tianlong Chen,
Qingsong Wen,
Tianwei Zhang, et al. (9 additional authors not shown)
Abstract:
Audio Large Language Models (ALLMs) have gained widespread adoption, yet their trustworthiness remains underexplored. Existing evaluation frameworks, designed primarily for text, fail to address unique vulnerabilities introduced by audio's acoustic properties. We identify significant trustworthiness risks in ALLMs arising from non-semantic acoustic cues, including timbre, accent, and background noise, which can manipulate model behavior. We propose AudioTrust, a comprehensive framework for systematic evaluation of ALLM trustworthiness across audio-specific risks. AudioTrust encompasses six key dimensions: fairness, hallucination, safety, privacy, robustness, and authentication. The framework implements 26 distinct sub-tasks using a curated dataset of over 4,420 audio samples from real-world scenarios, including daily conversations, emergency calls, and voice assistant interactions. We conduct comprehensive evaluations across 18 experimental configurations using human-validated automated pipelines. Our evaluation of 14 state-of-the-art open-source and closed-source ALLMs reveals significant limitations when confronted with diverse high-risk audio scenarios, providing insights for secure deployment of audio models. Code and data are available at https://github.com/JusperLee/AudioTrust.
Submitted 30 September, 2025; v1 submitted 22 May, 2025;
originally announced May 2025.
-
ToMoBrush: Exploring Dental Health Sensing using a Sonic Toothbrush
Authors:
Kuang Yuan,
Mohamed Ibrahim,
Yiwen Song,
Guoxiang Deng,
Suvendra Vijayan,
Robert Nerone,
Akshay Gadre,
Swarun Kumar
Abstract:
Early detection of dental disease is crucial to prevent adverse outcomes. Today, dental X-rays are the gold standard for dental disease detection. Unfortunately, a regular X-ray exam is still a privilege for billions of people around the world. In this paper, we ask: "Can we develop a low-cost sensing system that enables dental self-examination in the comfort of one's home?"
This paper presents ToMoBrush, a dental health sensing system that explores using off-the-shelf sonic toothbrushes for dental condition detection. Our solution leverages the fact that a sonic toothbrush produces rich acoustic signals when in contact with teeth, which contain important information about each tooth's status. ToMoBrush extracts tooth resonance signatures from the acoustic signals to characterize varied dental health conditions of the teeth. We evaluate ToMoBrush on 19 participants and dental-standard models for detecting common dental problems including caries, calculus, and food impaction, achieving a detection ROC-AUC of 0.90, 0.83, and 0.88 respectively. Interviews with dental experts validate ToMoBrush's potential in enhancing at-home dental healthcare.
Submitted 2 February, 2024;
originally announced February 2024.
-
A deep learning approach for marine snow synthesis and removal
Authors:
Fernando Galetto,
Guang Deng
Abstract:
Marine snow, the floating particles in underwater images, severely degrades the visibility and performance of human and machine vision systems. This paper proposes a novel method to reduce marine snow interference using deep learning techniques. We first synthesize realistic marine snow samples by training a Generative Adversarial Network (GAN) model and combine them with natural underwater images to create a paired dataset. We then train a U-Net model to perform marine snow removal as an image-to-image translation task. Our experiments show that the U-Net model can effectively remove both synthetic and natural marine snow with high accuracy, outperforming state-of-the-art methods such as the Median filter and its adaptive variant. We also demonstrate the robustness of our method by testing it on the MSRB dataset, which contains synthetic artifacts that our model has not seen during training. Our method is a practical and efficient solution for enhancing underwater images affected by marine snow.
Submitted 27 November, 2023;
originally announced November 2023.
-
ASTER: Automatic Speech Recognition System Accessibility Testing for Stutterers
Authors:
Yi Liu,
Yuekang Li,
Gelei Deng,
Felix Juefei-Xu,
Yao Du,
Cen Zhang,
Chengwei Liu,
Yeting Li,
Lei Ma,
Yang Liu
Abstract:
The popularity of automatic speech recognition (ASR) systems nowadays leads to an increasing need for improving their accessibility. Handling stuttering speech is an important feature for accessible ASR systems. To improve the accessibility of ASR systems for stutterers, we need to expose and analyze the failures of ASR systems on stuttering speech. The speech datasets recorded from stutterers are not diverse enough to expose most of the failures. Furthermore, these datasets lack ground truth information about the non-stuttered text, rendering them unsuitable as comprehensive test suites. Therefore, a methodology for generating stuttering speech as test inputs to test and analyze the performance of ASR systems is needed. However, generating valid test inputs in this scenario is challenging. The reason is that although the generated test inputs should mimic how stutterers speak, they should also be diverse enough to trigger more failures. To address the challenge, we propose ASTER, a technique for automatically testing the accessibility of ASR systems. ASTER can generate valid test cases by injecting five different types of stuttering. The generated test cases can both simulate realistic stuttering speech and expose failures in ASR systems. Moreover, ASTER can further enhance the quality of the test cases with a multi-objective optimization-based seed updating algorithm. We implemented ASTER as a framework and evaluated it on four open-source ASR models and three commercial ASR systems. We conduct a comprehensive evaluation of ASTER and find that it significantly increases the word error rate, match error rate, and word information loss in the evaluated ASR systems. Additionally, our user study demonstrates that the generated stuttering audio is indistinguishable from real-world stuttering audio clips.
Submitted 29 August, 2023;
originally announced August 2023.
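Word error rate, one of the metrics named in the ASTER abstract, is conventionally the word-level edit distance between reference and hypothesis divided by the reference length. The sketch below assumes that standard definition; the function name `word_error_rate` and the example sentences are illustrative, and word repetition is used only as one plausible kind of stutter, since the listing does not name the five injected types.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# A transcript that keeps two repeated words counts them as insertions.
clean = "i want a coffee"
transcribed = "i i i want a coffee"
rate = word_error_rate(clean, transcribed)
```

Here the two extra repetitions against a four-word reference give a rate of 0.5, which shows how a transcript that preserves stuttered repetitions is penalized even when every intended word is recognized.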
-
Brezinski Inverse and Geometric Product-Based Steffensen's Methods for Image Reverse Filtering
Authors:
Guang Deng
Abstract:
This work develops extensions of Steffensen's method to provide new tools for solving the semi-blind image reverse filtering problem. Two extensions are presented: a parametric Steffensen's method for accelerating the Mann iteration, and a family of 12 Steffensen's methods for vector variables. The development is based on the Brezinski inverse and the geometric product vector inverse. Variants of these methods are presented with adaptive parameter setting and first-order method acceleration. Implementation details, complexity, and convergence are discussed, and the proposed methods are shown to generalize existing algorithms. A comprehensive study of 108 variants of the vector Steffensen's methods is presented in the Supplementary Material. Representative results and comparison with current state-of-the-art methods demonstrate that the vector Steffensen's methods are efficient and effective tools in reversing the effects of commonly used filters in image processing.
Submitted 1 June, 2023;
originally announced June 2023.
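For orientation, the textbook scalar form of Steffensen's method for a fixed-point problem x = f(x) can be sketched as below. This is only the classical scalar iteration, not the parametric or vector variants that the paper develops; the function name `steffensen`, the tolerance, and the test function are assumptions made for illustration.

```python
import math


def steffensen(f, x0: float, tol: float = 1e-12, max_iter: int = 50) -> float:
    """Classical scalar Steffensen iteration for the fixed point x = f(x)."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        ffx = f(fx)
        denom = ffx - 2.0 * fx + x
        if denom == 0.0:
            # The update is undefined; fx is the best available estimate.
            return fx
        # Aitken delta-squared style update built from x, f(x), f(f(x)).
        x_next = x - (fx - x) ** 2 / denom
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x


# Fixed point of cos(x), approximately 0.739085.
root = steffensen(math.cos, 1.0)
```

The update needs only evaluations of f, not its derivative, which is why this family of methods suits a filter that is available solely as a black box.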
-
Fast image reverse filters through fixed point and gradient descent acceleration
Authors:
Fernando Galetto,
Guang Deng
Abstract:
In this paper, we study the problem of reverse image filtering. An image filter denoted g(.), which is available as a black box, produces an observation b = g(x) when provided with an input x. The problem is to estimate the original input signal x from the black box filter g(.) and the observation b. We study and re-develop state-of-the-art methods from two points of view, fixed point iteration and gradient descent. We also explore the application of acceleration techniques for the two types of iterations. Through extensive experiments and comparison, we show that acceleration methods for both fixed point iteration and gradient descent help to speed up the convergence of state-of-the-art methods.
Submitted 21 June, 2022;
originally announced June 2022.
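The problem stated in the abstract, recovering x from b = g(x) with g available only as a black box, admits a simple fixed-point sketch. The update x ← x + (b − g(x)) used below is one common fixed-point form and is an assumption here, since the listing does not give the paper's update rules. The filter `mild_blur`, the 1-D signal standing in for an image, and the iteration count are likewise illustrative.

```python
import numpy as np


def mild_blur(x: np.ndarray) -> np.ndarray:
    """Stand-in black-box filter g: circular 3-tap blur with weights 0.1, 0.8, 0.1."""
    return 0.1 * np.roll(x, 1) + 0.8 * x + 0.1 * np.roll(x, -1)


def reverse_filter(g, b: np.ndarray, num_iters: int = 50) -> np.ndarray:
    """Estimate x from b = g(x) by iterating x <- x + (b - g(x))."""
    x = b.copy()  # start from the observation itself
    for _ in range(num_iters):
        x = x + (b - g(x))  # uses only evaluations of g, never its internals
    return x


rng = np.random.default_rng(0)
x_true = rng.random(64)
b = mild_blur(x_true)
x_hat = reverse_filter(mild_blur, b)
```

This plain iteration converges quickly for the mild blur chosen here because the filter stays close to the identity; stronger filters converge more slowly or not at all, which is the situation where the acceleration techniques discussed in the abstract become relevant.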
-
Reverse image filtering using total derivative approximation and accelerated gradient descent
Authors:
Fernando J. Galetto,
Guang Deng
Abstract:
In this paper, we address a new problem of reversing the effect of an image filter, which can be linear or nonlinear. The assumption is that the algorithm of the filter is unknown and the filter is available as a black box. We formulate this inverse problem as minimizing a local patch-based cost function and use the total derivative to approximate the gradient, which is then used in gradient descent to solve the problem. We analyze factors affecting the convergence and quality of the output in the Fourier domain. We also study the application of accelerated gradient descent algorithms in three gradient-free reverse filters, including the one proposed in this paper. We present results from extensive experiments to evaluate the complexity and effectiveness of the proposed algorithm. Results demonstrate that the proposed algorithm outperforms the state-of-the-art in that (1) it is at the same level of complexity as that of the fastest reverse filter, but it can reverse a larger number of filters, and (2) it can reverse the same list of filters as that of the very complex reverse filter, but its complexity is much smaller.
Submitted 13 December, 2021; v1 submitted 8 December, 2021;
originally announced December 2021.
-
COVID-view: Diagnosis of COVID-19 using Chest CT
Authors:
Shreeraj Jadhav,
Gaofeng Deng,
Marlene Zawin,
Arie E. Kaufman
Abstract:
Significant work has been done towards deep learning (DL) models for automatic lung and lesion segmentation and classification of COVID-19 on chest CT data. However, comprehensive visualization systems focused on supporting the dual visual+DL diagnosis of COVID-19 are non-existent. We present COVID-view, a visualization application specially tailored for radiologists to diagnose COVID-19 from chest CT data. The system incorporates a complete pipeline of automatic lung segmentation and localization/isolation of lung abnormalities, followed by visualization, visual and DL analysis, and measurement/quantification tools. Our system combines the traditional 2D workflow of radiologists with newer 2D and 3D visualization techniques with DL support for a more comprehensive diagnosis. COVID-view incorporates a novel DL model for classifying the patients into positive/negative COVID-19 cases, which acts as a reading aid for the radiologist using COVID-view and provides the attention heatmap as an explainable DL for the model output. We designed and evaluated COVID-view through suggestions, close feedback, and case studies of real-world patient data conducted by expert radiologists who have substantial experience diagnosing chest CT scans for COVID-19, pulmonary embolism, and other forms of lung infections. We present requirements and task analysis for the diagnosis of COVID-19 that motivate our design choices and result in a practical system that is capable of handling real-world patient cases.
Submitted 9 August, 2021;
originally announced August 2021.
-
A guided edge-aware smoothing-sharpening filter based on patch interpolation model and generalized Gamma distribution
Authors:
Guang Deng,
Fernando J. Galetto,
Mukhalad Al-nasrawi,
Waseem Waheed
Abstract:
Smoothing and sharpening are two fundamental image processing operations. The latter is usually related to the former through the unsharp masking algorithm. In this paper, we develop a new type of filter which performs smoothing or sharpening via a tuning parameter. The development of the new filter is based on (1) a new Laplacian-based filter formulation which unifies the smoothing and sharpening operations, (2) a patch interpolation model similar to that used in the guided filter which provides edge-awareness capability, and (3) the generalized Gamma distribution which is used as the prior for parameter estimation. We have conducted detailed studies on the properties of two versions of the proposed filter (self-guidance and external guidance). We have also conducted experiments to demonstrate applications of the proposed filter. In the self-guidance case, we have developed adaptive smoothing and sharpening algorithms based on texture, depth and blurriness information extracted from an image. Applications include enhancing human face images, producing shallow depth of field effects, focus-based image enhancement, and seam carving. In the external guidance case, we have developed new algorithms for combining flash and no-flash images and for enhancing multi-spectral images using a panchromatic image.
Submitted 30 July, 2021;
originally announced July 2021.
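The link between smoothing and sharpening through unsharp masking, mentioned at the start of the abstract, can be shown with a single gain parameter. The sketch below is plain unsharp masking with a box filter on a 1-D signal, not the paper's Laplacian-based, edge-aware filter; `box_smooth`, `smooth_or_sharpen`, and the parameter name `gain` are hypothetical.

```python
import numpy as np


def box_smooth(x: np.ndarray) -> np.ndarray:
    """Circular 3-tap box filter used as the low-pass component."""
    return (np.roll(x, 1) + x + np.roll(x, -1)) / 3.0


def smooth_or_sharpen(x: np.ndarray, gain: float) -> np.ndarray:
    """Scale the detail layer: gain < 1 smooths, gain = 1 is identity, gain > 1 sharpens."""
    low = box_smooth(x)
    detail = x - low
    return low + gain * detail


signal = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
smoothed = smooth_or_sharpen(signal, 0.0)   # equals the box-filtered signal
sharpened = smooth_or_sharpen(signal, 2.0)  # peak is amplified above 1.0
```

Treating the output as a low-pass layer plus a scaled detail layer is what lets one parameter move continuously between the two operations.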
-
Single image deep defocus estimation and its applications
Authors:
Fernando J. Galetto,
Guang Deng
Abstract:
Depth information is useful in many image processing applications. However, since taking a picture is a process of projection of a 3D scene onto a 2D imaging sensor, the depth information is embedded in the image. Extracting the depth information from the image is a challenging task. A guiding principle is that the level of blurriness due to defocus is related to the distance between the object and the focal plane. Based on this principle and the widely used assumption that Gaussian blur is a good model for defocus blur, we formulate the problem of estimating the spatially varying defocus blurriness as a Gaussian blur classification problem. We solve the problem by training a deep neural network to classify image patches into one of 20 levels of blurriness. We have created a dataset of more than 500,000 image patches of size $32\times32$, which are used to train and test several well-known network models. We find that MobileNetV2 is suitable for this application due to its low memory requirement and high accuracy. The trained model is used to determine the patch blurriness, which is then refined by applying an iterative weighted guided filter. The result is a defocus map that carries the information of the degree of blurriness for each pixel. We compare the proposed method with state-of-the-art techniques and we demonstrate its successful applications in adaptive image enhancement, defocus magnification, and multi-focus image fusion.
Submitted 13 December, 2021; v1 submitted 30 July, 2021;
originally announced July 2021.
-
Coherent optical communications using coherence-cloned Kerr soliton microcombs
Authors:
Yong Geng,
Heng Zhou,
Wenwen Cui,
Xinjie Han,
Qiang Zhang,
Boyuan Liu,
Guangwei Deng,
Qiang Zhou,
Kun Qiu
Abstract:
The dissipative Kerr soliton microcomb has been recognized as a promising on-chip multi-wavelength laser source for fiber optical communications, as its comb lines possess frequency and phase stability far beyond independent lasers. In the scenarios of coherent optical transmission and interconnect, a highly beneficial but rarely explored target is to re-generate a Kerr soliton microcomb at the receiver side as local oscillators that conserve the frequency and phase properties of the incoming data carriers, so as to enable coherent detection with minimized optical and electrical compensations. Here, by using the techniques of pump laser conveying and two-point locking, we implement re-generation of a Kerr soliton microcomb that faithfully clones the frequency and phase coherence of another microcomb sent from 50 km away. Moreover, leveraging the coherence-cloned soliton microcombs as carriers and local oscillators, we demonstrate terabit coherent data interconnect, wherein traditional digital processes for frequency offset estimation are totally dispensed with, and carrier phase estimation is substantially simplified via slowed-down phase estimation rate per channel and joint phase estimation among multiple channels. Our work reveals that, in addition to providing a multitude of laser tones, regulating the frequency and phase of Kerr soliton microcombs among transmitters and receivers can significantly improve coherent communication in terms of performance, power consumption, and simplicity.
Submitted 31 December, 2020;
originally announced January 2021.
-
Generating Fundus Fluorescence Angiography Images from Structure Fundus Images Using Generative Adversarial Networks
Authors:
Wanyue Li,
Wen Kong,
Yiwei Chen,
Jing Wang,
Yi He,
Guohua Shi,
Guohua Deng
Abstract:
Fluorescein angiography can provide a map of retinal vascular structure and function and is commonly used in ophthalmology diagnosis; however, this imaging modality may pose risks of harm to patients. To help physicians reduce the potential risks of diagnosis, an image translation method is adopted. In this work, we propose a conditional generative adversarial network (GAN)-based method to directly learn the mapping relationship between structure fundus images and fundus fluorescence angiography images. Moreover, local saliency maps, which define each pixel's importance, are used to define a novel saliency loss in the GAN cost function. This facilitates more accurate learning of small-vessel and fluorescein leakage features.
Submitted 17 June, 2020;
originally announced June 2020.