Search | arXiv e-print repository

G-SEED: A Spatio-temporal Encoding Framework for Forest and Grassland Data Based on GeoSOT

Authors: Xuan Ouyang, Xinwen Yu, Yan Chen, Guang Deng, Xuanxin Liu

Abstract: In recent years, the rapid development of remote sensing, Unmanned Aerial Vehicles, and IoT technologies has led to an explosive growth in spatio-temporal forest and grassland data, which are increasingly multimodal, heterogeneous, and subject to continuous updates. However, existing Geographic Information Systems (GIS)-based systems struggle to integrate and manage such large-scale and diverse data sources. To address these challenges, this paper proposes G-SEED (GeoSOT-based Scalable Encoding and Extraction for Forest and Grassland Spatio-temporal Data), a unified encoding and management framework based on the hierarchical GeoSOT (Geographical coordinate global Subdivision grid with One dimension integer on 2^n tree) grid system. G-SEED integrates spatial, temporal, and type information into a composite code, enabling consistent encoding of both structured and unstructured data, including remote sensing imagery, vector maps, sensor records, documents, and multimedia content. The framework incorporates adaptive grid-level selection, center-cell-based indexing, and full-coverage grid arrays to optimize spatial querying and compression. Through extensive experiments on a real-world dataset from Shennongjia National Park (China), G-SEED demonstrates superior performance in spatial precision control, cross-source consistency, query efficiency, and compression compared to mainstream methods such as Geohash and H3. This study provides a scalable and reusable paradigm for the unified organization of forest and grassland big data, supporting dynamic monitoring and intelligent decision-making in these domains.

Submitted 22 June, 2025; originally announced June 2025.

Comments: 11 pages, 2 figures. Previously submitted to a non-academic conference (ICGARSA 2025) and formally withdrawn

arXiv:2505.16211 [pdf, ps, other]

AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models

Authors: Kai Li, Can Shen, Yile Liu, Jirui Han, Kelong Zheng, Xuechao Zou, Zhe Wang, Shun Zhang, Xingjian Du, Hanjun Luo, Yingbin Jin, Xinxin Xing, Ziyang Ma, Yue Liu, Yifan Zhang, Junfeng Fang, Kun Wang, Yibo Yan, Gelei Deng, Haoyang Li, Yiming Li, Xiaobin Zhuang, Tianlong Chen, Qingsong Wen, Tianwei Zhang, et al. (9 additional authors not shown)

Abstract: Audio Large Language Models (ALLMs) have gained widespread adoption, yet their trustworthiness remains underexplored. Existing evaluation frameworks, designed primarily for text, fail to address unique vulnerabilities introduced by audio's acoustic properties. We identify significant trustworthiness risks in ALLMs arising from non-semantic acoustic cues, including timbre, accent, and background noise, which can manipulate model behavior. We propose AudioTrust, a comprehensive framework for systematic evaluation of ALLM trustworthiness across audio-specific risks. AudioTrust encompasses six key dimensions: fairness, hallucination, safety, privacy, robustness, and authentication. The framework implements 26 distinct sub-tasks using a curated dataset of over 4,420 audio samples from real-world scenarios, including daily conversations, emergency calls, and voice assistant interactions. We conduct comprehensive evaluations across 18 experimental configurations using human-validated automated pipelines. Our evaluation of 14 state-of-the-art open-source and closed-source ALLMs reveals significant limitations when confronted with diverse high-risk audio scenarios, providing insights for secure deployment of audio models. Code and data are available at https://github.com/JusperLee/AudioTrust.

Submitted 30 September, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

Comments: Technical Report

arXiv:2402.01933 [pdf, other]

doi 10.1145/3678505

ToMoBrush: Exploring Dental Health Sensing using a Sonic Toothbrush

Authors: Kuang Yuan, Mohamed Ibrahim, Yiwen Song, Guoxiang Deng, Suvendra Vijayan, Robert Nerone, Akshay Gadre, Swarun Kumar

Abstract: Early detection of dental disease is crucial to prevent adverse outcomes. Today, dental X-rays are the most accurate gold standard for dental disease detection. Unfortunately, regular X-ray exams remain out of reach for billions of people around the world. In this paper, we ask: "Can we develop a low-cost sensing system that enables dental self-examination in the comfort of one's home?" This paper presents ToMoBrush, a dental health sensing system that explores using off-the-shelf sonic toothbrushes for dental condition detection. Our solution leverages the fact that a sonic toothbrush produces rich acoustic signals when in contact with teeth, which contain important information about each tooth's status. ToMoBrush extracts tooth resonance signatures from the acoustic signals to characterize varied dental health conditions of the teeth. We evaluate ToMoBrush on 19 participants and dental-standard models for detecting common dental problems including caries, calculus, and food impaction, achieving a detection ROC-AUC of 0.90, 0.83, and 0.88 respectively. Interviews with dental experts validate ToMoBrush's potential in enhancing at-home dental healthcare.

Submitted 2 February, 2024; originally announced February 2024.

ACM Class: J.3; C.3; H.5.2

arXiv:2311.15584 [pdf, other]

A deep learning approach for marine snow synthesis and removal

Authors: Fernando Galetto, Guang Deng

Abstract: Marine snow, the floating particles in underwater images, severely degrades the visibility and performance of human and machine vision systems. This paper proposes a novel method to reduce marine snow interference using deep learning techniques. We first synthesize realistic marine snow samples by training a Generative Adversarial Network (GAN) model and combine them with natural underwater images to create a paired dataset. We then train a U-Net model to perform marine snow removal as an image-to-image translation task. Our experiments show that the U-Net model can effectively remove both synthetic and natural marine snow with high accuracy, outperforming state-of-the-art methods such as the Median filter and its adaptive variant. We also demonstrate the robustness of our method by testing it on the MSRB dataset, which contains synthetic artifacts that our model has not seen during training. Our method is a practical and efficient solution for enhancing underwater images affected by marine snow.

Submitted 27 November, 2023; originally announced November 2023.

arXiv:2308.15742 [pdf, other]

ASTER: Automatic Speech Recognition System Accessibility Testing for Stutterers

Authors: Yi Liu, Yuekang Li, Gelei Deng, Felix Juefei-Xu, Yao Du, Cen Zhang, Chengwei Liu, Yeting Li, Lei Ma, Yang Liu

Abstract: The popularity of automatic speech recognition (ASR) systems nowadays leads to an increasing need for improving their accessibility. Handling stuttering speech is an important feature for accessible ASR systems. To improve the accessibility of ASR systems for stutterers, we need to expose and analyze the failures of ASR systems on stuttering speech. The speech datasets recorded from stutterers are not diverse enough to expose most of the failures. Furthermore, these datasets lack ground truth information about the non-stuttered text, rendering them unsuitable as comprehensive test suites. Therefore, a methodology for generating stuttering speech as test inputs to test and analyze the performance of ASR systems is needed. However, generating valid test inputs in this scenario is challenging. The reason is that although the generated test inputs should mimic how stutterers speak, they should also be diverse enough to trigger more failures. To address the challenge, we propose ASTER, a technique for automatically testing the accessibility of ASR systems. ASTER can generate valid test cases by injecting five different types of stuttering. The generated test cases can both simulate realistic stuttering speech and expose failures in ASR systems. Moreover, ASTER can further enhance the quality of the test cases with a multi-objective optimization-based seed updating algorithm. We implemented ASTER as a framework and evaluated it on four open-source ASR models and three commercial ASR systems. We conduct a comprehensive evaluation of ASTER and find that it significantly increases the word error rate, match error rate, and word information loss in the evaluated ASR systems. Additionally, our user study demonstrates that the generated stuttering audio is indistinguishable from real-world stuttering audio clips.

Submitted 29 August, 2023; originally announced August 2023.

arXiv:2306.01219 [pdf, other]

Brezinski Inverse and Geometric Product-Based Steffensen's Methods for Image Reverse Filtering

Authors: Guang Deng

Abstract: This work develops extensions of Steffensen's method to provide new tools for solving the semi-blind image reverse filtering problem. Two extensions are presented: a parametric Steffensen's method for accelerating the Mann iteration, and a family of 12 Steffensen's methods for vector variables. The development is based on the Brezinski inverse and the geometric product vector inverse. Variants of these methods are presented with adaptive parameter setting and first-order method acceleration. Implementation details, complexity, and convergence are discussed, and the proposed methods are shown to generalize existing algorithms. A comprehensive study of 108 variants of the vector Steffensen's methods is presented in the Supplementary Material. Representative results and comparison with current state-of-the-art methods demonstrate that the vector Steffensen's methods are efficient and effective tools in reversing the effects of commonly used filters in image processing.

Submitted 1 June, 2023; originally announced June 2023.

arXiv:2206.10124 [pdf, other]

Fast image reverse filters through fixed point and gradient descent acceleration

Authors: Fernando Galetto, Guang Deng

Abstract: In this paper, we study the problem of reverse image filtering. An image filter denoted g(.), which is available as a black box, produces an observation b = g(x) when provided with an input x. The problem is to estimate the original input signal x from the black box filter g(.) and the observation b. We study and re-develop state-of-the-art methods from two points of view, fixed point iteration and gradient descent. We also explore the application of acceleration techniques for the two types of iterations. Through extensive experiments and comparison, we show that acceleration methods for both fixed point iteration and gradient descent help to speed up the convergence of state-of-the-art methods.

Submitted 21 June, 2022; originally announced June 2022.
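
The abstract above frames reverse filtering as recovering x from b = g(x) with g(.) available only as a black box. As a minimal sketch, assuming the common additive fixed-point update x_{k+1} = x_k + (b - g(x_k)) (the abstract does not state which update rule the paper uses), the idea can be illustrated on a 1D signal. The `blur` filter, the function names, and the sample values are hypothetical stand-ins chosen so that the iteration is contractive; they are not taken from the paper.

```python
def blur(x):
    """Hypothetical stand-in for the black-box filter g(.):
    a 3-tap weighted average (0.2, 0.6, 0.2) with edge replication."""
    n = len(x)
    out = []
    for i in range(n):
        left = x[max(i - 1, 0)]
        right = x[min(i + 1, n - 1)]
        out.append(0.2 * left + 0.6 * x[i] + 0.2 * right)
    return out


def reverse_filter(g, b, iterations=200):
    """Estimate x from b = g(x) using only calls to g (assumed update rule):
    x_{k+1} = x_k + (b - g(x_k)), starting from x_0 = b."""
    x = list(b)
    for _ in range(iterations):
        gx = g(x)
        x = [xi + (bi - gi) for xi, bi, gi in zip(x, b, gx)]
    return x


original = [0.0, 1.0, 4.0, 2.0, 5.0, 3.0]   # illustrative signal
observed = blur(original)                    # b = g(x)
estimate = reverse_filter(blur, observed)    # approximate x
max_error = max(abs(e - o) for e, o in zip(estimate, original))
```

For this particular filter the residual shrinks by at least a factor of 0.8 per step, so the plain iteration converges; filters that strongly suppress some frequencies converge slowly or not at all, which is the kind of situation where the acceleration techniques mentioned in the abstract become relevant.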

arXiv:2112.04121 [pdf, other]

Reverse image filtering using total derivative approximation and accelerated gradient descent

Authors: Fernando J. Galetto, Guang Deng

Abstract: In this paper, we address a new problem of reversing the effect of an image filter, which can be linear or nonlinear. The assumption is that the algorithm of the filter is unknown and the filter is available as a black box. We formulate this inverse problem as minimizing a local patch-based cost function and use the total derivative to approximate the gradient, which is used in gradient descent to solve the problem. We analyze factors affecting the convergence and quality of the output in the Fourier domain. We also study the application of accelerated gradient descent algorithms in three gradient-free reverse filters, including the one proposed in this paper. We present results from extensive experiments to evaluate the complexity and effectiveness of the proposed algorithm. Results demonstrate that the proposed algorithm outperforms the state-of-the-art in that (1) it is at the same level of complexity as that of the fastest reverse filter, but it can reverse a larger number of filters, and (2) it can reverse the same list of filters as that of the very complex reverse filter, but its complexity is much smaller.

Submitted 13 December, 2021; v1 submitted 8 December, 2021; originally announced December 2021.

arXiv:2108.03799 [pdf, other]

COVID-view: Diagnosis of COVID-19 using Chest CT

Authors: Shreeraj Jadhav, Gaofeng Deng, Marlene Zawin, Arie E. Kaufman

Abstract: Significant work has been done towards deep learning (DL) models for automatic lung and lesion segmentation and classification of COVID-19 on chest CT data. However, comprehensive visualization systems focused on supporting the dual visual+DL diagnosis of COVID-19 are non-existent. We present COVID-view, a visualization application specially tailored for radiologists to diagnose COVID-19 from chest CT data. The system incorporates a complete pipeline of automatic lung segmentation, localization/isolation of lung abnormalities, followed by visualization, visual and DL analysis, and measurement/quantification tools. Our system combines the traditional 2D workflow of radiologists with newer 2D and 3D visualization techniques with DL support for a more comprehensive diagnosis. COVID-view incorporates a novel DL model for classifying the patients into positive/negative COVID-19 cases, which acts as a reading aid for the radiologist using COVID-view and provides the attention heatmap as an explainable DL for the model output. We designed and evaluated COVID-view through suggestions, close feedback, and case studies of real-world patient data conducted by expert radiologists who have substantial experience diagnosing chest CT scans for COVID-19, pulmonary embolism, and other forms of lung infections. We present requirements and task analysis for the diagnosis of COVID-19 that motivate our design choices and result in a practical system capable of handling real-world patient cases.

Submitted 9 August, 2021; originally announced August 2021.

Comments: 11 pages, 10 figures, accepted to IEEE VIS 2021 conference and IEEE Transactions on Visualization and Computer Graphics

arXiv:2107.14765 [pdf, other]

doi 10.1109/OJSP.2021.3063076

A guided edge-aware smoothing-sharpening filter based on patch interpolation model and generalized Gamma distribution

Authors: Guang Deng, Fernando J. Galetto, Mukhalad Al-nasrawi, Waseem Waheed

Abstract: Smoothing and sharpening are two fundamental image processing operations. The latter is usually related to the former through the unsharp masking algorithm. In this paper, we develop a new type of filter which performs smoothing or sharpening via a tuning parameter. The development of the new filter is based on (1) a new Laplacian-based filter formulation which unifies the smoothing and sharpening operations, (2) a patch interpolation model similar to that used in the guided filter which provides edge-awareness capability, and (3) the generalized Gamma distribution which is used as the prior for parameter estimation. We have conducted detailed studies on the properties of two versions of the proposed filter (self-guidance and external guidance). We have also conducted experiments to demonstrate applications of the proposed filter. In the self-guidance case, we have developed adaptive smoothing and sharpening algorithms based on texture, depth and blurriness information extracted from an image. Applications include enhancing human face images, producing shallow depth of field effects, focus-based image enhancement, and seam carving. In the external guidance case, we have developed new algorithms for combining flash and no-flash images and for enhancing multi-spectral images using a panchromatic image.

Submitted 30 July, 2021; originally announced July 2021.

Comments: 23 pages, 16 figures

Journal ref: IEEE Open Journal of Signal Processing, vol. 2, pp. 119-135, 2021

arXiv:2107.14443 [pdf, other]

Single image deep defocus estimation and its applications

Authors: Fernando J. Galetto, Guang Deng

Abstract: Depth information is useful in many image processing applications. However, since taking a picture is a process of projection of a 3D scene onto a 2D imaging sensor, the depth information is embedded in the image. Extracting the depth information from the image is a challenging task. A guiding principle is that the level of blurriness due to defocus is related to the distance between the object and the focal plane. Based on this principle and the widely used assumption that Gaussian blur is a good model for defocus blur, we formulate the problem of estimating the spatially varying defocus blurriness as a Gaussian blur classification problem. We solved the problem by training a deep neural network to classify image patches into one of the 20 levels of blurriness. We have created a dataset of more than 500000 image patches of size $32\times32$ which are used to train and test several well-known network models. We find that MobileNetV2 is suitable for this application due to its low memory requirement and high accuracy. The trained model is used to determine the patch blurriness which is then refined by applying an iterative weighted guided filter. The result is a defocus map that carries the information of the degree of blurriness for each pixel. We compare the proposed method with state-of-the-art techniques and we demonstrate its successful applications in adaptive image enhancement, defocus magnification, and multi-focus image fusion.

Submitted 13 December, 2021; v1 submitted 30 July, 2021; originally announced July 2021.

arXiv:2101.00137 [pdf, other]

Coherent optical communications using coherence-cloned Kerr soliton microcombs

Authors: Yong Geng, Heng Zhou, Wenwen Cui, Xinjie Han, Qiang Zhang, Boyuan Liu, Guangwei Deng, Qiang Zhou, Kun Qiu

Abstract: The dissipative Kerr soliton microcomb has been recognized as a promising on-chip multi-wavelength laser source for fiber optical communications, as its comb lines possess frequency and phase stability far beyond independent lasers. In the scenarios of coherent optical transmission and interconnect, a highly beneficial but rarely explored target is to re-generate a Kerr soliton microcomb at the receiver side as local oscillators that conserve the frequency and phase property of the incoming data carriers, so as to enable coherent detection with minimized optical and electrical compensations. Here, by using the techniques of pump laser conveying and two-point locking, we implement re-generation of a Kerr soliton microcomb that faithfully clones the frequency and phase coherence of another microcomb sent from 50 km away. Moreover, leveraging the coherence-cloned soliton microcombs as carriers and local oscillators, we demonstrate terabit coherent data interconnect, wherein traditional digital processes for frequency offset estimation are totally dispensed with, and carrier phase estimation is substantially simplified via slowed-down phase estimation rate per channel and joint phase estimation among multiple channels. Our work reveals that, in addition to providing a multitude of laser tones, regulating the frequency and phase of Kerr soliton microcombs among transmitters and receivers can significantly improve coherent communication in terms of performance, power consumption, and simplicity.

Submitted 31 December, 2020; originally announced January 2021.

arXiv:2006.10216 [pdf, other]

Generating Fundus Fluorescence Angiography Images from Structure Fundus Images Using Generative Adversarial Networks

Authors: Wanyue Li, Wen Kong, Yiwei Chen, Jing Wang, Yi He, Guohua Shi, Guohua Deng

Abstract: Fluorescein angiography can provide a map of retinal vascular structure and function, which is commonly used in ophthalmology diagnosis; however, this imaging modality may pose risks of harm to the patients. To help physicians reduce the potential risks of diagnosis, an image translation method is adopted. In this work, we proposed a conditional generative adversarial network (GAN)-based method to directly learn the mapping relationship between structure fundus images and fundus fluorescence angiography images. Moreover, local saliency maps, which define each pixel's importance, are used to define a novel saliency loss in the GAN cost function. This facilitates more accurate learning of small-vessel and fluorescein leakage features.

Submitted 17 June, 2020; originally announced June 2020.

Comments: 16 pages, 6 figures, accepted by Medical Imaging on Deep Learning

Showing 1–13 of 13 results for author: Deng, G