Search | arXiv e-print repository

FreeQ-Graph: Free-form Querying with Semantic Consistent Scene Graph for 3D Scene Understanding

Authors: Chenlu Zhan, Gaoang Wang, Hongwei Wang

Abstract: Semantic querying in complex 3D scenes through free-form language presents a significant challenge. Existing 3D scene understanding methods use large-scale training data and CLIP to align text queries with 3D semantic features. However, their reliance on predefined vocabulary priors from training data hinders free-form semantic querying. Besides, recent advanced methods rely on LLMs for scene understanding but lack comprehensive 3D scene-level information and often overlook the potential inconsistencies in LLM-generated outputs. In our paper, we propose FreeQ-Graph, which enables Free-form Querying with a semantic consistent scene Graph for 3D scene understanding. The core idea is to encode free-form queries from a complete and accurate 3D scene graph without predefined vocabularies, and to align them with 3D consistent semantic labels, which is accomplished through three key steps. We begin by constructing a complete and accurate 3D scene graph that maps free-form objects and their relations through LLM and LVLM guidance, entirely free from training data or predefined priors. Most importantly, we align graph nodes with accurate semantic labels by leveraging 3D semantic aligned features from merged superpoints, enhancing 3D semantic consistency. To enable free-form semantic querying, we then design an LLM-based reasoning algorithm that combines scene-level and object-level information for intricate reasoning. We conducted extensive experiments on 3D semantic grounding, segmentation, and complex querying tasks, while also validating the accuracy of graph generation. Experiments on 6 datasets show that our model excels in both complex free-form semantic queries and intricate relational reasoning.

Submitted 16 June, 2025; originally announced June 2025.

arXiv:2506.06822 [pdf, ps, other]

Hi-LSplat: Hierarchical 3D Language Gaussian Splatting

Authors: Chenlu Zhan, Yufei Zhang, Gaoang Wang, Hongwei Wang

Abstract: Modeling 3D language fields with Gaussian Splatting for open-ended language queries has recently garnered increasing attention. However, recent 3DGS-based models leverage view-dependent 2D foundation models to refine 3D semantics but lack a unified 3D representation, leading to view inconsistencies. Additionally, inherent open-vocabulary challenges cause inconsistencies in object and relational descriptions, impeding hierarchical semantic understanding. In this paper, we propose Hi-LSplat, a view-consistent Hierarchical Language Gaussian Splatting work for 3D open-vocabulary querying. To achieve view-consistent 3D hierarchical semantics, we first lift 2D features to 3D features by constructing a 3D hierarchical semantic tree with layered instance clustering, which addresses the view inconsistency issue caused by 2D semantic features. Besides, we introduce instance-wise and part-wise contrastive losses to capture all-sided hierarchical semantic representations. Notably, we construct two hierarchical semantic datasets to better assess the model's ability to distinguish different semantic levels. Extensive experiments highlight our method's superiority in 3D open-vocabulary segmentation and localization. Its strong performance on hierarchical semantic datasets underscores its ability to capture complex hierarchical semantics within 3D scenes.

Submitted 7 June, 2025; originally announced June 2025.

arXiv:2504.16834 [pdf]

Improving Significant Wave Height Prediction Using Chronos Models

Authors: Yilin Zhai, Hongyuan Shi, Chao Zhan, Qing Wang, Zaijin You, Nan Wang

Abstract: Accurate wave height prediction is critical for maritime safety and coastal resilience, yet conventional physics-based models and traditional machine learning methods face challenges in computational efficiency and nonlinear dynamics modeling. This study introduces Chronos, the first implementation of a large language model (LLM)-powered temporal architecture (Chronos) optimized for wave forecasting. Through advanced temporal pattern recognition applied to historical wave data from three strategically chosen marine zones in the Northwest Pacific basin, our framework achieves multimodal improvements: (1) 14.3% reduction in training time with 2.5x faster inference speed compared to PatchTST baselines, achieving 0.575 mean absolute scaled error (MASE) units; (2) superior short-term forecasting (1-24h) across comprehensive metrics; (3) sustained predictive leadership in extended-range forecasts (1-120h); and (4) demonstrated zero-shot capability maintaining median performance (rank 4/12) against specialized operational models. This LLM-enhanced temporal modeling paradigm establishes a new standard in wave prediction, offering both computationally efficient solutions and a transferable framework for complex geophysical systems modeling.

Submitted 25 April, 2025; v1 submitted 23 April, 2025; originally announced April 2025.

Comments: arXiv admin note: text overlap with arXiv:2403.07815 by other authors
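
As a point of reference for the MASE figure quoted in the abstract above, the following is a minimal sketch of how mean absolute scaled error is conventionally computed: the forecast's mean absolute error divided by the in-sample mean absolute error of a naive lag-m forecast. The function name `mase`, its parameters, and the sample numbers are illustrative assumptions and are not taken from the paper, which may use a different seasonal lag or aggregation.

```python
def mase(y_train, y_true, y_pred, m=1):
    """Mean absolute scaled error (illustrative sketch, not the paper's code).

    y_train: in-sample history used to scale the error
    y_true, y_pred: held-out observations and their forecasts
    m: lag of the naive baseline (1 = previous value; assumed here)
    """
    if len(y_train) <= m:
        raise ValueError("y_train must contain more than m observations")
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    # In-sample MAE of the naive forecast y[t] = y[t - m]
    naive_mae = sum(
        abs(y_train[i] - y_train[i - m]) for i in range(m, len(y_train))
    ) / (len(y_train) - m)
    if naive_mae == 0:
        raise ValueError("naive baseline error is zero; MASE is undefined")

    # Out-of-sample MAE of the forecast being evaluated
    forecast_mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    return forecast_mae / naive_mae


# Hypothetical wave heights in metres, chosen only to make the arithmetic easy.
history = [1.0, 2.0, 4.0, 7.0]
observed = [8.0, 10.0]
forecast = [9.0, 9.0]
score = mase(history, observed, forecast)
```

A value below 1 means the forecast has a smaller average error than the naive baseline had on the history; a value above 1 means it does worse.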

arXiv:2503.07956 [pdf, other]

EFPC: Towards Efficient and Flexible Prompt Compression

Authors: Yun-Hao Cao, Yangsong Wang, Shuzheng Hao, Zhenxing Li, Chengjun Zhan, Sichao Liu, Yi-Qi Hu

Abstract: The emergence of large language models (LLMs) like GPT-4 has revolutionized natural language processing (NLP), enabling diverse, complex tasks. However, extensive token counts lead to high computational and financial burdens. To address this, we propose Efficient and Flexible Prompt Compression (EFPC), a novel method unifying task-aware and task-agnostic compression for a favorable accuracy-efficiency trade-off. EFPC uses GPT-4 to generate compressed prompts and integrates them with original prompts for training. During training and inference, we selectively prepend user instructions and compress prompts based on predicted probabilities. EFPC is highly data-efficient, achieving significant performance with minimal data. Compared to the state-of-the-art method LLMLingua-2, EFPC achieves a 4.8% relative improvement in F1-score with 1% additional data at a 4x compression rate, and an 11.4% gain with 10% additional data on the LongBench single-doc QA benchmark. EFPC's unified framework supports broad applicability and enhances performance across various models, tasks, and domains, offering a practical advancement in NLP.

Submitted 10 March, 2025; originally announced March 2025.

Comments: 10 pages, 6 figures

arXiv:2502.01964 [pdf, other]

Design and Simulation of the Adaptive Continuous Entanglement Generation Protocol

Authors: Caitao Zhan, Joaquin Chung, Allen Zang, Alexander Kolar, Rajkumar Kettimuthu

Abstract: Generating and distributing remote entangled pairs (EPs) is a primary function of quantum networks, as entanglement is the fundamental resource for key quantum network applications. A critical performance metric for quantum networks is the time-to-serve (TTS) for users' EP requests, which is the time to distribute EPs between the requested nodes. Minimizing the TTS is essential given the limited qubit coherence time. In this paper, we study the Adaptive Continuous entanglement generation Protocol (ACP), which enables quantum network nodes to continuously generate EPs with their neighbors, while adaptively selecting the neighbors to optimize TTS. Meanwhile, entanglement purification is used to mitigate decoherence in pre-generated EPs prior to the arrival of user requests. We extend the SeQUeNCe simulator to fully implement ACP and conduct extensive simulations across various network scales. Our results show that ACP reduces TTS by up to 94% and increases entanglement fidelity by up to 0.05.

Submitted 16 February, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

Comments: 8 pages, 10 figures, accepted at QCNC 2025

arXiv:2501.11102 [pdf, other]

RDG-GS: Relative Depth Guidance with Gaussian Splatting for Real-time Sparse-View 3D Rendering

Authors: Chenlu Zhan, Yufei Zhang, Yu Lin, Gaoang Wang, Hongwei Wang

Abstract: Efficiently synthesizing novel views from sparse inputs while maintaining accuracy remains a critical challenge in 3D reconstruction. While advanced techniques like radiance fields and 3D Gaussian Splatting achieve high rendering quality and impressive efficiency with dense view inputs, they suffer from significant geometric reconstruction errors when applied to sparse input views. Moreover, although recent methods leverage monocular depth estimation to enhance geometric learning, their dependence on single-view estimated depth often leads to view inconsistency issues across different viewpoints. Consequently, this reliance on absolute depth can introduce inaccuracies in geometric information, ultimately compromising the quality of scene reconstruction with Gaussian splats. In this paper, we present RDG-GS, a novel sparse-view 3D rendering framework with Relative Depth Guidance based on 3D Gaussian Splatting. The core innovation lies in utilizing relative depth guidance to refine the Gaussian field, steering it towards view-consistent spatial geometric representations, thereby enabling the reconstruction of accurate geometric structures and capturing intricate textures. First, we devise refined depth priors to rectify the coarse estimated depth and insert global and fine-grained scene information into regular Gaussians. Building on this, to address spatial geometric inaccuracies from absolute depth, we propose relative depth guidance by optimizing the similarity between spatially correlated patches of depth and images. Additionally, we directly address sparse areas that are difficult to converge by using adaptive sampling for quick densification. Across extensive experiments on Mip-NeRF360, LLFF, DTU, and Blender, RDG-GS demonstrates state-of-the-art rendering quality and efficiency, making a significant advancement for real-world application.

Submitted 19 January, 2025; originally announced January 2025.

Comments: 24 pages, 12 figures

arXiv:2411.17372 [pdf, other]

Epidemiology-informed Graph Neural Network for Heterogeneity-aware Epidemic Forecasting

Authors: Yufan Zheng, Wei Jiang, Alexander Zhou, Nguyen Quoc Viet Hung, Choujun Zhan, Tong Chen

Abstract: Among various spatio-temporal prediction tasks, epidemic forecasting plays a critical role in public health management. Recent studies have demonstrated the strong potential of spatio-temporal graph neural networks (STGNNs) in extracting heterogeneous spatio-temporal patterns for epidemic forecasting. However, most of these methods bear an over-simplified assumption that two locations (e.g., cities) with similar observed features in previous time steps will develop similar infection numbers in the future. In fact, for any epidemic disease, there exists strong heterogeneity of its intrinsic evolution mechanisms across geolocation and time, which can eventually lead to diverged infection numbers in two "similar" locations. However, such mechanistic heterogeneity is non-trivial to capture due to the existence of numerous influencing factors like medical resource accessibility, virus mutations, mobility patterns, etc., most of which are spatio-temporal yet unreachable or even unobservable. To address this challenge, we propose a Heterogeneous Epidemic-Aware Transmission Graph Neural Network (HeatGNN), a novel epidemic forecasting framework. By binding the epidemiology mechanistic model into a GNN, HeatGNN learns epidemiology-informed location embeddings of different locations that reflect their own transmission mechanisms over time. With the time-varying mechanistic affinity graphs computed with the epidemiology-informed location embeddings, a heterogeneous transmission graph network is designed to encode the mechanistic heterogeneity among locations, providing additional predictive signals to facilitate accurate forecasting. Experiments on three benchmark datasets have revealed that HeatGNN outperforms various strong baselines. Moreover, our efficiency analysis verifies the real-world practicality of HeatGNN on datasets of different sizes.

Submitted 26 November, 2024; originally announced November 2024.

Comments: 14 pages, 6 figures, 3 tables

arXiv:2411.11031 [pdf, other]

Simulation of Entanglement-Enabled Connectivity in QLANs using SeQUeNCe

Authors: Francesco Mazza, Caitao Zhan, Joaquin Chung, Rajkumar Kettimuthu, Marcello Caleffi, Angela Sara Cacciapuoti

Abstract: Quantum Local Area Networks (QLANs) represent a promising building block for larger scale quantum networks with the ambitious goal -- in a long time horizon -- of realizing a Quantum Internet. Surprisingly, the physical topology of a QLAN can be enriched by a set of artificial links, enabled by shared multipartite entangled states among the nodes of the network. This novel concept of artificial topology revolutionizes the possibilities of connectivity within the local network, enabling an on-demand manipulation of the artificial network topology. In this paper, we discuss the implementation of the QLAN model in SeQUeNCe, a discrete-event simulator of quantum networks. Specifically, we provide an analysis of how network nodes interact, with an emphasis on the interplay between quantum operations and classical signaling within the network. Remarkably, through the modeling of a measurement protocol and a correction protocol, our QLAN model implementation enables the simulation of the manipulation process of a shared entangled quantum state, and the subsequent engineering of the entanglement-based connectivity. Our simulations demonstrate how to obtain different virtual topologies with different manipulations of the shared resources and with all the possible measurement outcomes, with an arbitrary number of nodes within the network.

Submitted 17 November, 2024; originally announced November 2024.

arXiv:2410.10122 [pdf, other]

MuseTalk: Real-Time High-Fidelity Video Dubbing via Spatio-Temporal Sampling

Authors: Yue Zhang, Zhizhou Zhong, Minhao Liu, Zhaokang Chen, Bin Wu, Yubin Zeng, Chao Zhan, Yingjie He, Junxin Huang, Wenjiang Zhou

Abstract: Real-time video dubbing that preserves identity consistency while achieving accurate lip synchronization remains a critical challenge. Existing approaches face a trilemma: diffusion-based methods achieve high visual fidelity but suffer from prohibitive computational costs, while GAN-based solutions sacrifice lip-sync accuracy or dental details for real-time performance. We present MuseTalk, a novel two-stage training framework that resolves this trade-off through latent space optimization and spatio-temporal data sampling strategy. Our key innovations include: (1) During the Facial Abstract Pretraining stage, we propose Informative Frame Sampling to temporally align reference-source pose pairs, eliminating redundant feature interference while preserving identity cues. (2) In the Lip-Sync Adversarial Finetuning stage, we employ Dynamic Margin Sampling to spatially select the most suitable lip-movement-promoting regions, balancing audio-visual synchronization and dental clarity. (3) MuseTalk establishes an effective audio-visual feature fusion framework in the latent space, delivering 30 FPS output at 256×256 resolution on an NVIDIA V100 GPU. Extensive experiments demonstrate that MuseTalk outperforms state-of-the-art methods in visual fidelity while achieving comparable lip-sync accuracy. The code is made available at https://github.com/TMElyralab/MuseTalk

Submitted 26 March, 2025; v1 submitted 13 October, 2024; originally announced October 2024.

Comments: 15 pages, 4 figures

Report number: RV-10-16

arXiv:2410.02644 [pdf, ps, other]

Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents

Authors: Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, Yongfeng Zhang

Abstract: Although LLM-based agents, powered by Large Language Models (LLMs), can use external tools and memory mechanisms to solve complex real-world tasks, they may also introduce critical security vulnerabilities. However, the existing literature does not comprehensively evaluate attacks and defenses against LLM-based agents. To address this, we introduce Agent Security Bench (ASB), a comprehensive framework designed to formalize, benchmark, and evaluate the attacks and defenses of LLM-based agents, including 10 scenarios (e.g., e-commerce, autonomous driving, finance), 10 agents targeting the scenarios, over 400 tools, 27 different types of attack/defense methods, and 7 evaluation metrics. Based on ASB, we benchmark 10 prompt injection attacks, a memory poisoning attack, a novel Plan-of-Thought backdoor attack, 4 mixed attacks, and 11 corresponding defenses across 13 LLM backbones. Our benchmark results reveal critical vulnerabilities in different stages of agent operation, including system prompt, user prompt handling, tool usage, and memory retrieval, with the highest average attack success rate of 84.30%, but limited effectiveness shown in current defenses, unveiling important works to be done in terms of agent security for the community. We also introduce a new metric to evaluate the agents' capability to balance utility and security. Our code can be found at https://github.com/agiresearch/ASB.

Submitted 29 May, 2025; v1 submitted 3 October, 2024; originally announced October 2024.

Comments: Accepted by ICLR 2025

arXiv:2409.17480 [pdf, other]

What Would Happen Next? Predicting Consequences from An Event Causality Graph

Authors: Chuanhong Zhan, Wei Xiang, Chao Liang, Bang Wang

Abstract: The existing script event prediction task forecasts the subsequent event based on an event script chain. However, the evolution of historical events is more complicated in real-world scenarios, and the limited information provided by the event script chain also makes it difficult to accurately predict subsequent events. This paper introduces a Causality Graph Event Prediction (CGEP) task that forecasts consequential events based on an Event Causality Graph (ECG). We propose a Semantic Enhanced Distance-sensitive Graph Prompt Learning (SeDGPL) Model for the CGEP task. In SeDGPL, (1) we design a Distance-sensitive Graph Linearization (DsGL) module to reformulate the ECG into a graph prompt template as the input of a PLM; (2) propose an Event-Enriched Causality Encoding (EeCE) module to integrate both event contextual semantic and graph schema information; (3) propose a Semantic Contrast Event Prediction (ScEP) module to enhance the event representation among numerous candidate events and predict consequential events following the prompt learning paradigm. We construct two CGEP datasets based on the existing MAVEN-ERE and ESC corpora for experiments. Experimental results validate our argument that our proposed SeDGPL model outperforms the advanced competitors for the CGEP task.

Submitted 25 September, 2024; originally announced September 2024.

arXiv:2405.14672 [pdf, other]

Invisible Backdoor Attack against Self-supervised Learning

Authors: Hanrong Zhang, Zhenting Wang, Boheng Li, Fulin Lin, Tingxu Han, Mingyu Jin, Chenlu Zhan, Mengnan Du, Hongwei Wang, Shiqing Ma

Abstract: Self-supervised learning (SSL) models are vulnerable to backdoor attacks. Existing backdoor attacks that are effective in SSL often involve noticeable triggers, like colored patches or visible noise, which are vulnerable to human inspection. This paper proposes an imperceptible and effective backdoor attack against self-supervised models. We first find that existing imperceptible triggers designed for supervised learning are less effective in compromising self-supervised models. We then identify that this ineffectiveness is attributable to the overlap in distributions between the backdoor and augmented samples used in SSL. Building on this insight, we design an attack using optimized triggers disentangled from the augmented transformation in the SSL, while remaining imperceptible to human vision. Experiments on five datasets and six SSL algorithms demonstrate our attack is highly effective and stealthy. It also has strong resistance to existing backdoor defenses. Our code can be found at https://github.com/Zhang-Henry/INACTIVE.

Submitted 3 April, 2025; v1 submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.14040 [pdf, other]

Synchronized Video Storytelling: Generating Video Narrations with Structured Storyline

Authors: Dingyi Yang, Chunru Zhan, Ziheng Wang, Biao Wang, Tiezheng Ge, Bo Zheng, Qin Jin

Abstract: Video storytelling is engaging multimedia content that utilizes video and its accompanying narration to attract the audience, where a key challenge is creating narrations for recorded visual scenes. Previous studies on dense video captioning and video story generation have made some progress. However, in practical applications, we typically require synchronized narrations for ongoing visual scenes. In this work, we introduce a new task of Synchronized Video Storytelling, which aims to generate synchronous and informative narrations for videos. These narrations, associated with each video clip, should relate to the visual content, integrate relevant knowledge, and have an appropriate word count corresponding to the clip's duration. Specifically, a structured storyline is beneficial to guide the generation process, ensuring coherence and integrity. To support the exploration of this task, we introduce a new benchmark dataset E-SyncVidStory with rich annotations. Since existing Multimodal LLMs are not effective in addressing this task in one-shot or few-shot settings, we propose a framework named VideoNarrator that can generate a storyline for input videos and simultaneously generate narrations with the guidance of the generated or predefined storyline. We further introduce a set of evaluation metrics to thoroughly assess the generation. Both automatic and human evaluations validate the effectiveness of our approach. Our dataset, codes, and evaluations will be released.

Submitted 30 December, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

Comments: 15 pages, 13 figures

Journal ref: https://aclanthology.org/2024.acl-long.513/

arXiv:2405.00222 [pdf, other]

Optimized Distribution of Entanglement Graph States in Quantum Networks

Authors: Xiaojie Fan, Caitao Zhan, Himanshu Gupta, C. R. Ramakrishnan

Abstract: Building large-scale quantum computers, essential to demonstrating quantum advantage, is a key challenge. Quantum Networks (QNs) can help address this challenge by enabling the construction of large, robust, and more capable quantum computing platforms by connecting smaller quantum computers. Moreover, unlike classical systems, QNs can enable fully secured long-distance communication. Thus, quantum networks lie at the heart of the success of future quantum information technologies. In quantum networks, multipartite entangled states distributed over the network help implement and support many quantum network applications for communications, sensing, and computing. Our work focuses on developing optimal techniques to generate and distribute multipartite entanglement states efficiently. Prior works on generating general multipartite entanglement states have focused on the objective of minimizing the number of maximally entangled pairs (EPs) while ignoring the heterogeneity of the network nodes and links as well as the stochastic nature of underlying processes. In this work, we develop a hypergraph based linear programming framework that delivers optimal (under certain assumptions) generation schemes for general multipartite entanglement represented by graph states, under the network resources, decoherence, and fidelity constraints, while considering the stochasticity of the underlying processes. We illustrate our technique by developing generation schemes for the special cases of path and tree graph states, and discuss optimized generation schemes for more general classes of graph states. Using extensive simulations over a quantum network simulator (NetSquid), we demonstrate the effectiveness of our developed techniques and show that they outperform prior known schemes by up to orders of magnitude.

Submitted 18 March, 2025; v1 submitted 30 April, 2024; originally announced May 2024.

Comments: 16 pages, 20 figures

arXiv:2403.04290 [pdf, other]

MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant

Authors: Chenlu Zhan, Yu Lin, Gaoang Wang, Hongwei Wang, Jian Wu

Abstract: Medical generative models, acknowledged for their high-quality sample generation ability, have accelerated the fast growth of medical applications. However, recent works concentrate on separate medical generation models for distinct medical tasks and are restricted to inadequate medical multi-modal knowledge, constraining medical comprehensive diagnosis. In this paper, we propose MedM2G, a Medical Multi-Modal Generative framework, with the key innovation to align, extract, and generate medical multi-modal within a unified model. Extending beyond single or two medical modalities, we efficiently align medical multi-modal through the central alignment approach in the unified space. Significantly, our framework extracts valuable clinical knowledge by preserving the medical visual invariant of each imaging modal, thereby enhancing specific medical information for multi-modal generation. By conditioning the adaptive cross-guided parameters into the multi-flow diffusion framework, our model promotes flexible interactions among medical multi-modal for generation. MedM2G is the first medical generative model that unifies medical generation tasks of text-to-image, image-to-text, and unified generation of medical modalities (CT, MRI, X-ray). It performs 5 medical generation tasks across 10 datasets, consistently outperforming various state-of-the-art works.

Submitted 7 March, 2024; originally announced March 2024.

Comments: Accepted by CVPR2024

arXiv:2312.11171 [pdf, other]

UniDCP: Unifying Multiple Medical Vision-language Tasks via Dynamic Cross-modal Learnable Prompts

Authors: Chenlu Zhan, Yufei Zhang, Yu Lin, Gaoang Wang, Hongwei Wang

Abstract: Medical vision-language pre-training (Med-VLP) models have recently accelerated the fast-growing medical diagnostics application. However, most Med-VLP models learn task-specific representations independently from scratch, thereby leading to great inflexibility when they work across multiple fine-tuning tasks. In this work, we propose UniDCP, a Unified medical vision-language model with Dynamic Cross-modal learnable Prompts, which can be plastically applied to multiple medical vision-language tasks. Specifically, we explicitly construct a unified framework to harmonize diverse inputs from multiple pretraining tasks by leveraging cross-modal prompts for unification, which accordingly can accommodate heterogeneous medical fine-tuning tasks. Furthermore, we conceive a dynamic cross-modal prompt optimizing strategy that optimizes the prompts within the shareable space for implicitly processing the shareable clinic knowledge. UniDCP is the first Med-VLP model capable of performing all 8 medical uni-modal and cross-modal tasks over 14 corresponding datasets, consistently yielding superior results over diverse state-of-the-art methods.

Submitted 18 December, 2023; originally announced December 2023.

arXiv:2307.09813 [pdf, other]

DAPrompt: Deterministic Assumption Prompt Learning for Event Causality Identification

Authors: Wei Xiang, Chuanhong Zhan, Bang Wang

Abstract: Event Causality Identification (ECI) aims at determining whether there is a causal relation between two event mentions. Conventional prompt learning designs a prompt template to first predict an answer word and then maps it to the final decision. Unlike conventional prompts, we argue that predicting an answer word may not be a necessary prerequisite for the ECI task. Instead, we can first make a deterministic assumption on the existence of causal relation between two events and then evaluate its rationality to either accept or reject the assumption. The design motivation is to try the most utilization of the encyclopedia-like knowledge embedded in a pre-trained language model. In light of such considerations, we propose a deterministic assumption prompt learning model, called DAPrompt, for the ECI task. In particular, we design a simple deterministic assumption template concatenating with the input event pair, which includes two masks as predicted events' tokens. We use the probabilities of predicted events to evaluate the assumption rationality for the final event causality decision. Experiments on the EventStoryLine corpus and Causal-TimeBank corpus validate our design objective in terms of significant performance improvements over the state-of-the-art algorithms.

Submitted 19 July, 2023; originally announced July 2023.

arXiv:2306.14701 [pdf, other]

Hard Sample Mining Enabled Supervised Contrastive Feature Learning for Wind Turbine Pitch System Fault Diagnosis

Authors: Zixuan Wang, Bo Qin, Mengxuan Li, Chenlu Zhan, Mark D. Butala, Peng Peng, Hongwei Wang

Abstract: The efficient utilization of wind power by wind turbines relies on the ability of their pitch systems to adjust blade pitch angles in response to varying wind speeds. However, the presence of multiple health conditions in the pitch system due to the long-term wear and tear poses challenges in accurately classifying them, thus increasing the maintenance cost of wind turbines or even damaging them. This paper proposes a novel method based on hard sample mining-enabled supervised contrastive learning (HSMSCL) to address this problem. The proposed method employs cosine similarity to identify hard samples and subsequently, leverages supervised contrastive learning to learn more discriminative representations by constructing hard sample pairs. Furthermore, the hard sample mining framework in the proposed method also constructs hard samples with learned representations to make the training process of the multilayer perceptron (MLP) more challenging and make it a more effective classifier. The proposed approach progressively improves the fault diagnosis model by introducing hard samples in the SCL and MLP phases, thus enhancing its performance in complex multi-class fault diagnosis tasks. To evaluate the effectiveness of the proposed method, two real datasets comprising wind turbine pitch system cog belt fracture data are utilized. The fault diagnosis performance of the proposed method is compared against existing methods, and the results demonstrate its superior performance.
The proposed approach exhibits significant improvements in fault diagnosis performance, providing promising prospects for enhancing the reliability and efficiency of wind turbine pitch system fault diagnosis.

Submitted 10 August, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

arXiv:2306.07484 [pdf, other]

Multi-objective Molecular Optimization for Opioid Use Disorder Treatment Using Generative Network Complex

Authors: Hongsong Feng, Rui Wang, Chang-Guo Zhan, Guo-Wei Wei

Abstract: Opioid Use Disorder (OUD) has emerged as a significant global public health issue, with complex multifaceted conditions. Due to the lack of effective treatment options for various conditions, there is a pressing need for the discovery of new medications. In this study, we propose a deep generative model that combines a stochastic differential equation (SDE)-based diffusion modeling with the latent space of a pretrained autoencoder model. The molecular generator enables efficient generation of molecules that are effective on multiple targets, specifically the mu, kappa, and delta opioid receptors. Furthermore, we assess the ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties of the generated molecules to identify drug-like compounds. To enhance the pharmacokinetic properties of some lead compounds, we employ a molecular optimization approach. We obtain a diverse set of drug-like molecules. We construct binding affinity predictors by integrating molecular fingerprints derived from autoencoder embeddings, transformer embeddings, and topological Laplacians with advanced machine learning algorithms. Further experimental studies are needed to evaluate the pharmacological effects of these drug-like compounds for OUD treatment. Our machine learning platform serves as a valuable tool in designing and optimizing effective molecules for addressing OUD.

Submitted 12 June, 2023; originally announced June 2023.

arXiv:2212.10729 [pdf, other]

UnICLAM:Contrastive Representation Learning with Adversarial Masking for Unified and Interpretable Medical Vision Question Answering

Authors: Chenlu Zhan, Peng Peng, Hongsen Wang, Tao Chen, Hongwei Wang

Abstract: Medical Visual Question Answering (Medical-VQA) aims to answer clinical questions regarding radiology images, assisting doctors with decision-making options. Nevertheless, current Medical-VQA models learn cross-modal representations through residing vision and texture encoders in dual separate spaces, which leads to indirect semantic alignment. In this paper, we propose UnICLAM, a Unified and Interpretable Medical-VQA model through Contrastive Representation Learning with Adversarial Masking. Specifically, to learn an aligned image-text representation, we first establish a unified dual-stream pre-training structure with the gradually soft-parameter sharing strategy. Technically, the proposed strategy learns a constraint for the vision and texture encoders to be close in the same space, which is gradually loosened as the number of layers increases. Moreover, for grasping the unified semantic representation, we extend the adversarial masking data augmentation to the contrastive representation learning of vision and text in a unified manner. Concretely, while the encoder training minimizes the distance between original and masking samples, the adversarial masking module keeps adversarial learning to conversely maximize the distance. Furthermore, we also intuitively take a further exploration to the unified adversarial masking augmentation model, which improves the potential ante-hoc interpretability with remarkable performance and efficiency.
Experimental results on VQA-RAD and SLAKE public benchmarks demonstrate that UnICLAM outperforms 11 existing state-of-the-art Medical-VQA models. More importantly, we make an additional discussion about the performance of UnICLAM in diagnosing heart failure, verifying that UnICLAM exhibits superior few-shot adaption performance in practical disease diagnosis.

Submitted 27 September, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

arXiv:2212.02871 [pdf, other]

Video Object of Interest Segmentation

Authors: Siyuan Zhou, Chunru Zhan, Biao Wang, Tiezheng Ge, Yuning Jiang, Li Niu

Abstract: In this work, we present a new computer vision task named video object of interest segmentation (VOIS). Given a video and a target image of interest, our objective is to simultaneously segment and track all objects in the video that are relevant to the target image. This problem combines the traditional video object segmentation task with an additional image indicating the content that users are concerned with. Since no existing dataset is perfectly suitable for this new task, we specifically construct a large-scale dataset called LiveVideos, which contains 2418 pairs of target images and live videos with instance-level annotations. In addition, we propose a transformer-based method for this task. We revisit Swin Transformer and design a dual-path structure to fuse video and image features. Then, a transformer decoder is employed to generate object proposals for segmentation and tracking from the fused features. Extensive experiments on LiveVideos dataset show the superiority of our proposed method.

Submitted 6 December, 2022; originally announced December 2022.

Comments: 13 pages, 8 figures

arXiv:2209.12749 [pdf, other]

doi 10.1016/j.sysarc.2022.102780

Joint Task Offloading and Resource Optimization in NOMA-based Vehicular Edge Computing: A Game-Theoretic DRL Approach

Authors: Xincao Xu, Kai Liu, Penglin Dai, Feiyu Jin, Hualing Ren, Choujun Zhan, Songtao Guo

Abstract: Vehicular edge computing (VEC) becomes a promising paradigm for the development of emerging intelligent transportation systems. Nevertheless, the limited resources and massive transmission demands bring great challenges on implementing vehicular applications with stringent deadline requirements. This work presents a non-orthogonal multiple access (NOMA) based architecture in VEC, where heterogeneous edge nodes are cooperated for real-time task processing. We derive a vehicle-to-infrastructure (V2I) transmission model by considering both intra-edge and inter-edge interferences and formulate a cooperative resource optimization (CRO) problem by jointly optimizing the task offloading and resource allocation, aiming at maximizing the service ratio. Further, we decompose the CRO into two subproblems, namely, task offloading and resource allocation. In particular, the task offloading subproblem is modeled as an exact potential game (EPG), and a multi-agent distributed distributional deep deterministic policy gradient (MAD4PG) is proposed to achieve the Nash equilibrium. The resource allocation subproblem is divided into two independent convex optimization problems, and an optimal solution is proposed by using a gradient-based iterative method and KKT condition. Finally, we build the simulation model based on real-world vehicle trajectories and give a comprehensive performance evaluation, which conclusively demonstrates the superiority of the proposed solutions.

Submitted 24 October, 2022; v1 submitted 26 September, 2022; originally announced September 2022.

Journal ref: Journal of Systems Architecture 134 (2023) 102780

arXiv:2201.07762 [pdf, other]

doi 10.1109/ACCESS.2024.3352034

DeepAlloc: CNN-Based Approach to Efficient Spectrum Allocation in Shared Spectrum Systems

Authors: Mohammad Ghaderibaneh, Caitao Zhan, Himanshu Gupta

Abstract: Shared spectrum systems facilitate spectrum allocation to unlicensed users without harming the licensed users; they offer great promise in optimizing spectrum utility, but their management (in particular, efficient spectrum allocation to unlicensed users) is challenging. A significant shortcoming of current allocation methods is that they are either done very conservatively to ensure correctness, or are based on imperfect propagation models and/or spectrum sensing with poor spatial granularity. This leads to poor spectrum utilization, the fundamental objective of shared spectrum systems. To allocate spectrum near-optimally to secondary users in general scenarios, we fundamentally need to have knowledge of the signal path-loss function. In practice, however, even the best known path-loss models have unsatisfactory accuracy, and conducting extensive surveys to gather path-loss values is infeasible. To circumvent this challenge, we propose to learn the spectrum allocation function directly using supervised learning techniques. We particularly address the scenarios when the primary users' information may not be available; for such settings, we make use of a crowdsourced sensing architecture and use the spectrum sensor readings as features. We develop an efficient CNN-based approach (called DeepAlloc) and address various challenges that arise in its application to learning the spectrum allocation function.
Via extensive large-scale simulation and a small testbed, we demonstrate the effectiveness of our developed techniques; in particular, we observe that our approach improves the accuracy of standard learning techniques and prior work by up to 60%.

Submitted 4 April, 2024; v1 submitted 19 January, 2022; originally announced January 2022.

Comments: 15 pages, 16 figures

arXiv:2201.01772 [pdf, other]

Neural Architecture Search for Inversion

Authors: Cheng Zhan, Licheng Zhang, Xin Zhao, Chang-Chun Lee, Shujiao Huang

Abstract: Over the years, people have been using deep learning to tackle inversion problems, and we see the framework has been applied to build relationship between recording wavefield and velocity (Yang et al., 2016). Here we will extend the work from 2 perspectives, one is deriving a more appropriate loss function, as we know, pixel-2-pixel comparison might not be the best choice to characterize image structure, and we will elaborate on how to construct cost function to capture high level feature to enhance the model performance. Another dimension is searching for the more appropriate neural architecture, which is a subset of an even bigger picture, the automatic machine learning, or AutoML. There are several famous networks, U-net, ResNet (He et al., 2016) and DenseNet (Huang et al., 2017), and they achieve phenomenal results for certain problems, yet it's hard to argue they are the best for inversion problems without thoroughly searching within certain space. Here we will be showing our architecture search results for inversion.

Submitted 5 January, 2022; originally announced January 2022.

arXiv:2112.13181 [pdf, other]

doi 10.1016/j.pmcj.2022.101582

DeepMTL Pro: Deep Learning Based MultipleTransmitter Localization and Power Estimation

Authors: Caitao Zhan, Mohammad Ghaderibaneh, Pranjal Sahu, Himanshu Gupta

Abstract: In this paper, we address the problem of Multiple Transmitter Localization (MTL). MTL is to determine the locations of potential multiple transmitters in a field, based on readings from a distributed set of sensors. In contrast to the widely studied single transmitter localization problem, the MTL problem has only been studied recently in a few works. MTL is of great significance in many applications wherein intruders may be present. E.g., in shared spectrum systems, detection of unauthorized transmitters and estimating their power are imperative to efficient utilization of the shared spectrum. In this paper, we present DeepMTL, a novel deep-learning approach to address the MTL problem. In particular, we frame MTL as a sequence of two steps, each of which is a computer vision problem: image-to-image translation and object detection. The first step of image-to-image translation essentially maps an input image representing sensor readings to an image representing the distribution of transmitter locations, and the second object detection step derives precise locations of transmitters from the image of transmitter distributions. For the first step, we design our learning model Sen2Peak, while for the second step, we customize a state-of-the-art object detection model YOLO-cust. Using DeepMTL as a building block, we also develop techniques to estimate transmit power of the localized transmitters.
We demonstrate the effectiveness of our approach via extensive large-scale simulations and show that our approach outperforms the previous approaches significantly (by 50% or more) in performance metrics including localization error, miss rate, and false alarm rate. Our method also incurs a very small latency. We evaluate our techniques over a small-scale area with real testbed data and the testbed results align with the simulation results.

Submitted 22 March, 2022; v1 submitted 24 December, 2021; originally announced December 2021.

Comments: 38 pages, 27 figures. This is the final revision version of a journal paper submitted to Pervasive and Mobile Computing (PMC). This is an extension of an accepted paper at IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM 2021)

arXiv:2112.11002 [pdf, other]

doi 10.1109/TQE.2022.3168784

Efficient Quantum Network Communication using Optimized Entanglement-Swapping Trees

Authors: Mohammad Ghaderibaneh, Caitao Zhan, Himanshu Gupta, C. R. Ramakrishnan

Abstract: Quantum network communication is challenging, as the No-cloning theorem in quantum regime makes many classical techniques inapplicable. For long-distance communication, the only viable communication approach is teleportation of quantum states, which requires a prior distribution of entangled pairs (EPs) of qubits. Establishment of EPs across remote nodes can incur significant latency due to the low probability of success of the underlying physical processes. The focus of our work is to develop efficient techniques that minimize EP generation latency. Prior works have focused on selecting entanglement paths; in contrast, we select entanglement swapping trees--a more accurate representation of the entanglement generation structure. We develop a dynamic programming algorithm to select an optimal swapping-tree for a single pair of nodes, under the given capacity and fidelity constraints. For the general setting, we develop an efficient iterative algorithm to compute a set of swapping trees. We present simulation results which show that our solutions outperform the prior approaches by an order of magnitude and are viable for long-distance entanglement generation.

Submitted 4 April, 2024; v1 submitted 21 December, 2021; originally announced December 2021.

arXiv:2101.04264 [pdf]

HighAir: A Hierarchical Graph Neural Network-Based Air Quality Forecasting Method

Authors: Jiahui Xu, Ling Chen, Mingqi Lv, Chaoqun Zhan, Sanjian Chen, Jian Chang

Abstract: Accurately forecasting air quality is critical to protecting general public from lung and heart diseases. This is a challenging task due to the complicated interactions among distinct pollution sources and various other influencing factors. Existing air quality forecasting methods cannot effectively model the diffusion processes of air pollutants between cities and monitoring stations, which may suddenly deteriorate the air quality of a region. In this paper, we propose HighAir, i.e., a hierarchical graph neural network-based air quality forecasting method, which adopts an encoder-decoder architecture and considers complex air quality influencing factors, e.g., weather and land usage. Specifically, we construct a city-level graph and station-level graphs from a hierarchical perspective, which can consider city-level and station-level patterns, respectively. We design two strategies, i.e., upper delivery and lower updating, to implement the inter-level interactions, and introduce message passing mechanism to implement the intra-level interactions. We dynamically adjust edge weights based on wind direction to model the correlations between dynamic factors and air quality. We compare HighAir with the state-of-the-art air quality forecasting methods on the dataset of Yangtze River Delta city group, which covers 10 major cities within 61,500 km². The experimental results show that HighAir significantly outperforms other methods.

Submitted 11 January, 2021; originally announced January 2021.

arXiv:2001.09021 [pdf]

Dense Residual Network: Enhancing Global Dense Feature Flow for Character Recognition

Authors: Zhao Zhang, Zemin Tang, Yang Wang, Zheng Zhang, Choujun Zhan, Zhengjun Zha, Meng Wang

Abstract: Deep Convolutional Neural Networks (CNNs), such as Dense Convolutional Networks (DenseNet), have achieved great success for image representation by discovering deep hierarchical information. However, most existing networks simply stack the convolutional layers and hence fail to fully discover local and global feature information among layers. In this paper, we mainly explore how to enhance the local and global dense feature flow by exploiting hierarchical features fully from all the convolution layers. Technically, we propose an efficient and effective CNN framework, i.e., Fast Dense Residual Network (FDRN), for text recognition. To construct FDRN, we propose a new fast residual dense block (f-RDB) to retain the ability of local feature fusion and local residual learning of original RDB, which can reduce the computing efforts at the same time. After fully learning local residual dense features, we utilize the sum operation and several f-RDBs to define a new block termed global dense block (GDB) by imitating the construction of dense blocks to learn global dense residual features adaptively in a holistic way. Finally, we use two convolution layers to construct a down-sampling block to reduce the global feature size and extract deeper features. Extensive simulations show that FDRN obtains the enhanced recognition results, compared with other related models.

Submitted 8 February, 2021; v1 submitted 23 January, 2020; originally announced January 2020.

Comments: Please cite this work as: Zhao Zhang, Zemin Tang, Yang Wang, Zheng Zhang, Choujun Zhan, Zhengjun Zha and Meng Wang, "Dense Residual Network: Enhancing Global Dense Feature Flow for Character Recognition," Neural Networks (NN), Feb 2021. arXiv admin note: text overlap with arXiv:1912.07016

arXiv:1812.07367 [pdf]

Deep Learning Approach in Automatic Iceberg - Ship Detection with SAR Remote Sensing Data

Authors: Cheng Zhan, Licheng Zhang, Zhenzhen Zhong, Sher Didi-Ooi, Youzuo Lin, Yunxi Zhang, Shujiao Huang, Changchun Wang

Abstract: Deep Learning is gaining traction with geophysics community to understand subsurface structures, such as fault detection or salt body in seismic data. This study describes using deep learning method for iceberg or ship recognition with synthetic aperture radar (SAR) data. Drifting icebergs pose a potential threat to activities offshore around the Arctic, including for both ship navigation and oil rigs. Advancement of satellite imagery using weather-independent cross-polarized radar has enabled us to monitor and delineate icebergs and ships, however a human component is needed to classify the images. Here we present Transfer Learning, a convolutional neural network (CNN) designed to work with a limited training data and features, while demonstrating its effectiveness in this problem. Key aspect of the approach is data augmentation and stacking of multiple outputs, which resulted in a significant boost in accuracy (logarithmic score of 0.1463). This algorithm has been tested through participation at the Statoil/C-Core Kaggle competition.

Submitted 9 December, 2018; originally announced December 2018.

arXiv:1810.07075 [pdf]

A Multi-stage Framework with Context Information Fusion Structure for Skin Lesion Segmentation

Authors: Yujiao Tang, Feng Yang, Shaofeng Yuan, Chang'an Zhan

Abstract: The computer-aided diagnosis (CAD) systems can highly improve the reliability and efficiency of melanoma recognition. As a crucial step of CAD, skin lesion segmentation has the unsatisfactory accuracy in existing methods due to large variability in lesion appearance and artifacts. In this work, we propose a framework employing multi-stage UNets (MS-UNet) in the auto-context scheme to segment skin lesion accurately end-to-end. We apply two approaches to boost the performance of MS-UNet. First, UNet is coupled with a context information fusion structure (CIFS) to integrate the low-level and context information in the multi-scale feature space. Second, to alleviate the gradient vanishing problem, we use deep supervision mechanism through supervising MS-UNet by minimizing a weighted Jaccard distance loss function. Four out of five commonly used performance metrics, including Jaccard index and Dice coefficient, show that our approach outperforms the state-of-the-art deep learning based methods on the ISBI 2016 Skin Lesion Challenge dataset.

Submitted 16 October, 2018; originally announced October 2018.

Comments: 4 pages, 3 figures, 1 table

arXiv:1805.04364 [pdf, ps, other]

Trajectory Design for Distributed Estimation in UAV Enabled Wireless Sensor Network

Authors: Cheng Zhan, Yong Zeng, Rui Zhang

Abstract: In this paper, we study an unmanned aerial vehicle (UAV)-enabled wireless sensor network, where a UAV is dispatched to collect the sensed data from distributed sensor nodes (SNs) for estimating an unknown parameter. It is revealed that in order to minimize the mean square error (MSE) for the estimation, the UAV should collect the data from as many SNs as possible, based on which an optimization problem is formulated to design the UAV's trajectory subject to its practical mobility constraints. Although the problem is non-convex and NP-hard, we show that the optimal UAV trajectory consists of connected line segments only. With this simplification, an efficient suboptimal solution is proposed by leveraging the classic traveling salesman problem (TSP) method and applying convex optimization techniques. Simulation results show that the proposed trajectory design achieves significant performance gains in terms of the number of SNs whose data are successfully collected, as compared to other benchmark schemes.

Submitted 11 May, 2018; originally announced May 2018.

Comments: 5 pages, 4 figures, submitted for possible journal publication

arXiv:1708.00221 [pdf, ps, other]

Energy-Efficient Data Collection in UAV Enabled Wireless Sensor Network

Authors: Cheng Zhan, Yong Zeng, Rui Zhang

Abstract: In wireless sensor networks (WSNs), utilizing the unmanned aerial vehicle (UAV) as a mobile data collector for the ground sensor nodes (SNs) is an energy-efficient technique to prolong the network lifetime. Specifically, the UAV can sequentially move close to each of the SNs when collecting data from them and thus reduce the link distance, saving the SNs' transmission energy. In this letter, considering a general fading channel model for the SN-UAV links, we jointly optimize the SNs' wake-up schedule and the UAV's trajectory to minimize the maximum energy consumption of all SNs, while ensuring that the required amount of data is collected reliably from each SN. We formulate our design as a mixed-integer non-convex optimization problem. By applying the successive convex optimization technique, an efficient iterative algorithm is proposed to find a sub-optimal solution. Numerical results show that the proposed scheme achieves significant network energy saving as compared to benchmark schemes.

Submitted 1 August, 2017; originally announced August 2017.

Comments: Submitted for possible journal publication

arXiv:1706.08217 [pdf, other]

An Effective Way to Improve YouTube-8M Classification Accuracy in Google Cloud Platform

Authors: Zhenzhen Zhong, Shujiao Huang, Cheng Zhan, Licheng Zhang, Zhiwei Xiao, Chang-Chun Wang, Pei Yang

Abstract: Large-scale datasets have played a significant role in the progress of neural networks and deep learning. YouTube-8M is such a benchmark dataset for general multi-label video classification. It was created from over 7 million YouTube videos (450,000 hours of video) and includes video labels from a vocabulary of 4716 classes (3.4 labels/video on average). It also comes with pre-extracted audio & visual features from every second of video (3.2 billion feature vectors in total). Google Cloud recently released the datasets and organized the 'Google Cloud & YouTube-8M Video Understanding Challenge' on Kaggle. Competitors are challenged to develop classification algorithms that assign video-level labels using the new and improved YouTube-8M V2 dataset. Inspired by the competition, we started exploration of audio understanding and classification using deep learning algorithms and ensemble methods. We built several baseline predictions according to the benchmark paper and public GitHub TensorFlow code. Furthermore, we improved global prediction accuracy (GAP) from a base level of 77% to 80.7% through ensemble approaches.

Submitted 25 June, 2017; originally announced June 2017.

Comments: 5 pages, 2 figures

Showing 1–33 of 33 results for author: Zhan, C