-
3DeepRep: 3D Deep Low-rank Tensor Representation for Hyperspectral Image Inpainting
Authors:
Yunshan Li,
Wenwu Gong,
Qianqian Wang,
Chao Wang,
Lili Yang
Abstract:
Recent approaches based on transform-based tensor nuclear norm (TNN) have demonstrated notable effectiveness in hyperspectral image (HSI) inpainting by leveraging low-rank structures in latent representations. Recent developments incorporate deep transforms to improve low-rank tensor representation; however, existing approaches typically restrict the transform to the spectral mode, neglecting low-rank properties along other tensor modes. In this paper, we propose a novel 3-directional deep low-rank tensor representation (3DeepRep) model, which performs deep nonlinear transforms along all three modes of the HSI tensor. To enforce low-rankness, the model minimizes the nuclear norms of mode-i frontal slices in the corresponding latent space for each direction (i=1,2,3), forming a 3-directional TNN regularization. The outputs from the three directional branches are subsequently fused via a learnable aggregation module to produce the final result. An efficient gradient-based optimization algorithm is developed to solve the model in a self-supervised manner. Extensive experiments on real-world HSI datasets demonstrate that the proposed method achieves superior inpainting performance compared to existing state-of-the-art techniques, both qualitatively and quantitatively.
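As a rough illustration of the 3-directional nuclear-norm regularizer described in this abstract, the NumPy sketch below sums the nuclear norms of the slices taken along each mode of a 3-way tensor. It is a simplified sketch, not the authors' implementation: the learned deep transforms are omitted (an identity transform is assumed), and the function names `mode_slice_nuclear_norm` and `three_directional_tnn` are hypothetical.

```python
import numpy as np

def mode_slice_nuclear_norm(x, mode):
    """Sum of nuclear norms of the 2-D slices indexed along `mode`."""
    # Move the chosen mode to the front so slices[k] is the k-th slice along it.
    slices = np.moveaxis(x, mode, 0)
    # The nuclear norm of a matrix is the sum of its singular values.
    return float(sum(np.linalg.norm(s, ord="nuc") for s in slices))

def three_directional_tnn(x):
    """Add up the slice-wise nuclear norms over all three modes.

    Assumption: no transform is applied before taking the norms; the paper
    applies a learned nonlinear transform along each mode first.
    """
    return sum(mode_slice_nuclear_norm(x, m) for m in range(3))

# Usage: a rank-1 tensor should score lower than noise of the same energy.
rng = np.random.default_rng(0)
low_rank = np.einsum("i,j,k->ijk", *(rng.standard_normal(8) for _ in range(3)))
noise = rng.standard_normal((8, 8, 8))
noise *= np.linalg.norm(low_rank) / np.linalg.norm(noise)
print(three_directional_tnn(low_rank), three_directional_tnn(noise))
```

In a full model this quantity would be one term of a loss that also includes a data-fidelity term on the observed pixels.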
Submitted 20 June, 2025;
originally announced June 2025.
-
WDMIR: Wavelet-Driven Multimodal Intent Recognition
Authors:
Weiyin Gong,
Kai Zhang,
Yanghai Zhang,
Qi Liu,
Xinjie Sun,
Junyu Lu,
Linbo Zhu
Abstract:
Multimodal intent recognition (MIR) seeks to accurately interpret user intentions by integrating verbal and non-verbal information across video, audio and text modalities. While existing approaches prioritize text analysis, they often overlook the rich semantic content embedded in non-verbal cues. This paper presents a novel Wavelet-Driven Multimodal Intent Recognition (WDMIR) framework that enhances intent understanding through frequency-domain analysis of non-verbal information. To be more specific, we propose: (1) a wavelet-driven fusion module that performs synchronized decomposition and integration of video-audio features in the frequency domain, enabling fine-grained analysis of temporal dynamics; (2) a cross-modal interaction mechanism that facilitates progressive feature enhancement from bimodal to trimodal integration, effectively bridging the semantic gap between verbal and non-verbal information. Extensive experiments on MIntRec demonstrate that our approach achieves state-of-the-art performance, surpassing previous methods by 1.13% in accuracy. Ablation studies further verify that the wavelet-driven fusion module significantly improves the extraction of semantic information from non-verbal sources, with a 0.41% increase in recognition accuracy when analyzing subtle emotional cues.
Submitted 26 May, 2025;
originally announced June 2025.
-
Information-Computation Gaps in Quantum Learning via Low-Degree Likelihood
Authors:
Sitan Chen,
Weiyuan Gong,
Jonas Haferkamp,
Yihui Quek
Abstract:
In a variety of physically relevant settings for learning from quantum data, designing protocols that can computationally efficiently extract information remains largely an art, and there are important cases where we believe this to be impossible, that is, where there is an information-computation gap. While there is a large array of tools in the classical literature for giving evidence for average-case hardness of statistical inference problems, the corresponding tools in the quantum literature are far more limited. One such framework in the classical literature, the low-degree method, makes predictions about hardness of inference problems based on the failure of estimators given by low-degree polynomials. In this work, we extend this framework to the quantum setting.
We establish a general connection between state designs and low-degree hardness. We use this to obtain the first information-computation gaps for learning Gibbs states of random, sparse, non-local Hamiltonians. We also use it to prove hardness for learning random shallow quantum circuit states in a challenging model where states can be measured in adaptively chosen bases. To our knowledge, the ability to model adaptivity within the low-degree framework was open even in classical settings. In addition, we also obtain a low-degree hardness result for quantum error mitigation against strategies with single-qubit measurements.
We define a new quantum generalization of the planted biclique problem and identify the threshold at which this problem becomes computationally hard for protocols that perform local measurements. Interestingly, the complexity landscape for this problem shifts when going from local measurements to more entangled single-copy measurements.
We show average-case hardness for the "standard" variant of Learning Stabilizers with Noise and for agnostically learning product states.
Submitted 17 June, 2025; v1 submitted 28 May, 2025;
originally announced May 2025.
-
DecoFuse: Decomposing and Fusing the "What", "Where", and "How" for Brain-Inspired fMRI-to-Video Decoding
Authors:
Chong Li,
Jingyang Huo,
Weikang Gong,
Yanwei Fu,
Xiangyang Xue,
Jianfeng Feng
Abstract:
Decoding visual experiences from brain activity is a significant challenge. Existing fMRI-to-video methods often focus on semantic content while overlooking spatial and motion information. However, these aspects are all essential and are processed through distinct pathways in the brain. Motivated by this, we propose DecoFuse, a novel brain-inspired framework for decoding videos from fMRI signals. It first decomposes the video into three components - semantic, spatial, and motion - then decodes each component separately before fusing them to reconstruct the video. This approach not only simplifies the complex task of video decoding by decomposing it into manageable sub-tasks, but also establishes a clearer connection between learned representations and their biological counterpart, as supported by ablation studies. Further, our experiments show significant improvements over previous state-of-the-art methods, achieving 82.4% accuracy for semantic classification, 70.6% accuracy in spatial consistency, a 0.212 cosine similarity for motion prediction, and 21.9% 50-way accuracy for video generation. Additionally, neural encoding analyses for semantic and spatial information align with the two-streams hypothesis, further validating the distinct roles of the ventral and dorsal pathways. Overall, DecoFuse provides a strong and biologically plausible framework for fMRI-to-video decoding. Project page: https://chongjg.github.io/DecoFuse/.
Submitted 1 April, 2025;
originally announced April 2025.
-
Towards Understanding the Optimization Mechanisms in Deep Learning
Authors:
Binchuan Qi,
Wei Gong,
Li Li
Abstract:
In this paper, we adopt a probability distribution estimation perspective to explore the optimization mechanisms of supervised classification using deep neural networks. We demonstrate that, when employing the Fenchel-Young loss, despite the non-convex nature of the fitting error with respect to the model's parameters, global optimal solutions can be approximated by simultaneously minimizing both the gradient norm and the structural error. The former can be controlled through gradient descent algorithms. For the latter, we prove that it can be managed by increasing the number of parameters and ensuring parameter independence, thereby providing theoretical insights into mechanisms such as over-parameterization and random initialization. Ultimately, the paper validates the key conclusions of the proposed method through empirical results, illustrating its practical effectiveness.
Submitted 29 March, 2025;
originally announced March 2025.
-
Composite Indicator-Guided Infilling Sampling for Expensive Multi-Objective Optimization
Authors:
Huixiang Zhen,
Xiaotong Li,
Wenyin Gong,
Xiangyun Hu
Abstract:
In expensive multi-objective optimization, where the evaluation budget is strictly limited, selecting promising candidate solutions for expensive fitness evaluations is critical for accelerating convergence and improving algorithmic performance. However, designing an optimization strategy that effectively balances convergence, diversity, and distribution remains a challenge. To tackle this issue, we propose a composite indicator-based evolutionary algorithm (CI-EMO) for expensive multi-objective optimization. In each generation of the optimization process, CI-EMO first employs NSGA-III to explore the solution space based on fitness values predicted by surrogate models, generating a candidate population. Subsequently, we design a novel composite performance indicator to guide the selection of candidates for real fitness evaluation. This indicator simultaneously considers convergence, diversity, and distribution to improve the efficiency of identifying promising candidate solutions, which significantly improves algorithm performance. The composite indicator-based candidate selection strategy is easy to implement and computationally efficient. Component analysis experiments confirm the effectiveness of each element in the composite performance indicator. Comparative experiments on three benchmark test sets and real-world problems demonstrate that the proposed algorithm outperforms five state-of-the-art expensive multi-objective optimization algorithms.
Submitted 12 June, 2025; v1 submitted 28 March, 2025;
originally announced March 2025.
-
Multimodal Image Matching based on Frequency-domain Information of Local Energy Response
Authors:
Meng Yang,
Jun Chen,
Wenping Gong,
Longsheng Wei,
Xin Tian
Abstract:
Complicated nonlinear intensity differences, nonlinear local geometric distortions, noise, and rotation transformations are the main challenges in multimodal image matching. To solve these problems, we propose a method based on Frequency-domain Information of Local Energy Response, called FILER. The core of FILER is the local energy response model based on frequency-domain information, which can overcome the effect of nonlinear intensity differences. To improve the robustness to local nonlinear geometric distortions and noise, we design a new edge structure enhanced feature detector and a convolutional feature weighted descriptor, respectively. In addition, FILER overcomes the sensitivity of the frequency-domain information to the rotation angle and achieves rotation invariance. Extensive experiments on multimodal image pairs show that FILER outperforms other state-of-the-art algorithms and has good robustness and universality.
Submitted 25 March, 2025;
originally announced March 2025.
-
Injecting Imbalance Sensitivity for Multi-Task Learning
Authors:
Zhipeng Zhou,
Liu Liu,
Peilin Zhao,
Wei Gong
Abstract:
Multi-task learning (MTL) has emerged as a promising approach for deploying deep learning models in real-life applications. Recent studies have proposed optimization-based learning paradigms to establish task-shared representations in MTL. However, our paper empirically argues that these studies, specifically gradient-based ones, primarily emphasize the conflict issue while neglecting the potentially more significant impact of imbalance/dominance in MTL. In line with this perspective, we enhance the existing baseline method by injecting imbalance-sensitivity through the imposition of constraints on the projected norms. To demonstrate the effectiveness of our proposed IMbalance-sensitive Gradient (IMGrad) descent method, we evaluate it on multiple mainstream MTL benchmarks, encompassing supervised learning tasks as well as reinforcement learning. The experimental results consistently demonstrate competitive performance.
Submitted 10 March, 2025;
originally announced March 2025.
-
Ansatz-free Hamiltonian learning with Heisenberg-limited scaling
Authors:
Hong-Ye Hu,
Muzhou Ma,
Weiyuan Gong,
Qi Ye,
Yu Tong,
Steven T. Flammia,
Susanne F. Yelin
Abstract:
Learning the unknown interactions that govern a quantum system is crucial for quantum information processing, device benchmarking, and quantum sensing. The problem, known as Hamiltonian learning, is well understood under the assumption that interactions are local, but this assumption may not hold for arbitrary Hamiltonians. Previous methods all require high-order inverse polynomial dependence on precision and are unable to surpass the standard quantum limit and reach the gold-standard Heisenberg-limited scaling. Whether Heisenberg-limited Hamiltonian learning is possible without prior assumptions about the interaction structures, a challenge we term ansatz-free Hamiltonian learning, remains an open question. In this work, we present a quantum algorithm to learn arbitrary sparse Hamiltonians without any structure constraints using only black-box queries of the system's real-time evolution and minimal digital controls to attain Heisenberg-limited scaling in estimation error. Our method is also resilient to state-preparation-and-measurement errors, enhancing its practical feasibility. We numerically demonstrate our ansatz-free protocol for learning physical Hamiltonians and validating analog quantum simulations, benchmarking our performance against the state-of-the-art Heisenberg-limited learning approach. Moreover, we establish a fundamental trade-off between total evolution time and quantum control on learning arbitrary interactions, revealing the intrinsic interplay between controllability and total evolution time complexity for any learning algorithm. These results pave the way for further exploration into Heisenberg-limited Hamiltonian learning in complex quantum systems under minimal assumptions, potentially enabling new benchmarking and verification protocols.
Submitted 30 June, 2025; v1 submitted 17 February, 2025;
originally announced February 2025.
-
UniGO: A Unified Graph Neural Network for Modeling Opinion Dynamics on Graphs
Authors:
Hao Li,
Hao Jiang,
Yuke Zheng,
Hao Sun,
Wenying Gong
Abstract:
Polarization and fragmentation in social media amplify user biases, making it increasingly important to understand the evolution of opinions. Opinion dynamics provide interpretability for studying opinion evolution, yet incorporating these insights into predictive models remains challenging. This challenge arises from the diversity of opinion fusion rules and the difficulty of capturing equilibrium states while avoiding over-smoothing. This paper constructs a unified opinion dynamics model to integrate different opinion fusion rules and generates corresponding synthetic datasets. To fully leverage the advantages of unified opinion dynamics, we introduce UniGO, a framework for modeling opinion evolution on graphs. Using a coarsen-refine mechanism, UniGO efficiently models opinion dynamics through a graph neural network, mitigating over-smoothing while preserving equilibrium phenomena. UniGO leverages pretraining on synthetic datasets, which enhances its ability to generalize to real-world scenarios, providing a viable paradigm for applications of opinion dynamics. Experimental results on both synthetic and real-world datasets demonstrate UniGO's effectiveness in capturing complex opinion formation processes and predicting future evolution. The pretrained model also shows strong generalization capability, validating the benefits of using synthetic data to boost real-world performance.
Submitted 17 February, 2025;
originally announced February 2025.
-
Integrating Spatiotemporal Vision Transformer into Digital Twins for High-Resolution Heat Stress Forecasting in Campus Environments
Authors:
Wenjing Gong,
Xinyue Ye,
Keshu Wu,
Suphanut Jamonnak,
Wenyu Zhang,
Yifan Yang,
Xiao Huang
Abstract:
Extreme heat events exacerbated by climate change pose significant challenges to urban resilience and planning. This study introduces a climate-responsive digital twin framework integrating the Spatiotemporal Vision Transformer (ST-ViT) model to enhance heat stress forecasting and decision-making. Using a Texas campus as a testbed, we synthesized high-resolution physical model simulations with spatial and meteorological data to develop fine-scale human thermal predictions. The ST-ViT-powered digital twin enables efficient, data-driven insights for planners, policymakers, and campus stakeholders, supporting targeted heat mitigation strategies and advancing climate-adaptive urban design.
Submitted 12 February, 2025;
originally announced February 2025.
-
Towards Efficient Optimizer Design for LLM via Structured Fisher Approximation with a Low-Rank Extension
Authors:
Wenbo Gong,
Meyer Scetbon,
Chao Ma,
Edward Meeds
Abstract:
Designing efficient optimizers for large language models (LLMs) with low-memory requirements and fast convergence is an important and challenging problem. This paper makes a step towards the systematic design of such optimizers through the lens of structured Fisher information matrix (FIM) approximation. We show that many state-of-the-art efficient optimizers can be viewed as solutions to FIM approximation (under the Frobenius norm) with specific structural assumptions. Building on these insights, we propose two design recommendations of practical efficient optimizers for LLMs, involving the careful selection of structural assumptions to balance generality and efficiency, and enhancing memory efficiency of optimizers with general structures through a novel low-rank extension framework. We demonstrate how to use each design approach by deriving new memory-efficient optimizers: Row and Column Scaled SGD (RACS) and Adaptive low-dimensional subspace estimation (Alice). Experiments on LLaMA pre-training (up to 1B parameters) validate the effectiveness, showing faster and better convergence than existing memory-efficient baselines and Adam with little memory overhead. Notably, Alice achieves better than 2x faster convergence over Adam, while RACS delivers strong performance on the 1B model with SGD-like memory.
Submitted 20 February, 2025; v1 submitted 11 February, 2025;
originally announced February 2025.
-
Gradient Multi-Normalization for Stateless and Scalable LLM Training
Authors:
Meyer Scetbon,
Chao Ma,
Wenbo Gong,
Edward Meeds
Abstract:
Training large language models (LLMs) typically relies on adaptive optimizers like Adam (Kingma & Ba, 2015), which store additional state information to accelerate convergence but incur significant memory overhead. Recent efforts, such as SWAN (Ma et al., 2024), address this by eliminating the need for optimizer states while achieving performance comparable to Adam via a multi-step preprocessing procedure applied to instantaneous gradients. Motivated by the success of SWAN, we introduce a novel framework for designing stateless optimizers that normalizes stochastic gradients according to multiple norms. To achieve this, we propose a simple alternating scheme to enforce the normalization of gradients w.r.t. these norms. We show that our procedure can produce, up to an arbitrary precision, a fixed-point of the problem, and that SWAN is a particular instance of our approach with carefully chosen norms, providing a deeper understanding of its design. However, SWAN's computationally expensive whitening/orthogonalization step limits its practicality for large LMs. Using our principled perspective, we develop a more efficient, scalable, and practical stateless optimizer. Our algorithm relaxes the properties of SWAN, significantly reducing its computational cost while retaining its memory efficiency, making it applicable to training large-scale models. Experiments on pre-training LLaMA models with up to 1 billion parameters demonstrate a 3X speedup over Adam with significantly reduced memory requirements, outperforming other memory-efficient baselines.
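To make the idea of an alternating multi-norm normalization concrete, here is a minimal NumPy sketch that alternately rescales a gradient matrix to unit row norms and unit column norms. The choice of these two norms, the iteration count, and the function name `alternating_normalize` are assumptions made for illustration; the abstract does not specify which norms the paper uses.

```python
import numpy as np

def alternating_normalize(g, iters=5, eps=1e-12):
    """Alternately normalize rows and columns of a gradient matrix.

    Assumption: row-wise and column-wise Euclidean norms stand in for the
    "multiple norms" mentioned in the abstract.
    """
    x = np.array(g, dtype=float)
    for _ in range(iters):
        # Rescale every row to unit Euclidean norm.
        x = x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)
        # Rescale every column to unit Euclidean norm.
        x = x / (np.linalg.norm(x, axis=0, keepdims=True) + eps)
    return x

# Usage: preprocess a random "gradient" before a plain SGD-style update.
rng = np.random.default_rng(0)
grad = rng.standard_normal((4, 6))
update = alternating_normalize(grad)
print(np.linalg.norm(update, axis=0))  # column norms after the last step
```

No optimizer state is kept between calls, which is the property that makes such a preprocessing step stateless.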
Submitted 10 February, 2025;
originally announced February 2025.
-
Adaptivity can help exponentially for shadow tomography
Authors:
Sitan Chen,
Weiyuan Gong,
Zhihan Zhang
Abstract:
In recent years there has been significant interest in understanding the statistical complexity of learning from quantum data under the constraint that one can only make unentangled measurements. While a key challenge in establishing tight lower bounds in this setting is to deal with the fact that the measurements can be chosen in an adaptive fashion, a recurring theme has been that adaptivity offers little advantage over more straightforward, nonadaptive protocols.
In this note, we offer a counterpoint to this. We show that for the basic task of shadow tomography, protocols that use adaptively chosen two-copy measurements can be exponentially more sample-efficient than any protocol that uses nonadaptive two-copy measurements.
Submitted 25 December, 2024;
originally announced December 2024.
-
SWAN: SGD with Normalization and Whitening Enables Stateless LLM Training
Authors:
Chao Ma,
Wenbo Gong,
Meyer Scetbon,
Edward Meeds
Abstract:
Adaptive optimizers such as Adam (Kingma & Ba, 2015) have been central to the success of large language models. However, they often require maintaining optimizer states throughout training, which can result in memory requirements several times greater than the model footprint. This overhead imposes constraints on scalability and computational efficiency. Stochastic Gradient Descent (SGD), in contrast, is a stateless optimizer, as it does not track state variables during training. Consequently, it achieves optimal memory efficiency. However, its capability in LLM training is limited (Zhao et al., 2024b). In this work, we show that pre-processing SGD in a stateless manner can achieve the same performance as the Adam optimizer for LLM training, while drastically reducing the memory cost. Specifically, we propose to pre-process the instantaneous stochastic gradients using normalization and whitening. We show that normalization stabilizes gradient distributions, and whitening counteracts the local curvature of the loss landscape. This results in SWAN (SGD with Whitening And Normalization), a stochastic optimizer that eliminates the need to store any optimizer states. Empirically, SWAN has the same memory footprint as SGD, achieving $\approx 50\%$ reduction on total end-to-end memory compared to Adam. In language modeling tasks, SWAN demonstrates comparable or even better performance than Adam: when pre-training the LLaMA model with 350M and 1.3B parameters, SWAN achieves a 2x speedup by reaching the same evaluation perplexity using half as many tokens.
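The two preprocessing steps named in this abstract, normalization and whitening, can be sketched on a single gradient matrix as follows. This is an illustrative NumPy sketch under stated assumptions, not the SWAN implementation: the normalization axis (per-row standardization), the exact whitening via an eigendecomposition of G Gᵀ, and the names `normalize_rows`, `whiten` and `stateless_step` are all hypothetical choices, and the paper's procedure may differ.

```python
import numpy as np

def normalize_rows(g, eps=1e-8):
    # Standardize each row to zero mean and unit variance
    # (assumed normalization axis).
    mean = g.mean(axis=1, keepdims=True)
    std = g.std(axis=1, keepdims=True)
    return (g - mean) / (std + eps)

def whiten(g, eps=1e-8):
    # Compute (G G^T)^{-1/2} G through an eigendecomposition of the
    # symmetric matrix G G^T; eigenvalues are floored at eps for stability.
    vals, vecs = np.linalg.eigh(g @ g.T)
    inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(np.maximum(vals, eps))) @ vecs.T
    return inv_sqrt @ g

def stateless_step(w, g, lr=1e-2):
    # SGD-style update on the preprocessed gradient; nothing is stored
    # between steps, so the optimizer carries no state.
    return w - lr * whiten(normalize_rows(g))

# Usage on a random weight/gradient pair.
rng = np.random.default_rng(0)
weights = rng.standard_normal((4, 16))
grad = rng.standard_normal((4, 16))
new_weights = stateless_step(weights, grad)
```

After whitening, the rows of the processed gradient are orthonormal, so the update has the same scale in every row direction regardless of the raw gradient's conditioning.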
Submitted 21 February, 2025; v1 submitted 17 December, 2024;
originally announced December 2024.
-
Identification of Epileptic Spasms (ESES) Phases Using EEG Signals: A Vision Transformer Approach
Authors:
Wei Gong,
Yaru Li
Abstract:
This work introduces a new approach to Epileptic Spasms (ESES) detection from EEG signals using Vision Transformers (ViT). Classic ESES detection approaches have usually relied on manual processing or conventional algorithms and suffer from small sample sizes, single-channel analyses, and low generalization ability. In contrast, the proposed ViT model overcomes these limitations by using the attention mechanism to focus on the important features in multi-channel EEG data, which contributes to both better accuracy and efficiency. The model processes frequency-domain representations of EEG signals, such as spectrograms, as image data to capture long-range dependencies and complex patterns in the signal. The model demonstrates high performance, with an accuracy of 97% without requiring intensive data preprocessing, rendering it suitable for real-time clinical applications on a large scale. The method represents a significant advance in the detection and analysis of neurological disorders such as ESES.
Submitted 17 December, 2024;
originally announced December 2024.
-
Optimizing Student Ability Assessment: A Hierarchy Constraint-Aware Cognitive Diagnosis Framework for Educational Contexts
Authors:
Xinjie Sun,
Qi Liu,
Kai Zhang,
Shuanghong Shen,
Fei Wang,
Yan Zhuang,
Zheng Zhang,
Weiyin Gong,
Shijin Wang,
Lina Yang,
Xingying Huo
Abstract:
Cognitive diagnosis (CD) aims to reveal students' proficiency in specific knowledge concepts. With the increasing adoption of intelligent education applications, accurately assessing students' knowledge mastery has become an urgent challenge. Although existing cognitive diagnosis frameworks enhance diagnostic accuracy by analyzing students' explicit response records, they primarily focus on individual knowledge state, failing to adequately reflect the relative ability performance of students within hierarchies. To address this, we propose the Hierarchy Constraint-Aware Cognitive Diagnosis Framework (HCD), designed to more accurately represent student ability performance within real educational contexts. Specifically, the framework introduces a hierarchy mapping layer to identify students' levels. It then employs a hierarchy convolution-enhanced attention layer for in-depth analysis of knowledge concepts performance among students at the same level, uncovering nuanced differences. A hierarchy inter-sampling attention layer captures performance differences across hierarchies, offering a comprehensive understanding of the relationships among students' knowledge state. Finally, through personalized diagnostic enhancement, the framework integrates hierarchy constraint perception features with existing models, improving the representation of both individual and group characteristics. This approach enables precise inference of students' knowledge state. Research shows that this framework not only reasonably constrains changes in students' knowledge states to align with real educational settings, but also supports the scientific rigor and fairness of educational assessments, thereby advancing the field of cognitive diagnosis.
Submitted 21 November, 2024;
originally announced December 2024.
-
ORB-SLAM3AB: Augmenting ORB-SLAM3 to Counteract Bumps with Optical Flow Inter-frame Matching
Authors:
Yangrui Dong,
Weisheng Gong,
Qingyong Li,
Kaijie Su,
Chen He,
Z. Jane Wang
Abstract:
This paper proposes an enhancement to the ORB-SLAM3 algorithm, tailored for applications on rugged road surfaces. Our improved algorithm adeptly combines feature point matching with optical flow methods, capitalizing on the high robustness of optical flow in complex terrains and the high precision of feature points on smooth surfaces. By refining the inter-frame matching logic of ORB-SLAM3, we have addressed the issue of frame matching loss on uneven roads. To prevent a decrease in accuracy, an adaptive matching mechanism has been incorporated, which increases the reliance on optical flow points during periods of high vibration, thereby effectively maintaining SLAM precision. Furthermore, due to the scarcity of multi-sensor datasets suitable for environments with bumpy roads or speed bumps, we have collected LiDAR and camera data from such settings. Our enhanced algorithm, ORB-SLAM3AB, was then benchmarked against several advanced open-source SLAM algorithms that rely solely on laser or visual data. Through the analysis of Absolute Trajectory Error (ATE) and Relative Pose Error (RPE) metrics, our results demonstrate that ORB-SLAM3AB achieves superior robustness and accuracy on rugged road surfaces.
Submitted 27 November, 2024;
originally announced November 2024.
-
Graph Transformer Networks for Accurate Band Structure Prediction: An End-to-End Approach
Authors:
Weiyi Gong,
Tao Sun,
Hexin Bai,
Jeng-Yuan Tsai,
Haibin Ling,
Qimin Yan
Abstract:
Predicting electronic band structures from crystal structures is crucial for understanding structure-property correlations in materials science. First-principles approaches are accurate but computationally intensive. In recent years, machine learning (ML) has been extensively applied to this field, but existing ML models predominantly focus on band gap predictions or indirect band structure estimation via solving predicted Hamiltonians. An end-to-end model to predict band structure accurately and efficiently is still lacking. Here, we introduce a graph Transformer-based end-to-end approach that directly predicts band structures from crystal structures with high accuracy. Our method leverages the continuity of the k-path and treats continuous bands as a sequence. We demonstrate that our model not only provides accurate band structure predictions but can also derive other properties (such as band gap, band center, and band dispersion) with high accuracy. We verify the model performance on large and diverse datasets.
Submitted 25 November, 2024;
originally announced November 2024.
-
An Architectural Approach to Enhance Deep Long-Tailed Learning
Authors:
Yuhan Pan,
Yanan Sun,
Wei Gong
Abstract:
Deep long-tailed recognition has been widely studied to address the issue of imbalanced data distributions in real-world scenarios. However, there has been insufficient focus on the design of neural architectures, despite empirical evidence suggesting that architecture can significantly impact performance. In this paper, we attempt to mitigate long-tailed issues through architectural improvements. To simplify the design process, we utilize Differentiable Architecture Search (DARTS) to achieve this goal. Unfortunately, existing DARTS methods struggle to perform well in long-tailed scenarios. To tackle this challenge, we introduce Long-Tailed Differential Architecture Search (LTDAS). Specifically, we conduct extensive experiments to explore architectural components that demonstrate better performance on long-tailed data and propose a new search space based on our observations. This ensures that the architecture obtained through our search process incorporates superior components. Additionally, we propose replacing the learnable linear classifier with an Equiangular Tight Frame (ETF) classifier to further enhance our method. This classifier effectively alleviates the biased search process and prevents performance collapse. Extensive experimental evaluations demonstrate that our approach consistently improves upon existing methods from an orthogonal perspective and achieves state-of-the-art results with simple enhancements.
Submitted 2 December, 2024; v1 submitted 9 November, 2024;
originally announced November 2024.
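To make the ETF classifier mentioned in the abstract above concrete, here is a minimal sketch of the standard simplex ETF construction, in which K unit-norm class prototypes have pairwise inner product -1/(K-1) and stay fixed instead of being learned. The function names (`simplex_etf`, `etf_logits`), the choice of the identity as the rotation, and the feature dimension being equal to K are assumptions made for illustration; the abstract does not say which ETF variant or scaling the authors use.

```python
import math


def simplex_etf(num_classes):
    """Return K unit-norm prototypes in R^K forming a simplex ETF.

    Rows of sqrt(K/(K-1)) * (I - (1/K) * 1 1^T). The optional rotation
    is taken to be the identity here (an illustrative assumption).
    """
    k = num_classes
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def etf_logits(feature, prototypes):
    """Fixed (non-learnable) classifier: logits are inner products."""
    return [dot(feature, p) for p in prototypes]


prototypes = simplex_etf(4)
# A feature aligned with prototype 2 scores highest for class 2.
logits = etf_logits(prototypes[2], prototypes)
predicted = max(range(len(logits)), key=lambda c: logits[c])
```

Because the prototypes are fixed and equally separated, no class direction can grow at the expense of the others, which is one plausible reading of why such a classifier helps with imbalanced data.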
-
On the sample complexity of purity and inner product estimation
Authors:
Weiyuan Gong,
Jonas Haferkamp,
Qi Ye,
Zhihan Zhang
Abstract:
We study the sample complexity of the prototypical tasks of quantum purity estimation and quantum inner product estimation. In purity estimation, we are to estimate $\mathrm{tr}(ρ^2)$ of an unknown quantum state $ρ$ to additive error $ε$. Meanwhile, for quantum inner product estimation, Alice and Bob are to estimate $\mathrm{tr}(ρσ)$ to additive error $ε$ given copies of unknown quantum states $ρ$ and $σ$ using classical communication and restricted quantum communication.
In this paper, we show a strong connection between the sample complexity of purity estimation with bounded quantum memory and inner product estimation with bounded quantum communication and unentangled measurements. We propose a protocol that solves quantum inner product estimation with $k$-qubit one-way quantum communication and unentangled local measurements using $O(\mathrm{median}\{1/ε^2,2^{n/2}/ε,2^{n-k}/ε^2\})$ copies of $ρ$ and $σ$. Our protocol can be modified to estimate the purity of an unknown quantum state $ρ$ using $k$-qubit quantum memory with the same complexity. We prove that arbitrary protocols with $k$-qubit quantum memory that estimate purity to error $ε$ require $Ω(\mathrm{median}\{1/ε^2,2^{n/2}/\sqrt{ε},2^{n-k}/ε^2\})$ copies of $ρ$. This indicates the same lower bound for quantum inner product estimation with one-way $k$-qubit quantum communication and classical communication, and unentangled local measurements. For purity estimation, we further improve the lower bound to $Ω(\max\{1/ε^2,2^{n/2}/ε\})$ for any protocols using an identical single-copy projection-valued measurement.
Additionally, we investigate a decisional variant of quantum distributed inner product estimation without quantum communication for mixed states and provide a lower bound on the sample complexity.
Submitted 16 October, 2024;
originally announced October 2024.
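The median-of-three bounds quoted in the abstract above can be easier to read with numbers plugged in. The sketch below evaluates the upper-bound and lower-bound expressions with all hidden constants dropped, so it shows only the scaling and not actual copy counts. The function names and the example values of n, k, and ε are hypothetical choices for illustration.

```python
import math
from statistics import median


def upper_bound_scaling(n, k, eps):
    """median{1/eps^2, 2^(n/2)/eps, 2^(n-k)/eps^2}, constants omitted."""
    return median([1 / eps**2, 2 ** (n / 2) / eps, 2 ** (n - k) / eps**2])


def lower_bound_scaling(n, k, eps):
    """median{1/eps^2, 2^(n/2)/sqrt(eps), 2^(n-k)/eps^2}, constants omitted."""
    return median(
        [1 / eps**2, 2 ** (n / 2) / math.sqrt(eps), 2 ** (n - k) / eps**2]
    )


n, eps = 20, 0.125
no_memory = upper_bound_scaling(n, 0, eps)     # middle term 2^(n/2)/eps
some_memory = upper_bound_scaling(n, 16, eps)  # memory term 2^(n-k)/eps^2
full_memory = upper_bound_scaling(n, 20, eps)  # reduces to 1/eps^2
```

With these values the expression moves from 2^{n/2}/ε at k = 0, through 2^{n-k}/ε^2 at intermediate k, down to 1/ε^2 at k = n, which is the tradeoff between memory (or communication) and samples that the abstract describes.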
-
Offline Inverse Constrained Reinforcement Learning for Safe-Critical Decision Making in Healthcare
Authors:
Nan Fang,
Guiliang Liu,
Wei Gong
Abstract:
Reinforcement Learning (RL) applied in healthcare can lead to unsafe medical decisions and treatment, such as excessive dosages or abrupt changes, often due to agents overlooking common-sense constraints. Consequently, Constrained Reinforcement Learning (CRL) is a natural choice for safe decisions. However, specifying the exact cost function is inherently difficult in healthcare. Recent Inverse Constrained Reinforcement Learning (ICRL) is a promising approach that infers constraints from expert demonstrations. ICRL algorithms model Markovian decisions in an interactive environment. These settings do not align with the practical requirement of a decision-making system in healthcare, where decisions rely on historical treatment recorded in an offline dataset. To tackle these issues, we propose the Constraint Transformer (CT). Specifically, 1) we utilize a causal attention mechanism to incorporate historical decisions and observations into the constraint modeling, while employing a Non-Markovian layer for weighted constraints to capture critical states. 2) A generative world model is used to perform exploratory data augmentation, enabling offline RL methods to simulate unsafe decision sequences. In multiple medical scenarios, empirical results demonstrate that CT can capture unsafe states and achieve strategies that approximate lower mortality rates, reducing the occurrence probability of unsafe behaviors.
Submitted 14 October, 2024; v1 submitted 9 October, 2024;
originally announced October 2024.
-
Extended convexity and smoothness and their applications in deep learning
Authors:
Binchuan Qi,
Wei Gong,
Li Li
Abstract:
Classical assumptions like strong convexity and Lipschitz smoothness often fail to capture the nature of deep learning optimization problems, which are typically non-convex and non-smooth, making traditional analyses less applicable. This study aims to elucidate the mechanisms of non-convex optimization in deep learning by extending the conventional notions of strong convexity and Lipschitz smoothness. By leveraging these concepts, we prove that, under the established constraints, the empirical risk minimization problem is equivalent to optimizing the local gradient norm and structural error, which together constitute the upper and lower bounds of the empirical risk. Furthermore, our analysis demonstrates that the stochastic gradient descent (SGD) algorithm can effectively minimize the local gradient norm. Additionally, techniques like skip connections, over-parameterization, and random parameter initialization are shown to help control the structural error. Ultimately, we validate the core conclusions of this paper through extensive experiments. Theoretical analysis and experimental results indicate that our findings provide new insights into the mechanisms of non-convex optimization in deep learning.
Submitted 30 April, 2025; v1 submitted 8 October, 2024;
originally announced October 2024.
-
Adaptive Knowledge-based Multi-Objective Evolutionary Algorithm for Hybrid Flow Shop Scheduling Problems with Multiple Parallel Batch Processing Stages
Authors:
Feige Liu,
Xin Li,
Chao Lu,
Wenying Gong
Abstract:
Parallel batch processing machines have extensive applications in the semiconductor manufacturing process. However, the problem models in previous studies regard parallel batch processing as a fixed processing stage in the machining process. This study generalizes the problem model, in which users can arbitrarily set certain stages as parallel batch processing stages according to their needs. A Hybrid Flow Shop Scheduling Problem with Parallel Batch Processing Machines (PBHFSP) is solved in this paper. Furthermore, an Adaptive Knowledge-based Multi-Objective Evolutionary Algorithm (AMOEA/D) is designed to simultaneously optimize both makespan and Total Energy Consumption (TEC). Firstly, a hybrid initialization strategy with heuristic rules based on knowledge of PBHFSP is proposed to generate promising solutions. Secondly, a disjunctive graph model is established based on this knowledge to find the critical path of PBHFSP. Then, a critical-path-based neighborhood search is proposed to enhance the exploitation ability of AMOEA/D. Moreover, the search time is adaptively adjusted based on learning experience from Q-learning and Decay Law. Afterward, to enhance the exploration capability of the algorithm, AMOEA/D designs an improved population updating strategy with a weight vector updating strategy. These strategies rematch individuals with weight vectors, thereby maintaining the diversity of the population. Finally, the proposed algorithm is compared with state-of-the-art algorithms. The experimental results show that AMOEA/D is superior to the comparison algorithms in solving the PBHFSP.
Submitted 27 September, 2024;
originally announced September 2024.
-
ESP-PCT: Enhanced VR Semantic Performance through Efficient Compression of Temporal and Spatial Redundancies in Point Cloud Transformers
Authors:
Luoyu Mei,
Shuai Wang,
Yun Cheng,
Ruofeng Liu,
Zhimeng Yin,
Wenchao Jiang,
Shuai Wang,
Wei Gong
Abstract:
Semantic recognition is pivotal in virtual reality (VR) applications, enabling immersive and interactive experiences. A promising approach is utilizing millimeter-wave (mmWave) signals to generate point clouds. However, the high computational and memory demands of current mmWave point cloud models hinder their efficiency and reliability. To address this limitation, our paper introduces ESP-PCT, a novel Enhanced Semantic Performance Point Cloud Transformer with a two-stage semantic recognition framework tailored for VR applications. ESP-PCT takes advantage of the accuracy of sensory point cloud data and optimizes the semantic recognition process, where the localization and focus stages are trained jointly in an end-to-end manner. We evaluate ESP-PCT on various VR semantic recognition conditions, demonstrating substantial enhancements in recognition efficiency. Notably, ESP-PCT achieves an accuracy of 93.2% while simultaneously reducing the computational requirements (FLOPs) by 76.9% and memory usage by 78.2% compared to the existing Point Transformer model. These results underscore ESP-PCT's potential in VR semantic recognition by achieving high accuracy and reducing redundancy. The code and data of this project are available at https://github.com/lymei-SEU/ESP-PCT.
Submitted 2 September, 2024;
originally announced September 2024.
-
Double-decker: Productive Backscatter Communication Using a Single Commodity Receiver
Authors:
Qiwei Wang,
Wei Gong
Abstract:
Backscatter communication has attracted significant attention for Internet-of-Things applications due to its ultra-low power consumption. State-of-the-art backscatter systems no longer require dedicated carrier generators and leverage ambient signals as carriers. However, there is an emerging challenge: most prior systems need dual receivers to capture the original and backscattered signals at the same time for tag data demodulation. This is not conducive to the widespread deployment of backscatter communication. To address this problem, we present double-decker, a novel backscatter system that only requires a single commercial device for backscatter communication. The key technology of double-decker is to divide the carrier OFDM symbols into two parts: pilot symbols and data symbols. Pilot symbols can be used as reference signals for tag data demodulation, thus removing the dependence on the dual-receiver structure. We have built an FPGA prototype and conducted extensive experiments. Empirical results show that when the excitation signal is 802.11g, double-decker achieves a tag data rate of 35.2 kbps and a productive data rate of 38 kbps. The communication range of double-decker is up to 28 m in LOS deployment and 24 m in NLOS deployment.
Submitted 29 August, 2024;
originally announced August 2024.
-
Stabilizer bootstrapping: A recipe for efficient agnostic tomography and magic estimation
Authors:
Sitan Chen,
Weiyuan Gong,
Qi Ye,
Zhihan Zhang
Abstract:
We study the task of agnostic tomography: given copies of an unknown $n$-qubit state $ρ$ which has fidelity $τ$ with some state in a given class $C$, find a state which has fidelity $\ge τ- ε$ with $ρ$. We give a new framework, stabilizer bootstrapping, for designing computationally efficient protocols for this task, and use this to get new agnostic tomography protocols for the following classes:
Stabilizer states: We give a protocol that runs in time $\mathrm{poly}(n,1/ε)\cdot (1/τ)^{O(\log(1/τ))}$, answering an open question posed by Grewal, Iyer, Kretschmer, Liang [43] and Anshu and Arunachalam [6]. Previous protocols ran in time $\mathrm{exp}(Θ(n))$ or required $τ>\cos^2(π/8)$.
States with stabilizer dimension $n - t$: We give a protocol that runs in time $n^3\cdot(2^t/τ)^{O(\log(1/ε))}$, extending recent work on learning quantum states prepared by circuits with few non-Clifford gates, which only applied in the realizable setting where $τ= 1$ [33, 40, 49, 66].
Discrete product states: If $C = K^{\otimes n}$ for some $μ$-separated discrete set $K$ of single-qubit states, we give a protocol that runs in time $(n/μ)^{O((1 + \log (1/τ))/μ)}/ε^2$. This strictly generalizes a prior guarantee which applied to stabilizer product states [42]. For stabilizer product states, we give a further improved protocol that runs in time $(n^2/ε^2)\cdot (1/τ)^{O(\log(1/τ))}$.
As a corollary, we give the first protocol for estimating stabilizer fidelity, a standard measure of magic for quantum states, to error $ε$ in $n^3 \mathrm{quasipoly}(1/ε)$ time.
Submitted 4 December, 2024; v1 submitted 13 August, 2024;
originally announced August 2024.
-
SFPrompt: Communication-Efficient Split Federated Fine-Tuning for Large Pre-Trained Models over Resource-Limited Devices
Authors:
Linxiao Cao,
Yifei Zhu,
Wei Gong
Abstract:
Large pre-trained models have exhibited remarkable achievements across various domains. The substantial training costs associated with these models have led to extensive study of fine-tuning for effectively harnessing their capabilities in solving downstream tasks. Yet, conventional fine-tuning approaches become infeasible when the model lacks access to downstream data due to privacy concerns. Naively integrating fine-tuning approaches with the emerging federated learning frameworks incurs substantial communication overhead and exerts high demand on local computing resources, making it impractical for common resource-limited devices. In this paper, we introduce SFPrompt, an innovative privacy-preserving fine-tuning method tailored for the federated setting where direct uploading of raw data is prohibited and local devices are too resource-constrained to run a complete pre-trained model. In essence, SFPrompt judiciously combines split learning with federated learning to handle these challenges. Specifically, the pre-trained model is first partitioned into client and server components, thereby streamlining the client-side model and substantially alleviating computational demands on local resources. SFPrompt then introduces soft prompts into the federated model to enhance the fine-tuning performance. To further reduce communication costs, a novel dataset pruning algorithm and a local-loss update strategy are devised during the fine-tuning process. Extensive experiments demonstrate that SFPrompt delivers performance competitive with the federated full fine-tuning approach while consuming a mere 0.46% of local computing resources and incurring 53% less communication cost.
Submitted 24 July, 2024;
originally announced July 2024.
-
VCoME: Verbal Video Composition with Multimodal Editing Effects
Authors:
Weibo Gong,
Xiaojie Jin,
Xin Li,
Dongliang He,
Xinglong Wu
Abstract:
Verbal videos, featuring voice-overs or text overlays, provide valuable content but present significant challenges in composition, especially when incorporating editing effects to enhance clarity and visual appeal. In this paper, we introduce the novel task of verbal video composition with editing effects. This task aims to generate coherent and visually appealing verbal videos by integrating multimodal editing effects across textual, visual, and audio categories. To achieve this, we curate a large-scale dataset of video effects compositions from publicly available sources. We then formulate this task as a generative problem, involving the identification of appropriate positions in the verbal content and the recommendation of editing effects for these positions. To address this task, we propose VCoME, a general framework that employs a large multimodal model to generate editing effects for video composition. Specifically, VCoME takes in the multimodal video context and autoregressively outputs where to apply effects within the verbal content and which effects are most appropriate for each position. VCoME also supports prompt-based control of composition density and style, providing substantial flexibility for diverse applications. Through extensive quantitative and qualitative evaluations, we clearly demonstrate the effectiveness of VCoME. A comprehensive user study shows that our method produces videos of professional quality while being 85$\times$ more efficient than professional editors.
Submitted 5 July, 2024;
originally announced July 2024.
-
Probability Distribution Learning and Its Application in Deep Learning
Authors:
Binchuan Qi,
Wei Gong,
Li Li
Abstract:
This paper aims to elucidate the theoretical mechanisms underlying deep learning from a probability distribution estimation perspective, with Fenchel-Young Loss serving as the loss function. In our approach, the learning error, which measures the discrepancy between the model's predicted distribution and the posterior expectation of the true unknown distribution given sampling, is formulated as the primary optimization objective. Therefore, the learning error can be regarded as the posterior expectation of the expected risk. As many important loss functions, such as Softmax Cross-Entropy Loss and Mean Squared Error Loss, are specific instances of Fenchel-Young Losses, this paper further theoretically demonstrates that Fenchel-Young Loss is a natural choice for machine learning tasks, thereby ensuring the broad applicability of the conclusions drawn in this work. In the case of using Fenchel-Young Loss, the paper proves that the model's fitting error is controlled by the gradient norm and structural error, thereby providing new insights into the mechanisms of non-convex optimization and various techniques employed in model training, such as over-parameterization and skip connections. Furthermore, it establishes model-independent bounds on the learning error, demonstrating that the correlation between features and labels (equivalent to mutual information) controls the upper bound of the model's generalization error. Ultimately, the paper validates the key conclusions of the proposed method through empirical results, demonstrating its practical effectiveness.
Submitted 23 April, 2025; v1 submitted 9 June, 2024;
originally announced June 2024.
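The abstract above notes that Softmax Cross-Entropy Loss is a specific instance of a Fenchel-Young Loss. A minimal sketch of that one case follows, using the general form L(θ; y) = Ω*(θ) + Ω(y) - <θ, y> with Ω taken to be the negative Shannon entropy on the probability simplex, whose convex conjugate is log-sum-exp. The function names and the example scores are illustrative assumptions and are not taken from the paper.

```python
import math


def logsumexp(theta):
    """Numerically stable log(sum(exp(theta))): conjugate of neg. entropy."""
    m = max(theta)
    return m + math.log(sum(math.exp(t - m) for t in theta))


def neg_entropy(p):
    """Omega(p) = sum_i p_i log p_i, with the convention 0 log 0 = 0."""
    return sum(x * math.log(x) for x in p if x > 0)


def fenchel_young_loss(theta, y):
    """Omega*(theta) + Omega(y) - <theta, y> for Omega = negative entropy."""
    inner = sum(t * yi for t, yi in zip(theta, y))
    return logsumexp(theta) + neg_entropy(y) - inner


def softmax(theta):
    """Softmax of the score vector theta."""
    lse = logsumexp(theta)
    return [math.exp(t - lse) for t in theta]


def softmax_cross_entropy(theta, label):
    """Usual cross-entropy of softmax(theta) against an integer label."""
    return logsumexp(theta) - theta[label]


theta = [2.0, -1.0, 0.5]
one_hot = [1.0, 0.0, 0.0]
fy_value = fenchel_young_loss(theta, one_hot)
ce_value = softmax_cross_entropy(theta, 0)
```

For a one-hot target the entropy term vanishes, so the two values coincide; the loss is also zero exactly when the target equals softmax(θ), which is the general Fenchel-Young property specialized to this choice of Ω.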
-
GLADformer: A Mixed Perspective for Graph-level Anomaly Detection
Authors:
Fan Xu,
Nan Wang,
Hao Wu,
Xuezhi Wen,
Dalin Zhang,
Siyang Lu,
Binyong Li,
Wei Gong,
Hai Wan,
Xibin Zhao
Abstract:
Graph-Level Anomaly Detection (GLAD) aims to distinguish anomalous graphs within a graph dataset. However, current methods are constrained by their receptive fields, struggling to learn global features within the graphs. Moreover, most contemporary methods are based on the spatial domain and lack exploration of spectral characteristics. In this paper, we propose a multi-perspective hybrid graph-level anomaly detector, namely GLADformer, consisting of two key modules. Specifically, we first design a Graph Transformer module with global spectrum enhancement, which ensures balanced and resilient parameter distributions by fusing global features and spectral distribution characteristics. Furthermore, to uncover local anomalous attributes, we customize a band-pass spectral GNN message passing module that further enhances the model's generalization capability. Through comprehensive experiments on ten real-world datasets from multiple domains, we validate the effectiveness and robustness of GLADformer. This demonstrates that GLADformer outperforms current state-of-the-art models in graph-level anomaly detection, particularly in effectively capturing global anomaly representations and spectral characteristics.
Submitted 3 July, 2024; v1 submitted 2 June, 2024;
originally announced June 2024.
-
Quantum-Classical Separations in Shallow-Circuit-Based Learning with and without Noises
Authors:
Zhihan Zhang,
Weiyuan Gong,
Weikang Li,
Dong-Ling Deng
Abstract:
We study quantum-classical separations between classical and quantum supervised learning models based on constant depth (i.e., shallow) circuits, in scenarios with and without noises. We construct a classification problem defined by a noiseless shallow quantum circuit and rigorously prove that any classical neural network with bounded connectivity requires logarithmic depth to output correctly with a larger-than-exponentially-small probability. This unconditional near-optimal quantum-classical separation originates from the quantum nonlocality property that distinguishes quantum circuits from their classical counterparts. We further derive the noise thresholds for demonstrating such a separation on near-term quantum devices under the depolarization noise model. We prove that this separation will persist if the noise strength is upper bounded by an inverse polynomial with respect to the system size, and vanish if the noise strength is greater than an inverse polylogarithmic function. In addition, for quantum devices with constant noise strength, we prove that no super-polynomial classical-quantum separation exists for any classification task defined by shallow Clifford circuits, independent of the structures of the circuits that specify the learning models.
Submitted 1 May, 2024;
originally announced May 2024.
-
Optimal tradeoffs for estimating Pauli observables
Authors:
Sitan Chen,
Weiyuan Gong,
Qi Ye
Abstract:
We revisit the problem of Pauli shadow tomography: given copies of an unknown $n$-qubit quantum state $ρ$, estimate $\text{tr}(Pρ)$ for some set of Pauli operators $P$ to within additive error $ε$. This has been a popular testbed for exploring the advantage of protocols with quantum memory over those without: with enough memory to measure two copies at a time, one can use Bell sampling to estimate $|\text{tr}(Pρ)|$ for all $P$ using $O(n/ε^4)$ copies, but with $k\le n$ qubits of memory, $Ω(2^{(n-k)/3})$ copies are needed.
These results leave open several natural questions. How does this picture change in the physically relevant setting where one only needs to estimate a certain subset of Paulis? What is the optimal dependence on $ε$? What is the optimal tradeoff between quantum memory and sample complexity?
We answer all of these questions. For any subset $A$ of Paulis and any family of measurement strategies, we completely characterize the optimal sample complexity, up to $\log |A|$ factors. We show any protocol that makes $\text{poly}(n)$-copy measurements must make $Ω(1/ε^4)$ measurements. For any protocol that makes $\text{poly}(n)$-copy measurements and only has $k < n$ qubits of memory, we show that $\widetilde{Θ}(\min\{2^n/ε^2, 2^{n-k}/ε^4\})$ copies are necessary and sufficient.
The protocols we propose can also estimate the actual values $\text{tr}(Pρ)$, rather than just their absolute values as in prior work. Additionally, as a byproduct of our techniques, we establish tight bounds for the task of purity testing and show that it exhibits an intriguing phase transition not present in the memory-sample tradeoff for Pauli shadow tomography.
Submitted 14 November, 2024; v1 submitted 29 April, 2024;
originally announced April 2024.
-
A Survey of Bluetooth Indoor Localization
Authors:
Taolei Shi,
Wei Gong
Abstract:
Indoor localization has received extensive research interest because a growing number of applications need location information to provide more precise and effective services [1], [2]. Various wireless techniques and mechanisms have been proposed; some of them, such as Wi-Fi, RFID, and sensor networks, have been studied in depth and have come into use. In comparison, the development of Bluetooth location technology is slow and there are not many papers and surveys in this field, although the performance and market value of Bluetooth are increasing steadily. In this paper, we aim to provide a detailed survey of various indoor localization systems with Bluetooth. In contrast with the existing surveys, we categorize the existing localization techniques that have been proposed in the literature in order to sketch the development of Bluetooth location compared to other technologies. We also evaluate different systems from the perspective of availability, cost, scalability, and accuracy, and discuss the remaining problems and challenges for accurate Bluetooth localization.
Submitted 18 April, 2024;
originally announced April 2024.
-
DIDLM: A SLAM Dataset for Difficult Scenarios Featuring Infrared, Depth Cameras, LIDAR, 4D Radar, and Others under Adverse Weather, Low Light Conditions, and Rough Roads
Authors:
Weisheng Gong,
Kaijie Su,
Qingyong Li,
Chen He,
Tong Wu,
Z. Jane Wang
Abstract:
Adverse weather conditions, low-light environments, and bumpy road surfaces pose significant challenges to SLAM in robotic navigation and autonomous driving. Existing datasets in this field predominantly rely on single sensors or combinations of LiDAR, cameras, and IMUs. However, 4D millimeter-wave radar demonstrates robustness in adverse weather, infrared cameras excel in capturing details under low-light conditions, and depth images provide richer spatial information. Multi-sensor fusion methods also show potential for better adaptation to bumpy roads. Despite some SLAM studies incorporating these sensors and conditions, there remains a lack of comprehensive datasets addressing low-light environments and bumpy road conditions, or featuring a sufficiently diverse range of sensor data. In this study, we introduce a multi-sensor dataset covering challenging scenarios such as snowy weather, rainy weather, nighttime conditions, speed bumps, and rough terrains. The dataset includes rarely utilized sensors for extreme conditions, such as 4D millimeter-wave radar, infrared cameras, and depth cameras, alongside 3D LiDAR, RGB cameras, GPS, and IMU. It supports both autonomous driving and ground robot applications and provides reliable GPS/INS ground truth data, covering structured and semi-structured terrains. We evaluated various SLAM algorithms using this dataset, including RGB images, infrared images, depth images, LiDAR, and 4D millimeter-wave radar. The dataset spans a total of 18.5 km, 69 minutes, and approximately 660 GB, offering a valuable resource for advancing SLAM research under complex and extreme conditions. Our dataset is available at https://github.com/GongWeiSheng/DIDLM.
Submitted 14 January, 2025; v1 submitted 15 April, 2024;
originally announced April 2024.
-
Accurately Predicting Probabilities of Safety-Critical Rare Events for Intelligent Systems
Authors:
Ruoxuan Bai,
Jingxuan Yang,
Weiduo Gong,
Yi Zhang,
Qiujing Lu,
Shuo Feng
Abstract:
Intelligent systems are increasingly integral to our daily lives, yet rare safety-critical events present significant latent threats to their practical deployment. Addressing this challenge hinges on accurately predicting the probability of safety-critical events occurring within a given time step from the current state, a metric we define as 'criticality'. The complexity of predicting criticality arises from the extreme data imbalance caused by rare events in high dimensional variables associated with the rare events, a challenge we refer to as the curse of rarity. Existing methods tend to be either overly conservative or prone to overlooking safety-critical events, thus struggling to achieve both high precision and recall rates, which severely limits their applicability. This study endeavors to develop a criticality prediction model that excels in both precision and recall rates for evaluating the criticality of safety-critical autonomous systems. We propose a multi-stage learning framework designed to progressively densify the dataset, mitigating the curse of rarity across stages. To validate our approach, we evaluate it in two cases: lunar lander and bipedal walker scenarios. The results demonstrate that our method surpasses traditional approaches, providing a more accurate and dependable assessment of criticality in intelligent systems.
Submitted 5 April, 2024; v1 submitted 20 March, 2024;
originally announced March 2024.
-
Lightweight Object Detection: A Study Based on YOLOv7 Integrated with ShuffleNetv2 and Vision Transformer
Authors:
Wenkai Gong
Abstract:
As mobile computing technology rapidly evolves, deploying efficient object detection algorithms on mobile devices emerges as a pivotal research area in computer vision. This study zeroes in on optimizing the YOLOv7 algorithm to boost its operational efficiency and speed on mobile platforms while ensuring high accuracy. Leveraging a synergy of advanced techniques such as Group Convolution, ShuffleNetV2, and Vision Transformer, this research has effectively minimized the model's parameter count and memory usage, streamlined the network architecture, and fortified the real-time object detection proficiency on resource-constrained devices. The experimental outcomes reveal that the refined YOLO model demonstrates exceptional performance, markedly enhancing processing velocity while sustaining superior detection accuracy.
Submitted 4 March, 2024;
originally announced March 2024.
-
Constrained Multi-objective Optimization with Deep Reinforcement Learning Assisted Operator Selection
Authors:
Fei Ming,
Wenyin Gong,
Ling Wang,
Yaochu Jin
Abstract:
Solving constrained multi-objective optimization problems with evolutionary algorithms has attracted considerable attention. Various constrained multi-objective optimization evolutionary algorithms (CMOEAs) have been developed with the use of different algorithmic strategies, evolutionary operators, and constraint-handling techniques. The performance of CMOEAs may be heavily dependent on the operators used; however, it is usually difficult to select suitable operators for the problem at hand. Hence, improving operator selection is promising and necessary for CMOEAs. This work proposes an online operator selection framework assisted by Deep Reinforcement Learning. The dynamics of the population, including convergence, diversity, and feasibility, are regarded as the state; the candidate operators are considered as actions; and the improvement of the population state is treated as the reward. By using a Q-Network to learn a policy to estimate the Q-values of all actions, the proposed approach can adaptively select an operator that maximizes the improvement of the population according to the current state and thereby improve the algorithmic performance. The framework is embedded into four popular CMOEAs and assessed on 42 benchmark problems. The experimental results reveal that the proposed Deep Reinforcement Learning-assisted operator selection significantly improves the performance of these CMOEAs and that the resulting algorithm obtains better versatility compared to nine state-of-the-art CMOEAs.
Submitted 15 January, 2024;
originally announced February 2024.
-
The Essential Role of Causality in Foundation World Models for Embodied AI
Authors:
Tarun Gupta,
Wenbo Gong,
Chao Ma,
Nick Pawlowski,
Agrin Hilmkil,
Meyer Scetbon,
Marc Rigter,
Ade Famoti,
Ashley Juan Llorens,
Jianfeng Gao,
Stefan Bauer,
Danica Kragic,
Bernhard Schölkopf,
Cheng Zhang
Abstract:
Recent advances in foundation models, especially in large multi-modal models and conversational agents, have ignited interest in the potential of generally capable embodied agents. Such agents will require the ability to perform new tasks in many different real-world environments. However, current foundation models fail to accurately model physical interactions and are therefore insufficient for Embodied AI. The study of causality lends itself to the construction of veridical world models, which are crucial for accurately predicting the outcomes of possible interactions. This paper focuses on the prospects of building foundation world models for the upcoming generation of embodied agents and presents a novel viewpoint on the significance of causality within these. We posit that integrating causal considerations is vital to facilitating meaningful physical interactions with the world. Finally, we demystify misconceptions about causality in this context and present our outlook for future research.
Submitted 29 April, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
-
360-GS: Layout-guided Panoramic Gaussian Splatting For Indoor Roaming
Authors:
Jiayang Bai,
Letian Huang,
Jie Guo,
Wen Gong,
Yuanqi Li,
Yanwen Guo
Abstract:
3D Gaussian Splatting (3D-GS) has recently attracted great attention with real-time and photo-realistic renderings. This technique typically takes perspective images as input and optimizes a set of 3D elliptical Gaussians by splatting them onto the image planes, resulting in 2D Gaussians. However, applying 3D-GS to panoramic inputs presents challenges in effectively modeling the projection onto the spherical surface of ${360^\circ}$ images using 2D Gaussians. In practical applications, input panoramas are often sparse, leading to unreliable initialization of 3D Gaussians and subsequent degradation of 3D-GS quality. In addition, due to the under-constrained geometry of texture-less planes (e.g., walls and floors), 3D-GS struggles to model these flat regions with elliptical Gaussians, resulting in significant floaters in novel views. To address these issues, we propose 360-GS, a novel $360^{\circ}$ Gaussian splatting for a limited set of panoramic inputs. Instead of splatting 3D Gaussians directly onto the spherical surface, 360-GS projects them onto the tangent plane of the unit sphere and then maps them to the spherical projections. This adaptation enables the representation of the projection using Gaussians. We guide the optimization of 360-GS by exploiting layout priors within panoramas, which are simple to obtain and contain strong structural information about the indoor scene. Our experimental results demonstrate that 360-GS allows panoramic rendering and outperforms state-of-the-art methods with fewer artifacts in novel view synthesis, thus providing immersive roaming in indoor scenarios.
Submitted 1 February, 2024;
originally announced February 2024.
-
RecDCL: Dual Contrastive Learning for Recommendation
Authors:
Dan Zhang,
Yangliao Geng,
Wenwen Gong,
Zhongang Qi,
Zhiyu Chen,
Xing Tang,
Ying Shan,
Yuxiao Dong,
Jie Tang
Abstract:
Self-supervised learning (SSL) has recently achieved great success in mining the user-item interactions for collaborative filtering. As a major paradigm, contrastive learning (CL) based SSL helps address data sparsity in Web platforms by contrasting the embeddings between raw and augmented data. However, existing CL-based methods mostly focus on contrasting in a batch-wise way, failing to exploit potential regularity in the feature dimension. This leads to redundant solutions during the representation learning of users and items. In this work, we investigate how to employ both batch-wise CL (BCL) and feature-wise CL (FCL) for recommendation. We theoretically analyze the relation between BCL and FCL, and find that combining BCL and FCL helps eliminate redundant solutions but never misses an optimal solution. We propose a dual contrastive learning recommendation framework, RecDCL. In RecDCL, the FCL objective is designed to eliminate redundant solutions on user-item positive pairs and to optimize the uniform distributions within users and items using a polynomial kernel for driving the representations to be orthogonal; the BCL objective is utilized to generate contrastive embeddings on output vectors for enhancing the robustness of the representations. Extensive experiments on four widely-used benchmarks and one industry dataset demonstrate that RecDCL can consistently outperform the state-of-the-art GNN-based and SSL-based models (with an improvement of up to 5.65% in terms of Recall@20). The source code is publicly available (https://github.com/THUDM/RecDCL).
Submitted 18 February, 2024; v1 submitted 28 January, 2024;
originally announced January 2024.
-
Exploring consumers response to text-based chatbots in e-commerce: The moderating role of task complexity and chatbot disclosure
Authors:
Xusen Cheng,
Ying Bao,
Alex Zarifis,
Wankun Gong,
Jian Mou
Abstract:
Artificial intelligence-based chatbots have brought unprecedented business potential. This study aims to explore consumers' trust in and response to a text-based chatbot in e-commerce, involving the moderating effects of task complexity and chatbot identity disclosure. A survey with 299 usable responses was conducted in this research. This study adopted ordinary least squares regression to test the hypotheses. First, consumers' perception of both the empathy and friendliness of the chatbot positively impacts their trust in it. Second, task complexity negatively moderates the relationship between friendliness and consumers' trust. Third, disclosure of the text-based chatbot negatively moderates the relationship between empathy and consumers' trust, while it positively moderates the relationship between friendliness and consumers' trust. Fourth, consumers' trust in the chatbot increases their reliance on the chatbot and decreases their resistance to the chatbot in future interactions. Adopting the stimulus-organism-response framework, this study provides important insights into consumers' perception of and response to the text-based chatbot. The findings of this research also yield suggestions that can increase consumers' positive responses to text-based chatbots. Extant studies have investigated the effects of automated bots' attributes on consumers' perceptions. However, the boundary conditions of these effects are largely ignored. This research is one of the first attempts to provide a deep understanding of consumers' responses to a chatbot.
Submitted 20 January, 2024;
originally announced January 2024.
-
Complexity of Digital Quantum Simulation in the Low-Energy Subspace: Applications and a Lower Bound
Authors:
Weiyuan Gong,
Shuo Zhou,
Tongyang Li
Abstract:
Digital quantum simulation has broad applications in approximating unitary evolution of Hamiltonians. In practice, many simulation tasks for quantum systems focus on quantum states in the low-energy subspace instead of the entire Hilbert space. In this paper, we systematically investigate the complexity of digital quantum simulation based on product formulas in the low-energy subspace. We show that the simulation error depends on the effective low-energy norm of the Hamiltonian for a variety of digital quantum simulation algorithms and quantum systems, allowing improvements over the previous complexities for full unitary simulations even for imperfect state preparations due to thermalization. In particular, for simulating spin models in the low-energy subspace, we prove that randomized product formulas such as qDRIFT and random permutation require smaller Trotter numbers. Such improvement also persists in symmetry-protected digital quantum simulations. We prove a similar improvement in simulating the dynamics of power-law quantum interactions. We also provide a query lower bound for general digital quantum simulations in the low-energy subspace.
Submitted 11 July, 2024; v1 submitted 14 December, 2023;
originally announced December 2023.
-
MToP: A MATLAB Optimization Platform for Evolutionary Multitasking
Authors:
Yanchi Li,
Wenyin Gong,
Fei Ming,
Tingyu Zhang,
Shuijia Li,
Qiong Gu
Abstract:
Evolutionary multitasking (EMT) has emerged as a popular topic of evolutionary computation over the past decade. It aims to concurrently address multiple optimization tasks within limited computing resources, leveraging inter-task knowledge transfer techniques. Despite the abundance of multitask evolutionary algorithms (MTEAs) proposed for multitask optimization (MTO), there remains a lack of a comprehensive software platform to help researchers evaluate MTEA performance on benchmark MTO problems as well as explore real-world applications. To bridge this gap, we introduce the first open-source optimization platform, named MTO-Platform (MToP), for EMT. MToP incorporates over 50 MTEAs, more than 200 MTO problem cases with real-world applications, and over 20 performance metrics. Moreover, to facilitate comparative analyses between MTEAs and traditional evolutionary algorithms, we adapted over 50 popular single-task evolutionary algorithms to address MTO problems. MToP boasts a user-friendly graphical interface, facilitating results analysis, data export, and schematics visualization. More importantly, MToP is designed with extensibility in mind, allowing users to develop new algorithms and tackle emerging problem domains. The source code of MToP is available at https://github.com/intLyc/MTO-Platform.
Submitted 23 February, 2025; v1 submitted 13 December, 2023;
originally announced December 2023.
-
Bridge the Present and Future: A Cross-Layer Matching Game in Dynamic Cloud-Aided Mobile Edge Networks
Authors:
Houyi Qi,
Minghui Liwang,
Xianbin Wang,
Li Li,
Wei Gong,
Jian Jin,
Zhenzhen Jiao
Abstract:
Cloud-aided mobile edge networks (CAMENs) allow edge servers (ESs) to purchase resources from remote cloud servers (CSs), thereby overcoming resource shortages when handling computation-intensive tasks of mobile users (MUs). Conventional trading mechanisms (e.g., onsite trading) confront many challenges, including decision-making overhead (e.g., latency) and potential trading failures. This paper investigates a series of cross-layer matching mechanisms to achieve stable and cost-effective resource provisioning across different layers (i.e., MUs, ESs, CSs), seamlessly integrated into a novel hybrid paradigm that incorporates futures and spot trading. In futures trading, we explore an overbooking-driven aforehand cross-layer matching (OA-CLM) mechanism, facilitating two types of futures contracts: contracts between MUs and ESs, and contracts between ESs and CSs, while assessing potential risks through historical statistical analysis. In spot trading, we design two backup plans that respond to current network/market conditions: a determination of the contractual MUs that should switch to local processing from edge/cloud services; and an onsite cross-layer matching (OS-CLM) mechanism that engages participants in real-time practical transactions. We next show that our matching mechanisms theoretically satisfy stability, individual rationality, competitive equilibrium, and weak Pareto optimality. Comprehensive simulations in real-world and numerical network settings confirm the corresponding efficacy, while revealing remarkable improvements in time/energy efficiency and social welfare.
Submitted 8 June, 2024; v1 submitted 7 December, 2023;
originally announced December 2023.
-
Neural Structure Learning with Stochastic Differential Equations
Authors:
Benjie Wang,
Joel Jennings,
Wenbo Gong
Abstract:
Discovering the underlying relationships among variables from temporal observations has been a longstanding challenge in numerous scientific disciplines, including biology, finance, and climate science. The dynamics of such systems are often best described using continuous-time stochastic processes. Unfortunately, most existing structure learning approaches assume that the underlying process evolves in discrete-time and/or observations occur at regular time intervals. These mismatched assumptions can often lead to incorrect learned structures and models. In this work, we introduce a novel structure learning method, SCOTCH, which combines neural stochastic differential equations (SDE) with variational inference to infer a posterior distribution over possible structures. This continuous-time approach can naturally handle both learning from and predicting observations at arbitrary time points. Theoretically, we establish sufficient conditions for an SDE and SCOTCH to be structurally identifiable, and prove its consistency under infinite data limits. Empirically, we demonstrate that our approach leads to improved structure learning performance on both synthetic and real-world datasets compared to relevant baselines under regular and irregular sampling intervals.
Submitted 5 May, 2024; v1 submitted 6 November, 2023;
originally announced November 2023.
-
Efficient Pauli channel estimation with logarithmic quantum memory
Authors:
Sitan Chen,
Weiyuan Gong
Abstract:
Here we revisit one of the prototypical tasks for characterizing the structure of noise in quantum devices: estimating every eigenvalue of an $n$-qubit Pauli noise channel to error $ε$. Prior work [14] proved no-go theorems for this task in the practical regime where one has a limited amount of quantum memory, e.g. any protocol with $\le 0.99n$ ancilla qubits of quantum memory must make exponentially many measurements, provided it is non-concatenating. Such protocols can only interact with the channel by repeatedly preparing a state, passing it through the channel, and measuring immediately afterward.
This left open a natural question: does the lower bound hold even for general protocols, i.e. ones which chain together many queries to the channel, interleaved with arbitrary data-processing channels, before measuring? Surprisingly, in this work we show the opposite: there is a protocol that can estimate the eigenvalues of a Pauli channel to error $ε$ using only $O(\log n/ε^2)$ ancilla and $\tilde{O}(n^2/ε^2)$ measurements. In contrast, we show that any protocol with zero ancilla, even a concatenating one, must make $Ω(2^n/ε^2)$ measurements, which is tight.
Our results imply, to our knowledge, the first quantum learning task where logarithmically many qubits of quantum memory suffice for an exponential statistical advantage. Our protocol can be naturally extended to a protocol that learns the eigenvalues of Pauli terms within any subset $A$ of a Pauli channel with $O(\log\log(|A|)/ε^2)$ ancilla and $\tilde{O}(n^2/ε^2)$ measurements.
Submitted 24 May, 2025; v1 submitted 25 September, 2023;
originally announced September 2023.
-
Adaptive Bitrate Video Semantic Communication over Wireless Networks
Authors:
Wentao Gong,
Haonan Tong,
Sihua Wang,
Zhaohui Yang,
Xinxin He,
Changchuan Yin
Abstract:
This paper investigates adaptive bitrate (ABR) video semantic communication over wireless networks. In the considered model, video sensing devices must transmit video semantic information to an edge server, to facilitate ubiquitous video sensing services such as road environment monitoring at the edge server in autonomous driving scenarios. However, due to the varying wireless network conditions, it is challenging to guarantee both low transmission delay and high semantic accuracy at the same time if devices continuously transmit video semantic information at a fixed bitrate. To address this challenge, we develop an adaptive bitrate video semantic communication (ABRVSC) system, in which devices adaptively adjust the bitrate of video semantic information according to network conditions. Specifically, we first define the quality of experience (QoE) for video semantic communication. Subsequently, a swin transformer-based semantic codec is proposed to extract semantic information while considering the influence of QoE. Then, we propose an Actor-Critic based ABR algorithm for the semantic codec to enhance the robustness of the proposed ABRVSC scheme against network variations. Simulation results demonstrate that at low bitrates, the mean intersection over union (MIoU) of the proposed ABRVSC scheme is nearly twice that of the traditional scheme. Moreover, the proposed ABRVSC scheme, which increases the QoE in video semantic communication by 36.57%, exhibits more robustness against network variations compared to both the fixed bitrate schemes and traditional ABR schemes.
Submitted 1 August, 2023;
originally announced August 2023.
-
BayesDAG: Gradient-Based Posterior Inference for Causal Discovery
Authors:
Yashas Annadani,
Nick Pawlowski,
Joel Jennings,
Stefan Bauer,
Cheng Zhang,
Wenbo Gong
Abstract:
Bayesian causal discovery aims to infer the posterior distribution over causal models from observed data, quantifying epistemic uncertainty and benefiting downstream tasks. However, computational challenges arise due to joint inference over the combinatorial space of Directed Acyclic Graphs (DAGs) and nonlinear functions. Despite recent progress towards efficient posterior inference over DAGs, existing methods are limited either to variational inference on node permutation matrices for linear causal models, leading to compromised inference accuracy, or to continuous relaxation of adjacency matrices constrained by a DAG regularizer, which cannot ensure that the resulting graphs are DAGs. In this work, we introduce a scalable Bayesian causal discovery framework based on a combination of stochastic gradient Markov Chain Monte Carlo (SG-MCMC) and Variational Inference (VI) that overcomes these limitations. Our approach directly samples DAGs from the posterior without requiring any DAG regularization, simultaneously draws function parameter samples, and is applicable to both linear and nonlinear causal models. To enable our approach, we derive a novel equivalence to permutation-based DAG learning, which opens up the possibility of using any relaxed gradient estimator defined over permutations. To our knowledge, this is the first framework applying gradient-based MCMC sampling for causal discovery. Empirical evaluations on synthetic and real-world datasets demonstrate our approach's effectiveness compared to state-of-the-art baselines.
Submitted 8 December, 2023; v1 submitted 25 July, 2023;
originally announced July 2023.
-
Improved Digital Quantum Simulation by Non-Unitary Channels
Authors:
W. Gong,
Yaroslav Kharkov,
Minh C. Tran,
Przemyslaw Bienias,
Alexey V. Gorshkov
Abstract:
Simulating quantum systems is one of the most promising avenues to harness the computational power of quantum computers. However, hardware errors in noisy near-term devices remain a major obstacle for applications. Ideas based on the randomization of Suzuki-Trotter product formulas have been shown to be a powerful approach to reducing the errors of quantum simulation and lowering the gate count. In this paper, we study the performance of non-unitary simulation channels and consider the error structure of channels constructed from a weighted average of unitary circuits. We show that averaging over just a few simulation circuits can significantly reduce the Trotterization error for both single-step short-time and multi-step long-time simulations. We focus our analysis on two approaches for constructing circuit ensembles for averaging: (i) permuting the order of the terms in the Hamiltonian and (ii) applying a set of global symmetry transformations. We compare our analytical error bounds to empirical performance and show that empirical error reduction surpasses our analytical estimates in most cases. Finally, we test our method on an IonQ trapped-ion quantum computer accessed via the Amazon Braket cloud platform, and benchmark the performance of the averaging approach.
Submitted 24 July, 2023;
originally announced July 2023.