-
A high-sensitivity frequency counter for free-induction-decay signals
Authors:
Tong Gong,
Ming-Rui Shu,
Jiang He,
Kai Liu,
Yi-Ren Li,
Xin-Jun Hao,
Dong Sheng,
Yu-Ming Wang,
Yu-Kun Feng
Abstract:
Real-time frequency readout of time-dependent pulsed signals with high sensitivity is a key element in many applications using atomic devices, such as free-induction-decay (FID) atomic magnetometers. In this paper, we propose a frequency measurement algorithm based on the Hilbert transform and implement the scheme in an FPGA-based frequency counter. Tested with pulsed, exponentially decaying oscillation signals in the frequency range of 10 to 500 kHz, this frequency counter shows a frequency sensitivity better than 0.1 mHz/Hz^(1/2) at 10 Hz, with an output rate of 200 Hz. When the output rate is increased to 1000 Hz, the sensitivity remains better than 0.4 mHz/Hz^(1/2) at 10 Hz. This frequency sensitivity is comparable to results obtained by offline nonlinear fitting. In addition, the frequency counter does not require prior knowledge of the analytic expression of the input signals. The realization of such a device paves the way for practical applications of highly sensitive FID atomic magnetometers.
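The abstract names the Hilbert transform as the basis of the readout but does not spell out the algorithm. The sketch below shows one common offline way to use it, under stated assumptions. It is not the authors' FPGA design. The steps are: form the analytic signal, unwrap its phase, and fit a straight line to phase versus time. The slope divided by 2π is the frequency. It assumes a uniformly sampled record and uses a naive O(N²) DFT so that only the standard library is needed. It fits only the central half of the record, because a decaying record is not periodic and a circular DFT distorts its edges. The function names and the example parameter values are hypothetical.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal x + j*H{x} of a real sequence via a DFT-domain Hilbert transform.

    A direct O(N^2) DFT keeps the sketch dependency-free; an FFT would be used in practice.
    """
    n = len(x)
    twiddle = [cmath.exp(-2j * math.pi * i / n) for i in range(n)]
    # Forward DFT: X[k] = sum_m x[m] * exp(-2j*pi*k*m/n)
    spectrum = [sum(x[m] * twiddle[(k * m) % n] for m in range(n)) for k in range(n)]
    # Keep DC (and Nyquist for even n) once, double positive bins, zero negative bins.
    weights = [0.0] * n
    weights[0] = 1.0
    for k in range(1, (n + 1) // 2):
        weights[k] = 2.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
    shaped = [s * w for s, w in zip(spectrum, weights)]
    # Inverse DFT: conjugate twiddles and 1/n scaling.
    return [
        sum(shaped[k] * twiddle[(k * m) % n].conjugate() for k in range(n)) / n
        for m in range(n)
    ]


def estimate_frequency(x, fs):
    """Estimate the oscillation frequency (Hz) of a real, decaying record sampled at fs (Hz)."""
    z = analytic_signal(x)
    # Unwrap the instantaneous phase by accumulating wrapped sample-to-sample increments.
    phase = [cmath.phase(z[0])]
    for prev, cur in zip(z, z[1:]):
        phase.append(phase[-1] + cmath.phase(cur / prev))
    # Least-squares slope over the central half, away from circular-DFT edge artifacts.
    n = len(x)
    idx = range(n // 4, 3 * n // 4)
    t = [i / fs for i in idx]
    p = [phase[i] for i in idx]
    t_mean = sum(t) / len(t)
    p_mean = sum(p) / len(p)
    num = sum((ti - t_mean) * (pk - p_mean) for ti, pk in zip(t, p))
    den = sum((ti - t_mean) ** 2 for ti in t)
    return num / den / (2 * math.pi)


# Example: a synthetic FID-like record (all parameter values are arbitrary choices).
fs = 16_000.0   # sample rate, Hz
f0 = 1_000.0    # true oscillation frequency, Hz
tau = 0.064     # decay time constant, s
record = [math.exp(-i / fs / tau) * math.cos(2 * math.pi * f0 * i / fs) for i in range(512)]
print(f"estimated frequency: {estimate_frequency(record, fs):.3f} Hz")
```

Fitting only the central half is a simple design choice that avoids the wrap-around discontinuity. A streaming hardware implementation might instead use an FIR Hilbert filter, but that is an assumption here and not something the abstract states.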
Submitted 5 June, 2025;
originally announced June 2025.
-
Rydberg Atomic Quantum MIMO Receivers for The Multi-User Uplink
Authors:
Tierui Gong,
Chau Yuen,
Chong Meng Samson See,
Mérouane Debbah,
Lajos Hanzo
Abstract:
Rydberg atomic quantum receivers (RAQRs) have emerged as a promising solution for evolving wireless receivers from the classical to the quantum domain. To further unleash their potential in wireless communications, we propose a flexible architecture for Rydberg atomic quantum multiple-input multiple-output (RAQ-MIMO) receivers in the multi-user uplink. We then construct the corresponding signal model of the RAQ-MIMO system, bridging quantum physics and classical wireless communications. Specifically, we outline the associated operating principles and transmission flow. We also validate the linearity of our model and its feasible region. Based on this model, we derive closed-form asymptotic formulas for the ergodic achievable rate (EAR) of both maximum-ratio combining (MRC) and zero-forcing (ZF) receivers operating in uncorrelated fading channels (UFC) and correlated fading channels (CFC). Furthermore, we theoretically characterize the EAR difference between the UFC and CFC scenarios, as well as between the MRC and ZF schemes. In particular, we quantify the superiority of RAQ-MIMO receivers over classical massive MIMO (M-MIMO) receivers: an EAR increase of $\log_{2}\Pi$ per user, a $\Pi$-fold reduction of the users' transmit power, and a $\sqrt[\nu]{\Pi}$-fold increase of the transmission distance, where $\Pi = \text{ReceiverGainRatio} / \text{ReceiverNoisePowerRatio}$ of the single-sensor receivers and $\nu$ is the path-loss exponent. Our simulation results reveal that, compared to classical M-MIMO receivers, our RAQ-MIMO scheme can realize either $\sim 12$ bits/s/Hz/user ($\sim 8$ bits/s/Hz/user) higher EAR, or $\sim 10000$-fold ($\sim 500$-fold) lower transmit power, or, alternatively, $\sim 100$-fold ($\sim 21$-fold) longer distance in free-space transmissions, in the standard quantum limit (photon shot limit).
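All three asymptotic gains quoted in the abstract follow from the single ratio Π. The helper below evaluates those stated expressions: log2 Π extra bits/s/Hz per user, a Π-fold transmit-power reduction, and a Π^(1/ν)-fold distance increase. The function name and the example inputs are hypothetical. The simulated figures above come from the paper's own settings, and a direct plug-in of these asymptotic formulas need not reproduce them exactly.

```python
import math


def raq_mimo_gains(gain_ratio: float, noise_power_ratio: float, path_loss_exponent: float):
    """Evaluate the asymptotic RAQ-MIMO vs. classical M-MIMO gains stated in the abstract.

    Pi = ReceiverGainRatio / ReceiverNoisePowerRatio of the single-sensor receivers.
    Returns (extra EAR in bits/s/Hz/user, transmit-power reduction factor,
    transmission-distance increase factor).
    """
    pi_ratio = gain_ratio / noise_power_ratio
    rate_gain = math.log2(pi_ratio)                     # log2(Pi)
    power_reduction = pi_ratio                          # Pi-fold
    distance_gain = pi_ratio ** (1.0 / path_loss_exponent)  # nu-th root of Pi
    return rate_gain, power_reduction, distance_gain


# Example with an arbitrary Pi = 1e4 in free space (path-loss exponent nu = 2).
rate, power, dist = raq_mimo_gains(1e4, 1.0, 2.0)
print(f"+{rate:.2f} bits/s/Hz/user, {power:.0f}x less power, {dist:.0f}x longer distance")
```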
Submitted 2 June, 2025;
originally announced June 2025.
-
MARS-Bench: A Multi-turn Athletic Real-world Scenario Benchmark for Dialogue Evaluation
Authors:
Chenghao Yang,
Yinbo Luo,
Zhoufutu Wen,
Qi Chu,
Tao Gong,
Longxiang Liu,
Kaiyuan Zhang,
Jianpeng Jiao,
Ge Zhang,
Wenhao Huang,
Nenghai Yu
Abstract:
Large Language Models (LLMs), e.g., ChatGPT, have been widely adopted in real-world dialogue applications. However, LLMs' robustness, especially in handling long, complex dialogue sessions with frequent motivation transfer and sophisticated cross-turn dependency, has long been criticized. Nevertheless, no existing benchmark fully reflects these weaknesses. We present MARS-Bench, a Multi-turn Athletic Real-world Scenario Dialogue Benchmark, designed to fill this gap. MARS-Bench is constructed from play-by-play text commentary so as to feature realistic dialogues specifically designed to evaluate three critical aspects of multi-turn conversations: Ultra Multi-turn, Interactive Multi-turn, and Cross-turn Tasks. Extensive experiments on MARS-Bench reveal that closed-source LLMs significantly outperform open-source alternatives, that explicit reasoning significantly boosts LLMs' robustness in handling long, complex dialogue sessions, and that LLMs indeed face significant challenges when handling motivation transfer and sophisticated cross-turn dependency. Moreover, based on attention visualization experiments with Qwen2.5-7B-Instruction, we provide a mechanistic interpretation of how attention sinks caused by special tokens lead to LLMs' performance degradation in long, complex dialogue sessions.
Submitted 27 May, 2025;
originally announced May 2025.
-
InfoSAM: Fine-Tuning the Segment Anything Model from An Information-Theoretic Perspective
Authors:
Yuanhong Zhang,
Muyao Yuan,
Weizhan Zhang,
Tieliang Gong,
Wen Wen,
Jiangyong Ying,
Weijie Shi
Abstract:
The Segment Anything Model (SAM), a vision foundation model, exhibits impressive zero-shot capabilities in general tasks but struggles in specialized domains. Parameter-efficient fine-tuning (PEFT) is a promising approach to unleash SAM's potential in novel scenarios. However, existing PEFT methods for SAM neglect the domain-invariant relations encoded in the pre-trained model. To bridge this gap, we propose InfoSAM, an information-theoretic approach that enhances SAM fine-tuning by distilling and preserving its pre-trained segmentation knowledge. Specifically, we formulate the knowledge transfer process as two novel mutual information-based objectives: (i) to compress the domain-invariant relation extracted from pre-trained SAM, excluding pseudo-invariant information as much as possible, and (ii) to maximize mutual information between the relational knowledge learned by the teacher (pre-trained SAM) and the student (fine-tuned model). The proposed InfoSAM establishes a robust distillation framework for PEFT of SAM. Extensive experiments across diverse benchmarks validate InfoSAM's effectiveness in improving the SAM family's performance on real-world tasks, demonstrating its adaptability and superiority in handling specialized scenarios.
Submitted 3 June, 2025; v1 submitted 27 May, 2025;
originally announced May 2025.
-
Frequency Composition for Compressed and Domain-Adaptive Neural Networks
Authors:
Yoojin Kwon,
Hongjun Suh,
Wooseok Lee,
Taesik Gong,
Songyi Han,
Hyung-Sin Kim
Abstract:
Modern on-device neural network applications must operate under resource constraints while adapting to unpredictable domain shifts. However, this combined challenge of model compression and domain adaptation remains largely unaddressed, as prior work has tackled each issue in isolation: compressed networks prioritize efficiency within a fixed domain, whereas large, capable models focus on handling domain shifts. In this work, we propose CoDA, a frequency composition-based framework that unifies compression and domain adaptation. During training, CoDA employs quantization-aware training (QAT) with low-frequency components (LFC), enabling a compressed model to selectively learn robust, generalizable features. At test time, it refines the compact model in a source-free manner (i.e., test-time adaptation, TTA), leveraging the full-frequency information from incoming data to adapt to target domains while treating high-frequency components (HFC) as domain-specific cues. LFC are aligned with the trained distribution, while HFC unique to the target distribution are used solely for batch normalization. CoDA can be integrated synergistically into existing QAT and TTA methods. CoDA is evaluated on widely used domain-shift benchmarks, including CIFAR10-C and ImageNet-C, across various model architectures. With significant compression, it improves accuracy by 7.96 percentage points on CIFAR10-C and 5.37 percentage points on ImageNet-C over the full-precision TTA baseline.
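The abstract does not say how the low- and high-frequency components are separated. One common choice is a circular low-pass mask in the 2-D Fourier domain. The sketch below uses that choice purely as an illustration of the LFC/HFC split; it is not CoDA's actual filter. The function names and the `radius` cutoff are hypothetical, and a naive DFT keeps the sketch dependency-free.

```python
import cmath
import math


def dft(seq, inverse=False):
    """Naive 1-D DFT; the inverse includes the 1/n scaling."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = [
        sum(seq[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def dft2(img, inverse=False):
    """Separable 2-D DFT of a list-of-lists image: transform rows, then columns."""
    rows = [dft(r, inverse) for r in img]
    cols = [dft(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def split_frequencies(img, radius):
    """Split a grayscale image into low- and high-frequency parts with a circular mask.

    `radius` is a hypothetical cutoff in frequency-bin units; low + high reconstructs img.
    """
    h, w = len(img), len(img[0])
    spec = dft2(img)
    low_spec = [[0j] * w for _ in range(h)]
    high_spec = [[0j] * w for _ in range(h)]
    for u in range(h):
        fu = min(u, h - u)  # wrap-around distance from DC along axis 0
        for v in range(w):
            fv = min(v, w - v)  # wrap-around distance from DC along axis 1
            target = low_spec if fu * fu + fv * fv <= radius * radius else high_spec
            target[u][v] = spec[u][v]
    # The mask is symmetric under (u, v) -> (-u, -v), so both parts are real up to rounding.
    low = [[z.real for z in row] for row in dft2(low_spec, inverse=True)]
    high = [[z.real for z in row] for row in dft2(high_spec, inverse=True)]
    return low, high


# Example: an 8x8 ramp plus a checkerboard (the checkerboard sits at the highest frequency).
image = [[r + c + (1.0 if (r + c) % 2 else -1.0) for c in range(8)] for r in range(8)]
low, high = split_frequencies(image, radius=2)
```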
Submitted 27 May, 2025;
originally announced May 2025.
-
Test-Time Adaptation with Binary Feedback
Authors:
Taeckyung Lee,
Sorn Chottananurak,
Junsu Kim,
Jinwoo Shin,
Taesik Gong,
Sung-Ju Lee
Abstract:
Deep learning models perform poorly when domain shifts exist between training and test data. Test-time adaptation (TTA) is a paradigm to mitigate this issue by adapting pre-trained models using only unlabeled test samples. However, existing TTA methods can fail under severe domain shifts, while recent active TTA approaches that require full-class labels are impractical due to high labeling costs. To address this issue, we introduce a new setting of TTA with binary feedback. This setting uses a few binary feedback inputs from annotators indicating whether model predictions are correct, thereby significantly reducing the annotators' labeling burden. Under this setting, we propose BiTTA, a novel dual-path optimization framework that leverages reinforcement learning to balance binary feedback-guided adaptation on uncertain samples with agreement-based self-adaptation on confident predictions. Experiments show that BiTTA improves accuracy by 13.3 percentage points over state-of-the-art baselines, demonstrating its effectiveness in handling severe distribution shifts with minimal labeling effort. The source code is available at https://github.com/taeckyung/BiTTA.
Submitted 24 May, 2025;
originally announced May 2025.
-
Strong coupling of chiral magnons in altermagnets
Authors:
Zhejunyu Jin,
Tianci Gong,
Jie Liu,
Huanhuan Yang,
Zhaozhuo Zeng,
Yunshan Cao,
Peng Yan
Abstract:
Altermagnets have recently been identified as a new class of magnets that break time-reversal symmetry without exhibiting net magnetization. The role of the dipole-dipole interaction (DDI) in their dynamical properties, however, has yet to be addressed. In this work, we show that the DDI can induce strong coupling between exchange magnons with opposite chiralities in altermagnets, manifesting as significant level repulsion in the magnon spectrum. Crucially, the predicted magnon-magnon coupling is highly anisotropic and observable in practical experiments. These exotic features are absent in conventional antiferromagnets. Our findings open a new pathway for quantum magnonic information processing based on altermagnetism.
Submitted 24 May, 2025;
originally announced May 2025.
-
Benchmarking Unified Face Attack Detection via Hierarchical Prompt Tuning
Authors:
Ajian Liu,
Haocheng Yuan,
Xiao Guo,
Hui Ma,
Wanyi Zhuang,
Changtao Miao,
Yan Hong,
Chuanbiao Song,
Jun Lan,
Qi Chu,
Tao Gong,
Yanyan Liang,
Weiqiang Wang,
Jun Wan,
Xiaoming Liu,
Zhen Lei
Abstract:
Presentation Attack Detection and Face Forgery Detection are designed to protect face data from physical media-based Presentation Attacks and digital editing-based DeepFakes, respectively. However, training these two models separately makes them vulnerable to unknown attacks and burdens deployment environments. The lack of a Unified Face Attack Detection (UAD) model that handles both types of attacks is mainly due to two factors. First, there is a lack of adequate benchmarks for models to explore. Existing UAD datasets have limited attack types and samples, restricting the model's ability to address advanced threats. To address this, we propose UniAttackDataPlus (UniAttackData+), the most extensive and sophisticated collection of forgery techniques to date. It includes 2,875 identities and their 54 kinds of falsified samples, totaling 697,347 videos. Second, there is a lack of a reliable classification criterion. Current methods try to find an arbitrary criterion within the same semantic space, which fails when encountering diverse attacks. We therefore present a novel Visual-Language Model-based Hierarchical Prompt Tuning Framework (HiPTune) that adaptively explores multiple classification criteria from different semantic spaces. We build a Visual Prompt Tree to explore various classification rules hierarchically. Then, by adaptively pruning the prompts, the model can select the most suitable prompts to guide the encoder to extract discriminative features at different levels in a coarse-to-fine manner. Finally, to help the model understand the classification criteria in visual space, we propose a Dynamically Prompt Integration module that projects the visual prompts to the text encoder for more accurate semantics. Experiments on 12 datasets show the potential of this approach to inspire further innovations in the UAD field.
Submitted 19 May, 2025; v1 submitted 19 May, 2025;
originally announced May 2025.
-
Robust Deep Learning-Based Physical Layer Communications: Strategies and Approaches
Authors:
Fenghao Zhu,
Xinquan Wang,
Chen Zhu,
Tierui Gong,
Zhaohui Yang,
Chongwen Huang,
Xiaoming Chen,
Zhaoyang Zhang,
Mérouane Debbah
Abstract:
Deep learning (DL) has emerged as a transformative technology with immense potential to reshape sixth-generation (6G) wireless communication networks. By utilizing advanced algorithms for feature extraction and pattern recognition, DL provides unprecedented capabilities for optimizing network efficiency and performance, particularly in physical layer communications. Although DL technologies show great potential, they also face significant robustness challenges, which are expected to intensify in the complex and demanding 6G environment. Specifically, current DL models typically exhibit substantial performance degradation in dynamic environments with time-varying channels, noise interference, and changing scenarios, which limits their effectiveness in diverse real-world applications. This paper provides a comprehensive overview of strategies and approaches for robust DL-based methods in physical layer communications. We first introduce the key challenges that current DL models face. We then examine in detail DL approaches specifically tailored to enhance robustness in 6G, which are classified into data-driven and model-driven strategies. Finally, we verify the effectiveness of these methods through case studies and outline future research directions.
Submitted 2 May, 2025;
originally announced May 2025.
-
Can Code Outlove Blood? An LLM-based VR Experience to Prompt Reflection on Parental Verbal Abuse
Authors:
Jiaying Fu,
Jialin Gu,
Tianyue Gong,
Tiange Zhou
Abstract:
Parental verbal abuse leaves lasting emotional impacts, yet current therapeutic approaches often lack immersive self-reflection opportunities. To address this, we developed a VR experience powered by LLMs to foster reflection on parental verbal abuse. Participants with relevant experiences engage in a dual-phase VR experience: first assuming the role of a verbally abusive parent, interacting with an LLM portraying a child, then observing the LLM reframe abusive dialogue into warm, supportive expressions as a nurturing parent. A qualitative study with 12 participants showed that the experience encourages reflection on their past experiences and fosters supportive emotions. However, these effects vary with participants' personal histories, emphasizing the need for greater personalization in AI-driven emotional support. This study explores the use of LLMs in immersive environments to promote emotional reflection, offering insights into the design of AI-driven emotional support systems.
Submitted 30 May, 2025; v1 submitted 25 April, 2025;
originally announced April 2025.
-
Performance Analysis of Deep Learning Models for Femur Segmentation in MRI Scan
Authors:
Mengyuan Liu,
Yixiao Chen,
Anning Tian,
Xinmeng Wu,
Mozhi Shen,
Tianchou Gong,
Jeongkyu Lee
Abstract:
Convolutional neural networks like U-Net excel in medical image segmentation, while attention mechanisms and KAN enhance feature extraction. Meta's SAM 2 uses Vision Transformers for prompt-based segmentation without fine-tuning. However, biases in these models impair generalization when data are limited. In this study, we systematically evaluate and compare the performance of three CNN-based models, i.e., U-Net, Attention U-Net, and U-KAN, and one transformer-based model, i.e., SAM 2, for segmenting femur bone structures in MRI scans. The dataset comprises 11,164 MRI scans with detailed annotations of femoral regions. Performance is assessed using the Dice Similarity Coefficient, with scores ranging from 0.932 to 0.954. Attention U-Net achieves the highest overall scores, while U-KAN performs best in anatomical regions with smaller regions of interest, leveraging its enhanced learning capacity to improve segmentation accuracy.
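The comparison above is scored with the Dice Similarity Coefficient, DSC = 2|A ∩ B| / (|A| + |B|), for a predicted mask A and a reference mask B. The minimal sketch below takes binary masks as flat 0/1 sequences. Returning 1.0 when both masks are empty is a convention chosen here; the abstract does not specify it. The function name is hypothetical.

```python
def dice_coefficient(pred, target):
    """Dice Similarity Coefficient between two binary masks given as flat 0/1 sequences.

    DSC = 2 * |A and B| / (|A| + |B|); returns 1.0 when both masks are empty (a convention).
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    return 1.0 if total == 0 else 2.0 * intersection / total


# Example: one overlapping pixel, two predicted pixels, one reference pixel.
print(dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]))  # 2 * 1 / (2 + 1)
```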
Submitted 5 April, 2025;
originally announced April 2025.
-
Single-Satellite Navigation on Lunar North Pole
Authors:
Tim Gong,
Andrew Dempster
Abstract:
The Moon is a primary focus of space exploration. Current navigation methods face significant limitations in providing precise location data for lunar missions. In particular, existing methods often require a direct line of sight to Earth, have limited capacity, and suffer from long signal travel times. This paper aims to tackle these challenges through a novel single-satellite navigation system at the lunar North Pole. By utilising the Doppler effect, this system enables 3D geolocation of a stationary receiver on the lunar surface. Key contributions include choosing a Low Lunar Orbit (LLO) suitable for North Pole coverage, designing a 3-step geolocation algorithm tailored to lunar conditions, constructing a comprehensive error budget, and evaluating the system performance through Dilution of Precision (DOP).
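Doppler positioning with a single satellite rests on the standard range-rate relation f_D = -(f_c / c) · ṙ. Here ṙ is the rate of change of the receiver-to-satellite distance. The sketch below predicts f_D for a candidate receiver position. A geolocation algorithm would then adjust that position until predicted shifts match a series of measured ones; that fitting step is not shown. This is not the paper's 3-step algorithm. It ignores lunar rotation and clock or frequency offsets, and the function name is hypothetical.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def doppler_shift(carrier_hz, sat_pos, sat_vel, rx_pos):
    """Predicted Doppler shift (Hz) seen by a stationary receiver.

    Positions in metres and velocity in m/s, all in one Moon-fixed frame
    (frame rotation is ignored in this sketch). Positive = satellite approaching.
    """
    los = [s - r for s, r in zip(sat_pos, rx_pos)]                  # receiver -> satellite
    rng = math.sqrt(sum(comp * comp for comp in los))
    range_rate = sum(v * l for v, l in zip(sat_vel, los)) / rng    # d(range)/dt
    return -carrier_hz * range_rate / C


# Example with arbitrary numbers: satellite 100 km up, 50 km downrange, closing horizontally.
shift = doppler_shift(2.0e9, (50e3, 0.0, 100e3), (-1600.0, 0.0, 0.0), (0.0, 0.0, 0.0))
print(f"{shift:.1f} Hz")
```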
Submitted 3 April, 2025;
originally announced April 2025.
-
ExScene: Free-View 3D Scene Reconstruction with Gaussian Splatting from a Single Image
Authors:
Tianyi Gong,
Boyan Li,
Yifei Zhong,
Fangxin Wang
Abstract:
The increasing demand for augmented and virtual reality applications has highlighted the importance of crafting immersive 3D scenes from a single-view image. However, because single-view input provides only partial priors, existing methods are often limited to reconstructing low-consistency 3D scenes with narrow fields of view. These limitations make them less capable of generalizing to immersive scene reconstruction. To address this problem, we propose ExScene, a two-stage pipeline that reconstructs an immersive 3D scene from any given single-view image. ExScene uses a novel multimodal diffusion model to generate a high-fidelity, globally consistent panoramic image. We then develop a panoramic depth estimation approach to compute geometric information from the panorama, and we combine this geometric information with the high-fidelity panoramic image to train an initial 3D Gaussian Splatting (3DGS) model. Following this, we introduce a GS refinement technique with 2D stable video diffusion priors. We add camera trajectory consistency and color-geometric priors into the denoising process of diffusion to improve color and spatial consistency across image sequences. These refined sequences are then used to fine-tune the initial 3DGS model, leading to better reconstruction quality. Experimental results demonstrate that ExScene achieves consistent and immersive scene reconstruction using only single-view input, significantly surpassing state-of-the-art baselines.
Submitted 31 March, 2025;
originally announced March 2025.
-
Context-Aware Weakly Supervised Image Manipulation Localization with SAM Refinement
Authors:
Xinghao Wang,
Tao Gong,
Qi Chu,
Bin Liu,
Nenghai Yu
Abstract:
Malicious image manipulation poses societal risks, increasing the importance of effective image manipulation detection methods. Recent image manipulation detection methods have largely been fully supervised, requiring labor-intensive pixel-level annotations. It is therefore essential to explore weakly supervised image manipulation localization methods that require only image-level binary labels for training. However, existing weakly supervised image manipulation methods overlook the importance of edge information for accurate localization, leading to suboptimal localization performance. To address this, we propose a Context-Aware Boundary Localization (CABL) module to aggregate boundary features and learn context inconsistency for localizing manipulated areas. Furthermore, by leveraging Class Activation Mapping (CAM) and the Segment Anything Model (SAM), we introduce the CAM-Guided SAM Refinement (CGSR) module to generate more accurate manipulation localization maps. By integrating the two modules, we present a novel weakly supervised framework based on a dual-branch Transformer-CNN architecture. Our method achieves outstanding localization performance across multiple datasets.
Submitted 31 March, 2025; v1 submitted 26 March, 2025;
originally announced March 2025.
-
Adaptive Camera Sensor for Vision Models
Authors:
Eunsu Baek,
Sunghwan Han,
Taesik Gong,
Hyung-Sin Kim
Abstract:
Domain shift remains a persistent challenge in deep-learning-based computer vision, often requiring extensive model modifications or large labeled datasets to address. Inspired by human visual perception, which adjusts input quality through corrective lenses rather than over-training the brain, we propose Lens, a novel camera sensor control method that enhances model performance by capturing high-quality images from the model's perspective rather than relying on traditional human-centric sensor control. Lens is lightweight and adapts sensor parameters to specific models and scenes in real-time. At its core, Lens utilizes VisiT, a training-free, model-specific quality indicator that evaluates individual unlabeled samples at test time using confidence scores without additional adaptation costs. To validate Lens, we introduce ImageNet-ES Diverse, a new benchmark dataset capturing natural perturbations from varying sensor and lighting conditions. Extensive experiments on both ImageNet-ES and our new ImageNet-ES Diverse show that Lens significantly improves model accuracy across various baseline schemes for sensor control and model modification while maintaining low latency in image captures. Lens effectively compensates for large model size differences and integrates synergistically with model improvement techniques. Our code and dataset are available at github.com/Edw2n/Lens.git.
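The abstract describes VisiT only as a training-free quality indicator that scores unlabeled samples by model confidence. The sketch below illustrates the general idea of confidence-guided sensor control using a plain maximum-softmax score. It is not VisiT's actual definition and may differ from it. Each candidate sensor setting's capture is scored by how confident the model is, and the most confident setting is kept. The settings, logits, and function names are hypothetical.

```python
import math


def max_softmax(logits):
    """Highest class probability of a softmax over raw logits (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    return max(exps) / sum(exps)


def pick_sensor_setting(logits_by_setting):
    """Return the sensor setting whose capture the model is most confident about."""
    return max(logits_by_setting, key=lambda s: max_softmax(logits_by_setting[s]))


# Hypothetical model outputs for the same scene captured under two sensor settings.
captures = {
    ("ISO 200", "1/60 s"): [2.0, 1.5, 0.1],
    ("ISO 800", "1/250 s"): [4.0, 0.2, 0.1],
}
print(pick_sensor_setting(captures))
```

Because the score needs only the model's own outputs, this kind of selection requires no labels or gradient updates at test time. That is consistent with the "training-free" description, though the paper's exact criterion may differ.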
Submitted 3 March, 2025;
originally announced March 2025.
-
ViLa-MIL: Dual-scale Vision-Language Multiple Instance Learning for Whole Slide Image Classification
Authors:
Jiangbo Shi,
Chen Li,
Tieliang Gong,
Yefeng Zheng,
Huazhu Fu
Abstract:
Multiple instance learning (MIL)-based frameworks have become the mainstream approach in digital pathology for processing whole slide images (WSIs), which have giga-pixel size and hierarchical image context. However, these methods depend heavily on a substantial number of bag-level labels and learn solely from the original slides, which makes them easily affected by variations in data distribution. Recently, vision language model (VLM)-based methods introduced a language prior by pre-training on large-scale pathological image-text pairs. However, previous text prompts lack consideration of pathological prior knowledge and therefore do not substantially boost the model's performance. Moreover, collecting such pairs and pre-training are very time-consuming and resource-intensive. To solve these problems, we propose a dual-scale vision-language multiple instance learning (ViLa-MIL) framework for whole slide image classification. Specifically, we propose a dual-scale visual descriptive text prompt based on a frozen large language model (LLM) to boost the performance of the VLM effectively. To transfer the VLM to process WSIs efficiently, for the image branch, we propose a prototype-guided patch decoder that aggregates patch features progressively by grouping similar patches into the same prototype; for the text branch, we introduce a context-guided text decoder that enhances the text features by incorporating multi-granular image contexts. Extensive studies on three multi-cancer and multi-center subtyping datasets demonstrate the superiority of ViLa-MIL.
Submitted 12 February, 2025;
originally announced February 2025.
-
Gotta Hash 'Em All! Speeding Up Hash Functions for Zero-Knowledge Proof Applications
Authors:
Nojan Sheybani,
Tengkai Gong,
Anees Ahmed,
Nges Brian Njungle,
Michel Kinsy,
Farinaz Koushanfar
Abstract:
Collision-resistant cryptographic hash functions (CRHs) are crucial for security in modern systems but are optimized for standard CPUs. While heavily used in zero-knowledge proof (ZKP) applications, traditional CRHs are inefficient in the ZK domain. ZK-friendly hashes have been developed but perform poorly on consumer hardware due to a lack of specialized ZK-specific hardware. To address this, we present HashEmAll, a novel collection of FPGA-based realizations of three ZK-friendly hash functions: Griffin, Rescue-Prime, and Reinforced Concrete. Each hash has a different optimization focus, allowing users to choose based on the constraints of their applications. Through our ZK-optimized arithmetic functions on reconfigurable hardware, HashEmAll outperforms CPU implementations by up to $23\times$ with lower power consumption, while remaining compatible with accessible FPGAs.
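Much of the arithmetic in Rescue-style hashes such as Rescue-Prime consists of prime-field power maps. Rounds alternate the S-box x ↦ x^α with its inverse x ↦ x^(1/α). Here α is the smallest integer with gcd(α, p − 1) = 1, which makes the map a permutation of the field. The sketch below shows only this S-box layer over an example prime. It omits the MDS matrix, the round constants, and the sponge, and it says nothing about how HashEmAll maps these operations to hardware. The function names and the modulus 2^31 − 1 are illustrative choices, not taken from the paper.

```python
import math


def smallest_alpha(p):
    """Smallest alpha >= 2 with gcd(alpha, p - 1) == 1, so x -> x**alpha permutes GF(p)."""
    alpha = 2
    while math.gcd(alpha, p - 1) != 1:
        alpha += 1
    return alpha


def power_sbox(x, alpha, p):
    """Forward S-box: x**alpha mod p."""
    return pow(x, alpha, p)


def inverse_power_sbox(y, alpha, p):
    """Inverse S-box: y**(1/alpha) mod p, where 1/alpha is taken modulo p - 1 (Python 3.8+)."""
    inv_alpha = pow(alpha, -1, p - 1)
    return pow(y, inv_alpha, p)


p = 2**31 - 1  # example prime modulus only; real instances use much larger fields
alpha = smallest_alpha(p)
x = 123456789
y = power_sbox(x, alpha, p)
print(alpha, y, inverse_power_sbox(y, alpha, p) == x)
```

The inverse exponent is large, so x^(1/α) costs far more multiplications than x^α when evaluated directly. Dedicated arithmetic units target exactly this kind of expensive field exponentiation.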
Submitted 30 January, 2025;
originally announced January 2025.
-
Rydberg Atomic Quantum Receivers for the Multi-User MIMO Uplink
Authors:
Tierui Gong,
Chau Yuen,
Chong Meng Samson See,
Mérouane Debbah,
Lajos Hanzo
Abstract:
Rydberg atomic quantum receivers exhibit great potential in assisting classical wireless communications due to their outstanding advantages in detecting radio frequency signals. To realize this potential, we integrate a Rydberg atomic quantum receiver into a classical multi-user multiple-input multiple-output (MIMO) scheme to form a multi-user Rydberg atomic quantum MIMO (RAQ-MIMO) system for the uplink. To study this system, we first construct an equivalent baseband signal model, which facilitates convenient system design, signal processing and optimizations. We then study the ergodic achievable rates under both the maximum ratio combining (MRC) and zero-forcing (ZF) schemes by deriving their tight lower bounds. We next compare the ergodic achievable rates of the RAQ-MIMO and the conventional massive MIMO schemes by offering a closed-form expression for the difference of their ergodic achievable rates, which allows us to directly compare the two systems. Our results show that RAQ-MIMO allows the average transmit power of users to be $> 25$ dBm lower than that of the conventional massive MIMO. Viewed from a different perspective, an extra $\sim 8.8$ bits/s/Hz/user rate becomes achievable by ZF RAQ-MIMO.
Submitted 28 February, 2025; v1 submitted 30 January, 2025;
originally announced January 2025.
-
Towards the Generalization of Multi-view Learning: An Information-theoretical Analysis
Authors:
Wen Wen,
Tieliang Gong,
Yuxin Dong,
Shujian Yu,
Weizhan Zhang
Abstract:
Multi-view learning has drawn widespread attention for its efficacy in leveraging cross-view consensus and complementary information to achieve a comprehensive representation of data. While multi-view learning has undergone vigorous development and achieved remarkable success, the theoretical understanding of its generalization behavior remains elusive. This paper aims to bridge this gap by developing information-theoretic generalization bounds for multi-view learning, with a particular focus on multi-view reconstruction and classification tasks. Our bounds underscore the importance of capturing both consensus and complementary information from multiple different views to achieve maximally disentangled representations. These results also indicate that applying the multi-view information bottleneck regularizer is beneficial for satisfactory generalization performance. Additionally, we derive novel data-dependent bounds under both leave-one-out and supersample settings, yielding computationally tractable and tighter bounds. In the interpolating regime, we further establish a fast-rate bound for multi-view learning, exhibiting a faster convergence rate than conventional square-root bounds. Numerical results indicate a strong correlation between the true generalization gap and the derived bounds across various learning scenarios.
Submitted 28 January, 2025;
originally announced January 2025.
-
Towards Sharper Information-theoretic Generalization Bounds for Meta-Learning
Authors:
Wen Wen,
Tieliang Gong,
Yuxin Dong,
Yong-Jin Liu,
Weizhan Zhang
Abstract:
In recent years, information-theoretic generalization bounds have emerged as a promising approach for analyzing the generalization capabilities of meta-learning algorithms. However, existing results are confined to two-step bounds, failing to provide a sharper characterization of the meta-generalization gap that simultaneously accounts for environment-level and task-level dependencies. This paper addresses this fundamental limitation by establishing novel single-step information-theoretic bounds for meta-learning. Our bounds exhibit substantial advantages over prior MI- and CMI-based bounds, especially in terms of tightness, scaling behavior associated with sampled tasks and samples per task, and computational tractability. Furthermore, we provide novel theoretical insights into the generalization behavior of two classes of noise and iterative meta-learning algorithms via gradient covariance analysis, where the meta-learner uses either the entire meta-training data (e.g., Reptile), or separate training and test data within the task (e.g., model agnostic meta-learning (MAML)). Numerical results validate the effectiveness of the derived bounds in capturing the generalization dynamics of meta-learning.
Submitted 26 January, 2025;
originally announced January 2025.
-
Towards Lightweight Time Series Forecasting: a Patch-wise Transformer with Weak Data Enriching
Authors:
Meng Wang,
Jintao Yang,
Bin Yang,
Hui Li,
Tongxin Gong,
Bo Yang,
Jiangtao Cui
Abstract:
Patch-wise Transformer-based time series forecasting achieves superior accuracy. However, this superiority relies heavily on intricate model designs with massive numbers of parameters, making both training and inference expensive and thus preventing deployment on edge devices with limited resources and low-latency requirements. In addition, existing methods often work in an autoregressive manner, taking into account only historical values while ignoring valuable, easy-to-obtain context information such as weather forecasts, date, and time of day. To contend with these two limitations, we propose LiPFormer, a novel Lightweight Patch-wise Transformer with weak data enriching. First, to simplify the Transformer backbone, LiPFormer employs a novel lightweight cross-patch attention and a linear transformation-based attention to eliminate Layer Normalization and the Feed Forward Network, two heavy components in existing Transformers. Second, we propose a lightweight weak data enriching module that provides additional, valuable weak supervision during training. It enhances forecasting accuracy without significantly increasing model complexity, as it relies on easily accessible context information rather than expensive human labeling. This also allows the weak data enriching module to be plugged into existing models. Extensive experiments on nine benchmark time series datasets demonstrate that LiPFormer outperforms state-of-the-art methods in accuracy while significantly reducing parameter scale, training duration, and GPU memory usage. Deployment on an edge device reveals that LiPFormer takes only one third of the inference time of classic Transformers. In addition, we demonstrate that weak data enriching integrates seamlessly into various Transformer-based models to enhance their accuracy, suggesting its generality.
Submitted 14 January, 2025;
originally announced January 2025.
-
Rydberg Atomic Quantum Receivers for Multi-Target DOA Estimation
Authors:
Tierui Gong,
Chau Yuen,
Chong Meng Samson See,
Mérouane Debbah,
Lajos Hanzo
Abstract:
Quantum sensing technologies have made rapid progress since the advent of the "second quantum revolution". Among various candidates, schemes relying on Rydberg atoms exhibit compelling advantages for detecting radio frequency signals. Building on this, Rydberg atomic quantum receivers (RAQRs) have emerged as a promising solution for classical wireless communication and sensing. To harness the advantages and exploit the potential of RAQRs in wireless sensing, we investigate direction of arrival (DOA) estimation by RAQRs. Specifically, we first conceive a Rydberg atomic quantum uniform linear array (RAQ-ULA) aided receiver for multi-target detection and propose the corresponding signal model of this sensing system. Furthermore, relying on our model, we propose a Rydberg atomic quantum estimation of signal parameters via rotational invariance technique, termed RAQ-ESPRIT. The proposed algorithm solves the sensor gain mismatch problem, which is caused by the presence of the RF local oscillator in the RAQ-ULA and cannot be well addressed by the conventional ESPRIT. Lastly, we characterize our scheme through numerical simulations.
Submitted 6 January, 2025;
originally announced January 2025.
-
Inclusion 2024 Global Multimedia Deepfake Detection Challenge: Towards Multi-dimensional Face Forgery Detection
Authors:
Yi Zhang,
Weize Gao,
Changtao Miao,
Man Luo,
Jianshu Li,
Wenzhong Deng,
Zhe Li,
Bingyu Hu,
Weibin Yao,
Yunfeng Diao,
Wenbo Zhou,
Tao Gong,
Qi Chu
Abstract:
In this paper, we present the Global Multimedia Deepfake Detection challenge, held concurrently with Inclusion 2024. The challenge aims to detect automatic image and audio-video manipulations, including but not limited to editing, synthesis, generation, and Photoshop. It attracted 1500 teams from all over the world, with about 5000 valid result submissions. We invited the top 20 teams to present their solutions, and the top 3 teams were awarded prizes in the grand finale. In this paper, we present the solutions of the top 3 teams in each of the two tracks, to boost research in the field of image and audio-video forgery detection. The methodologies developed through the challenge will contribute to the development of next-generation deepfake detection systems, and we encourage participants to open-source their methods.
Submitted 3 June, 2025; v1 submitted 30 December, 2024;
originally announced December 2024.
-
DEX: Data Channel Extension for Efficient CNN Inference on Tiny AI Accelerators
Authors:
Taesik Gong,
Fahim Kawsar,
Chulhong Min
Abstract:
Tiny machine learning (TinyML) aims to run ML models on small devices and is increasingly favored for its enhanced privacy, reduced latency, and low cost. Recently, the advent of tiny AI accelerators has revolutionized the TinyML field by significantly enhancing hardware processing power. These accelerators, equipped with multiple parallel processors and dedicated per-processor memory instances, offer substantial performance improvements over traditional microcontroller units (MCUs). However, their limited data memory often necessitates downsampling input images, resulting in accuracy degradation. To address this challenge, we propose Data channel EXtension (DEX), a novel approach for efficient CNN execution on tiny AI accelerators. DEX incorporates additional spatial information from original images into input images through patch-wise even sampling and channel-wise stacking, effectively extending data across input channels. By leveraging underutilized processors and data memory for channel extension, DEX facilitates parallel execution without increasing inference latency. Our evaluation with four models and four datasets on tiny AI accelerators demonstrates that this simple idea improves accuracy by 3.5 percentage points on average while keeping the inference latency on the AI accelerator unchanged. The source code is available at https://github.com/Nokia-Bell-Labs/data-channel-extension.
Submitted 9 December, 2024;
originally announced December 2024.
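To make "patch-wise even sampling and channel-wise stacking" concrete, the sketch below shows one plausible reading of the idea in plain Python. It is not the authors' implementation (their repository is linked above): the function name `dex_extend`, the single-channel list-of-rows image, the requirement that the input size be a multiple of the output size, and the rule for spacing the `k` sample positions over each flattened patch are all assumptions made for illustration. Channel 0 coincides with ordinary strided downsampling; each extra channel carries pixels that downsampling would otherwise discard.

```python
# Hypothetical sketch of patch-wise even sampling + channel-wise stacking (DEX-style).
# Assumptions: single-channel image as a list of rows, input size divisible by output
# size, and sample positions spaced evenly over the flattened patch. Not the authors' code.

def dex_extend(image, out_h, out_w, k):
    """Return k channels of shape out_h x out_w, one sampled pixel per patch per channel."""
    in_h, in_w = len(image), len(image[0])
    if in_h % out_h or in_w % out_w:
        raise ValueError("input size must be a multiple of output size")
    ph, pw = in_h // out_h, in_w // out_w  # patch covering one output pixel
    n = ph * pw  # pixels per patch
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the patch size")
    channels = []
    for i in range(k):
        # i-th evenly spaced position inside the flattened patch, as (row, col) offsets
        dy, dx = divmod(i * n // k, pw)
        channels.append([[image[r * ph + dy][c * pw + dx] for c in range(out_w)]
                         for r in range(out_h)])
    return channels


# Usage: a 4x4 image reduced to 2x2 with k = 4 keeps every original pixel.
img = [[4 * r + c for c in range(4)] for r in range(4)]
chans = dex_extend(img, 2, 2, 4)
print(chans[0])  # plain strided downsampling: [[0, 2], [8, 10]]
```

With `k` equal to the patch area, every original pixel lands in exactly one channel; a smaller `k` keeps fewer pixels per patch.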
-
Rydberg Atomic Quantum Receivers for Classical Wireless Communications and Sensing: Their Models and Performance
Authors:
Tierui Gong,
Jiaming Sun,
Chau Yuen,
Guangwei Hu,
Yufei Zhao,
Yong Liang Guan,
Chong Meng Samson See,
Mérouane Debbah,
Lajos Hanzo
Abstract:
The significant progress of quantum sensing technologies offers numerous radical solutions for measuring a multitude of physical quantities with unprecedented precision. Among them, Rydberg atomic quantum receivers (RAQRs) have emerged as an eminent solution for detecting the electric field of radio frequency (RF) signals, exhibiting great potential in assisting classical wireless communications and sensing. So far, most experimental studies have aimed at proving physical concepts to reveal their promise, while the practical signal model of RAQR-aided wireless communications and sensing has remained under-explored. Furthermore, the performance of RAQR-based wireless receivers and their advantages over classical RF receivers have not been fully characterized. To fill these gaps, we introduce the RAQR to the wireless community by presenting an end-to-end reception scheme. We then develop a corresponding equivalent baseband signal model relying on a realistic reception flow. Our scheme and model provide explicit design guidance for RAQR-aided wireless systems. We next study the performance of RAQR-aided wireless systems based on our model and compare them to classical RF receivers. The results show that the RAQR is capable of achieving a substantial received signal-to-noise ratio (SNR) gain of over $27$ decibels (dB) and $40$ dB in the photon shot limit regime and the standard quantum limit regime, respectively.
Submitted 13 May, 2025; v1 submitted 7 December, 2024;
originally announced December 2024.
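For readers outside wireless communications, a decibel SNR gain converts to a linear power ratio as follows. This is standard unit arithmetic applied to the two figures quoted in the abstract, not an additional result from the paper: the reported gains correspond to power ratios of roughly 500 and 10,000.

```latex
G_{\mathrm{lin}} = 10^{G_{\mathrm{dB}}/10}, \qquad
10^{27/10} \approx 501, \qquad
10^{40/10} = 10^{4}
```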
-
AMAZE: Accelerated MiMC Hardware Architecture for Zero-Knowledge Applications on the Edge
Authors:
Anees Ahmed,
Nojan Sheybani,
Davi Moreno,
Nges Brian Njungle,
Tengkai Gong,
Michel Kinsy,
Farinaz Koushanfar
Abstract:
Collision-resistant cryptographic hash (CRH) functions have long been an integral part of providing security and privacy in modern systems. Certain constructions of zero-knowledge proof (ZKP) protocols use CRH functions to perform cryptographic hashing. Standard CRH functions, such as SHA2, are inefficient when employed in the ZKP domain, which calls for ZK-friendly hashes: CRH functions built with ZKP efficiency in mind. The most mature ZK-friendly hash, MiMC, provides a block cipher and hash function with a simple algebraic structure that, owing to its security and low complexity, is well suited to ZKP applications. Although ZK-friendly hashes have improved the performance of ZKP generation in software, the underlying computation of ZKPs, including CRH functions, must be optimized in hardware to enable practical applications. The challenge we address in this work is how to efficiently incorporate ZK-friendly hash functions, such as MiMC, into hardware accelerators, thus enabling more practical applications. In this work, we introduce AMAZE, a highly hardware-optimized open-source framework for computing the MiMC block cipher and hash function. Our solution is primarily directed at resource-constrained edge devices; consequently, we provide several implementations of MiMC with varying power, resource, and latency profiles. Our extensive evaluations show that the AMAZE-powered implementation of MiMC outperforms standard CPU implementations by more than 13$\times$. In all settings, AMAZE enables efficient ZK-friendly hashing on resource-constrained devices. Finally, we highlight AMAZE's underlying open-source arithmetic backend as part of our end-to-end design, allowing developers to utilize the AMAZE framework for custom ZKP applications.
Submitted 9 November, 2024;
originally announced November 2024.
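The MiMC round structure that AMAZE accelerates can be illustrated with a toy model: each round adds the key and a round constant and then cubes the result in a prime field, and decryption inverts each round with a modular cube root. The sketch below uses an insecure 101-element field, made-up round constants and hypothetical function names (`mimc_encrypt`, `mimc_decrypt`); it shows the arithmetic only and is neither AMAZE's hardware design nor the standardized MiMC parameters. Hash modes built on top of the permutation are omitted.

```python
import math

# Toy MiMC-style block cipher over GF(P). INSECURE, for illustration only.
# Assumptions: exponent 3, tiny prime P = 101 with gcd(3, P - 1) = 1 so cubing is a
# permutation, and made-up round constants. Real MiMC uses a large field and fixed constants.
P = 101
ROUNDS = math.ceil(math.log(P, 3))  # roughly log_3(P) rounds, here 5
# The first constant is 0 (it would be redundant with the key addition); the rest are toy values.
CONSTANTS = [0] + [(7 * i * i + 3) % P for i in range(1, ROUNDS)]
INV_EXP = pow(3, -1, P - 1)  # cube-root exponent: 3 * INV_EXP = 1 (mod P - 1)


def mimc_encrypt(x, key):
    """Apply ROUNDS rounds of x -> (x + key + c)^3 mod P, then add the key."""
    for c in CONSTANTS:
        x = pow((x + key + c) % P, 3, P)
    return (x + key) % P


def mimc_decrypt(y, key):
    """Invert mimc_encrypt by undoing each round with a modular cube root."""
    x = (y - key) % P
    for c in reversed(CONSTANTS):
        x = (pow(x, INV_EXP, P) - key - c) % P
    return x


ct = mimc_encrypt(17, key=42)
print(mimc_decrypt(ct, key=42))  # 17
```

The cube root works because raising to `INV_EXP` undoes cubing whenever 3 is invertible modulo `P - 1`; this is why the toy prime was chosen with `P - 1` not divisible by 3.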
-
Point processes with event time uncertainty
Authors:
Xiuyuan Cheng,
Tingnan Gong,
Yao Xie
Abstract:
Point processes are widely used statistical models for uncovering the temporal patterns in dependent event data. In many applications, the event time cannot be observed exactly, calling for the incorporation of time uncertainty into the modeling of point process data. In this work, we introduce a framework to model time-uncertain point processes, possibly on a network. We start by deriving the formulation in the continuous-time setting under a few assumptions motivated by application scenarios. After imposing a time grid, we obtain a discrete-time model that facilitates inference and can be computed by first-order optimization methods such as Gradient Descent or variational inequality (VI) using batch-based Stochastic Gradient Descent (SGD). The parameter recovery guarantee is proved for VI inference at an $O(1/k)$ convergence rate using $k$ SGD steps. Our framework handles non-stationary processes by modeling the influence kernel as a matrix (or a tensor on a network), and it covers stationary processes, such as the classical Hawkes process, as a special case. We experimentally show that the proposed approach outperforms previous General Linear model (GLM) baselines on simulated and real data and reveals meaningful causal relations on a Sepsis-associated Derangements dataset.
Submitted 4 November, 2024;
originally announced November 2024.
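A discrete-time model of the kind the abstract describes can be sketched as a linear intensity on a time grid: the expected count in bin t is a baseline plus a weighted sum of earlier counts, with the weights stored in a matrix A. Letting A[t][s] depend on both t and s gives a non-stationary kernel, while making it depend only on the lag t - s gives a discretized Hawkes-type process. Everything below (function names, the exponential lag kernel, the Poisson log-likelihood) is an illustrative assumption; it does not model event-time uncertainty, networks, or the paper's VI/SGD inference.

```python
import math

# Hypothetical sketch: discrete-time linear point process with a matrix kernel.
# counts[s] = number of events in time bin s; A[t][s] = effect of bin s on bin t (s < t).
# Event-time uncertainty, networks and the paper's VI/SGD inference are NOT modelled here.

def intensity(mu, A, counts):
    """lambda_t = mu + sum_{s < t} A[t][s] * counts[s] for each bin t."""
    return [mu + sum(A[t][s] * counts[s] for s in range(t)) for t in range(len(counts))]


def stationary_kernel(T, alpha, beta):
    """Special case where A[t][s] = alpha * beta**(t - s) depends only on the lag."""
    return [[alpha * beta ** (t - s) if s < t else 0.0 for s in range(T)] for t in range(T)]


def poisson_log_likelihood(lam, counts):
    """Sum of per-bin Poisson log-likelihoods, dropping the constant log(counts!) term."""
    return sum(x * math.log(rate) - rate for rate, x in zip(lam, counts))


counts = [1, 0, 0]
lam = intensity(0.5, stationary_kernel(3, alpha=0.4, beta=0.5), counts)
print(lam)  # approximately [0.5, 0.7, 0.6]
```

Replacing `stationary_kernel` with any lower-triangular matrix gives the non-stationary case, and the log-likelihood is the kind of objective a first-order method would maximize.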
-
Conjugate-dual clusters
Authors:
Silei Wang,
Jing Tian,
Jiayu Li,
Tian Gong,
Xing Yan,
Jijun Zhao,
Xin-Gao Gong,
Xiao Gu
Abstract:
The discovery of clusters with highly symmetric geometry, such as the C60 fullerene, has always attracted considerable interest because of their diverse nature. However, most such clusters were discovered sporadically. Is there a systematic method to explore all possible highly symmetric clusters? Herein, we propose an idea to systematically construct highly symmetric structures based on a novel conjugate-dual (co-dual) concept. A co-dual structure is constructed by conjugately combining a highly symmetric polyhedron with its corresponding dual. In such co-dual structures, the symmetry of the original polyhedron can be kept or, in some special cases, even promoted. This provides a way to explore highly symmetric polyhedra and leads to a new cluster family, co-dual clusters. In this paper, we have systematically studied spherical co-dual clusters with one- or two-element shells and found a series of stable cage-like and core-shell clusters. Co-dual structures may thus be a new treasure box for hunting novel clusters.
Submitted 3 November, 2024;
originally announced November 2024.
-
Rectified Diffusion Guidance for Conditional Generation
Authors:
Mengfei Xia,
Nan Xue,
Yujun Shen,
Ran Yi,
Tieliang Gong,
Yong-Jin Liu
Abstract:
Classifier-Free Guidance (CFG), which combines the conditional and unconditional score functions with two coefficients summing to one, serves as a practical technique for diffusion model sampling. Theoretically, however, denoising with CFG cannot be expressed as a reciprocal diffusion process, which may leave hidden risks during use. In this work, we revisit the theory behind CFG and rigorously confirm that an improper configuration of the combination coefficients (i.e., the widely used summing-to-one version) causes an expectation shift in the generative distribution. To rectify this issue, we propose ReCFG, which relaxes the guidance coefficients such that denoising with ReCFG strictly aligns with diffusion theory. We further show that our approach admits a closed-form solution given the guidance strength. As a result, the rectified coefficients can be readily pre-computed by traversing the observed data, leaving the sampling speed barely affected. Empirical evidence on real-world data demonstrates the compatibility of our post-hoc design with existing state-of-the-art diffusion models, including both class-conditioned ones (e.g., EDM2 on ImageNet) and text-conditioned ones (e.g., SD3 on CC12M), without any retraining. We will open-source the code to facilitate further research.
Submitted 24 October, 2024;
originally announced October 2024.
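The coefficient constraint that ReCFG relaxes is easiest to see in code. In one common parameterization (an assumption here), standard CFG forms (1 + w) times the conditional score minus w times the unconditional score, so the two coefficients always sum to one. The helper `guided_score` accepts an arbitrary coefficient pair, which is where relaxed coefficients would be plugged in; ReCFG's closed-form coefficients are not reproduced, and all names are hypothetical.

```python
# Sketch of the guided score in classifier-free guidance (CFG).
# Scores are plain lists here; in a real sampler they come from a denoising network.

def cfg_coefficients(w):
    """Standard CFG: (1 + w) * cond - w * uncond, so the two coefficients sum to one."""
    return 1.0 + w, -w


def guided_score(cond, uncond, gamma_cond, gamma_uncond):
    """Combine conditional and unconditional scores with arbitrary coefficients."""
    return [gamma_cond * c + gamma_uncond * u for c, u in zip(cond, uncond)]


g_c, g_u = cfg_coefficients(2.0)
print(g_c + g_u)                                        # 1.0: the sum-to-one constraint
print(guided_score([1.0, 2.0], [0.0, 1.0], g_c, g_u))  # [3.0, 4.0]
```

Keeping the coefficient pair as explicit arguments, rather than hard-coding the guidance weight, makes it possible to compare the sum-to-one choice against any relaxed pair without touching the sampler.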
-
Toric varieties modulo reflections
Authors:
Colin Crowley,
Tao Gong,
Connor Simpson
Abstract:
Let $W$ be a finite group generated by reflections of a lattice $M$. If a lattice polytope $P \subset M \otimes_{\mathbb Z}\mathbb R$ is preserved by $W$, then we show that the quotient of the projective toric variety $X_P$ by $W$ is isomorphic to the toric variety $X_{P \cap D}$, where $D$ is a fundamental domain for the action of $W$. This answers a question of Horiguchi-Masuda-Shareshian-Song, and recovers results of Blume, of the second author, and of Gui-Hu-Liu.
Submitted 18 October, 2024;
originally announced October 2024.
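A standard example satisfying the hypotheses may help. Take $M = \mathbb Z^n$ and let $W = S_n$ permute coordinates: each transposition swaps $x_i$ and $x_j$, which is a reflection in the hyperplane $x_i = x_j$, so $W$ is a finite reflection group, and $D = \{x_1 \ge x_2 \ge \dots \ge x_n\}$ is a fundamental domain. The sketch below checks, for the $W$-invariant cube $P = [0, m]^n$, that the lattice points of $P \cap D$ are exactly one representative per $W$-orbit of lattice points in $P$. It illustrates only the combinatorics of the fundamental domain in this one example (the helper names are hypothetical), not the isomorphism of toric varieties proved in the paper.

```python
from itertools import product

# Combinatorial illustration only: W = S_n permutes the coordinates of M = Z^n, and
# D = {x_1 >= x_2 >= ... >= x_n} is a fundamental domain for this reflection group.

def in_fundamental_domain(x):
    """True if the coordinates of x are weakly decreasing."""
    return all(x[i] >= x[i + 1] for i in range(len(x) - 1))


def orbit_representative(x):
    """The unique point of the S_n-orbit of x lying in D: sort coordinates descending."""
    return tuple(sorted(x, reverse=True))


# P = the cube [0, m]^n, which is preserved by every coordinate permutation.
n, m = 3, 2
points = list(product(range(m + 1), repeat=n))
in_d = {x for x in points if in_fundamental_domain(x)}
orbits = {orbit_representative(x) for x in points}
print(len(points), len(orbits), in_d == orbits)  # 27 10 True
```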
-
Llama SLayer 8B: Shallow Layers Hold the Key to Knowledge Injection
Authors:
Tianxiang Chen,
Zhentao Tan,
Tao Gong,
Yue Wu,
Qi Chu,
Bin Liu,
Jieping Ye,
Nenghai Yu
Abstract:
As a means of augmenting pre-trained large language models (LLMs), knowledge injection is critical for developing vertical-domain large models and has been widely studied. Most current approaches, including parameter-efficient fine-tuning (PEFT) and block expansion methods, apply knowledge uniformly across all LLM layers, which raises the question: are all layers equally crucial for knowledge injection? We begin by evaluating the importance of each layer to find the optimal layer range for knowledge injection. Intuitively, the more important layers should play a more critical role in knowledge injection and deserve denser injection. We observe performance dips on question-answering benchmarks after the removal or expansion of the shallow layers, and the degradation shrinks as the layers get deeper, indicating that the shallow layers hold the key to knowledge injection. This insight leads us to propose the S strategy, a post-pretraining strategy that selectively enhances shallow layers while pruning the less effective deep ones. Based on this strategy, we introduce Llama Slayer-8B and Llama Slayer-8B-Instruct. We experimented on a corpus of code and math and demonstrated the effectiveness of our strategy. Further experiments with a different LLM (Mistral-7B) and a legal corpus confirmed the general applicability of the approach, underscoring its wide-ranging efficacy. Our code is available at: https://github.com/txchen-USTC/Llama-Slayer
Submitted 3 October, 2024;
originally announced October 2024.
-
Accelerating Non-Maximum Suppression: A Graph Theory Perspective
Authors:
King-Siong Si,
Lu Sun,
Weizhan Zhang,
Tieliang Gong,
Jiahao Wang,
Jiang Liu,
Hao Sun
Abstract:
Non-maximum suppression (NMS) is an indispensable post-processing step in object detection. With the continuous optimization of network models, NMS has become the "last mile" for improving the efficiency of object detection. This paper systematically analyzes NMS from a graph theory perspective for the first time, revealing its intrinsic structure. Consequently, we propose two optimization methods, namely QSI-NMS and BOE-NMS. The former is a fast recursive divide-and-conquer algorithm with negligible mAP loss, and its extended version (eQSI-NMS) achieves the optimal complexity of $\mathcal{O}(n\log n)$. The latter, concentrating on the locality of NMS, achieves a constant-factor speedup without any mAP loss. Moreover, to help researchers rapidly evaluate NMS methods, we introduce NMS-Bench, the first benchmark designed to comprehensively assess various NMS methods. Taking the YOLOv8-N model on MS COCO 2017 as the benchmark setup, our method QSI-NMS runs $6.2\times$ as fast as the original NMS with a $0.1\%$ decrease in mAP. The optimal eQSI-NMS, with only a $0.3\%$ mAP decrease, achieves a $10.7\times$ speedup. Meanwhile, BOE-NMS achieves a $5.1\times$ speedup with no compromise in mAP.
Submitted 24 November, 2024; v1 submitted 30 September, 2024;
originally announced September 2024.
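QSI-NMS and BOE-NMS are presented as accelerations of standard NMS, so a reference version of that baseline is useful for comparison. The sketch below is classic greedy NMS in plain Python with O(n^2) IoU checks; the function names and the (x1, y1, x2, y2) box format are assumptions, and neither of the paper's methods is reproduced. In graph terms, with boxes as nodes and an edge wherever IoU exceeds the threshold, the loop keeps a box exactly when no higher-scoring neighbour has already been kept.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Classic greedy NMS: visit boxes by descending score and keep a box only if
    it does not overlap any already-kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
print(greedy_nms(boxes, [0.9, 0.8, 0.7]))  # [0, 2]
```

The quadratic number of IoU evaluations in the inner `all(...)` is the cost that faster NMS variants aim to reduce.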
-
Higher-criticism for sparse multi-stream change-point detection
Authors:
Tingnan Gong,
Alon Kipnis,
Yao Xie
Abstract:
We study a statistical procedure based on higher criticism (HC) to address the sparse multi-stream quickest change-point detection problem. Namely, we aim to detect a potential change in the distribution of multiple data streams at some unknown time. If a change occurs, only a few streams are affected, whereas the identity of the affected streams is unknown. The HC-based procedure involves testing for a change point in individual streams and combining multiple tests using higher criticism. Relying on HC thresholding, the procedure also indicates a set of streams suspected to be affected by the change. We provide a theoretical analysis under a sparse heteroscedastic normal change-point model. We establish an information-theoretic detection delay lower bound when individual tests are based on the likelihood ratio or the generalized likelihood ratio statistics and show that the delay of the HC-based method converges in distribution to this bound. In the special case of constant variance, our bound coincides with known results (Chan, 2017). We demonstrate the effectiveness of the HC-based method compared to other methods in detecting sparse changes through extensive numerical evaluations.
Submitted 19 April, 2025; v1 submitted 23 September, 2024;
originally announced September 2024.
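The combination step can be sketched with the classical higher-criticism statistic of Donoho and Jin: sort the per-stream p-values, standardize the gap between each empirical fraction k/n and the k-th smallest p-value, and take the maximum over the smallest alpha0 * n values; HC thresholding then flags the streams whose p-values are among the k smallest at the maximizing k. This assumes per-stream p-values are already available and uses one common HC variant; the paper's likelihood-ratio/GLR statistics and its sequential detection rule are not reproduced, and the function name is hypothetical.

```python
import math

# Sketch of the higher-criticism (HC) statistic and HC thresholding over per-stream p-values.
# Assumption: p-values from per-stream change-point tests are already available.

def higher_criticism(pvalues, alpha0=0.5):
    """Return (HC score, sorted indices of streams flagged by HC thresholding)."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # streams by ascending p-value
    best, best_k = -math.inf, 0
    for k in range(1, max(1, int(alpha0 * n)) + 1):
        p = pvalues[order[k - 1]]  # k-th smallest p-value
        if not 0.0 < p < 1.0:
            continue  # the standardization is undefined at 0 and 1
        z = math.sqrt(n) * (k / n - p) / math.sqrt(p * (1.0 - p))
        if z > best:
            best, best_k = z, k
    # HC thresholding: flag the streams with the best_k smallest p-values
    return best, sorted(order[:best_k])


pvals = [0.001, 0.002, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.4, 0.3]
score, flagged = higher_criticism(pvals)
print(round(score, 1), flagged)  # 14.0 [0, 1]
```

Restricting the maximum to the smallest alpha0 * n p-values follows the usual practice for HC, since the standardized gaps for large p-values carry little evidence of a sparse signal.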
-
Rydberg Atomic Quantum Receivers for Classical Wireless Communication and Sensing
Authors:
Tierui Gong,
Aveek Chandra,
Chau Yuen,
Yong Liang Guan,
Rainer Dumke,
Chong Meng Samson See,
Mérouane Debbah,
Lajos Hanzo
Abstract:
Rydberg atomic quantum receivers (RAQRs) are emerging quantum precision sensing platforms designed for receiving radio frequency (RF) signals. They rely on creating Rydberg atoms from normal atoms by exciting one or more electrons to a very high energy level, thereby making the atoms sensitive to RF signals. RAQRs realize RF-to-optical conversion based on light-atom interactions relying on so-called electromagnetically induced transparency (EIT) and Autler-Townes splitting (ATS), so that the desired RF signal can be read out optically. The large dipole moments of Rydberg atoms, together with the rich choice of Rydberg states and various modulation schemes, facilitate an ultra-high sensitivity ($\sim$ nV/cm/$\sqrt{\text{Hz}}$) and an ultra-broadband tunability (direct current to terahertz). RAQRs also exhibit compelling scalability and lend themselves to the construction of innovative, compact receivers. Initial experimental studies have demonstrated their capabilities in classical wireless communications and sensing. To fully harness their potential in a wide variety of applications, we commence by outlining the fundamentals of Rydberg atoms, followed by the principles and schemes of RAQRs. Then, we overview the state-of-the-art studies from both the physics and communication communities. Furthermore, we conceive Rydberg atomic quantum single-input single-output (RAQ-SISO) and multiple-input multiple-output (RAQ-MIMO) schemes for facilitating the integration of RAQRs with classical wireless systems. Finally, we conclude with a set of potent research directions.
Submitted 18 January, 2025; v1 submitted 22 September, 2024;
originally announced September 2024.
-
PRIME: Phase Reversed Interleaved Multi-Echo acquisition enables highly accelerated distortion-free diffusion MRI
Authors:
Yohan Jun,
Qiang Liu,
Ting Gong,
Jaejin Cho,
Shohei Fujita,
Xingwang Yong,
Susie Y Huang,
Lipeng Ning,
Anastasia Yendiki,
Yogesh Rathi,
Berkin Bilgic
Abstract:
Purpose: To develop and evaluate a new pulse sequence for highly accelerated distortion-free diffusion MRI (dMRI) by inserting an additional echo without prolonging TR, when generalized slice dithered enhanced resolution (gSlider) radiofrequency encoding is used for volumetric acquisition. Methods: A phase-reversed interleaved multi-echo acquisition (PRIME) was developed for rapid, high-resolution, and distortion-free dMRI. It includes two echoes: the first echo is used for the target high-resolution diffusion-weighted imaging (DWI) acquisition, and the second echo is acquired with either 1) lower resolution for high-fidelity field map estimation, or 2) matching resolution to enable efficient diffusion relaxometry acquisitions. The sequence was evaluated on in vivo data acquired from healthy volunteers on clinical and Connectome 2.0 scanners. Results: In vivo experiments demonstrated that 1) high in-plane acceleration (R_in-plane = 5 with 2D partial Fourier) was achieved using the high-fidelity field maps estimated from the second echo, which was acquired at lower resolution/acceleration to increase its SNR while matching the effective echo spacing of the first readout, 2) high-resolution diffusion relaxometry parameters were estimated from dual-echo PRIME data using a white matter model of the multi-TE spherical mean technique (MTE-SMT), and 3) high-fidelity mesoscale DWI at 550 μm isotropic resolution could be obtained in vivo by capitalizing on the high-performance gradients of the Connectome 2.0 scanner. Conclusion: The proposed PRIME sequence enabled highly accelerated, high-resolution, and distortion-free dMRI using an additional echo without prolonging scan time when gSlider encoding is utilized.
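As a hedged illustration of why a matching-resolution second echo enables relaxometry: assuming mono-exponential decay S = S0·exp(-TE/T2) at a fixed b-value, two echoes give a closed-form voxel-wise estimate T2 = (TE2 - TE1)/ln(S1/S2). The PRIME analysis itself fits a multi-TE spherical mean white-matter model (MTE-SMT), which this sketch does not implement; the function name and echo times are illustrative.

```python
import numpy as np


def dual_echo_t2(s1, s2, te1_ms, te2_ms):
    """Voxel-wise T2 (ms) from two echoes under a mono-exponential model S = S0*exp(-TE/T2)."""
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    ratio = s1 / s2
    # Noise can produce ratios <= 1, for which T2 is undefined or non-physical.
    with np.errstate(divide="ignore", invalid="ignore"):
        t2 = (te2_ms - te1_ms) / np.log(ratio)
    return np.where(ratio > 1.0, t2, np.nan)


# Synthetic check: S0 = 1000, T2 = 80 ms, echoes at 60 ms and 110 ms (illustrative values).
te1, te2, t2_true = 60.0, 110.0, 80.0
s1 = 1000 * np.exp(-te1 / t2_true)
s2 = 1000 * np.exp(-te2 / t2_true)
print(dual_echo_t2([s1], [s2], te1, te2))
```

Voxels where the second echo is not attenuated relative to the first are returned as NaN rather than as a misleading negative or infinite T2.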
Submitted 11 September, 2024;
originally announced September 2024.
-
SpotActor: Training-Free Layout-Controlled Consistent Image Generation
Authors:
Jiahao Wang,
Caixia Yan,
Weizhan Zhang,
Haonan Lin,
Mengmeng Wang,
Guang Dai,
Tieliang Gong,
Hao Sun,
Jingdong Wang
Abstract:
Text-to-image diffusion models significantly enhance the efficiency of artistic creation with high-fidelity image generation. However, in typical application scenarios such as comic book production, they can neither place each subject in its expected spot nor maintain a consistent appearance of each subject across images. To address these issues, we pioneer a novel task, Layout-to-Consistent-Image (L2CI) generation, which produces consistent and compositional images in accordance with the given layout conditions and text prompts. To accomplish this challenging task, we present a new formalization of dual energy guidance with optimization in a dual semantic-latent space and propose a training-free pipeline, SpotActor, which features a layout-conditioned backward update stage and a consistent forward sampling stage. In the backward stage, we devise a nuanced layout energy function that mimics the attention activations with a sigmoid-like objective. In the forward stage, we design Regional Interconnection Self-Attention (RISA) and Semantic Fusion Cross-Attention (SFCA) mechanisms that allow mutual interactions across images. To evaluate performance, we present ActorBench, a dedicated benchmark with hundreds of reasonable prompt-box pairs derived from object detection datasets. Comprehensive experiments demonstrate the effectiveness of our method. The results show that SpotActor fulfills the expectations of this task and showcases the potential for practical applications, with superior layout alignment, subject consistency, prompt conformity, and background diversity.
Submitted 7 September, 2024;
originally announced September 2024.
-
Compartment-specific estimation of T2 and T2* with diffusion-PEPTIDE MRI
Authors:
Ting Gong,
Merlin J. Fair,
Kawin Setsompop,
Hui Zhang
Abstract:
We present a microstructure imaging technique for estimating compartment-specific T2 and T2* simultaneously in the human brain. Microstructure imaging with diffusion MRI (dMRI) has enabled the separate modelling of intra-neurite and extra-neurite diffusion signals, allowing for the estimation of compartment-specific tissue properties. These compartment-specific properties have been widely used in clinical studies. However, conventional dMRI cannot disentangle differences in relaxation between tissue compartments, causing biased estimates of diffusion measures that also change with TE. To address this problem, combined relaxometry-diffusion imaging methods have been developed in recent years, providing compartmental T2-diffusion or T2*-diffusion imaging, but not T2 and T2* together. As T2 and T2* provide complementary information, a technique that can estimate both jointly with diffusion is appealing for neuroimaging studies. The aim of this work is to develop a method to map compartmental T2-T2*-diffusion simultaneously. Using an advanced MRI acquisition called diffusion-PEPTIDE, we propose a novel microstructure model and develop a multi-step fitting method to estimate the parameters of interest. We demonstrate for the first time that compartmental T2 and T2* can be estimated simultaneously from in vivo data. We further show the accuracy and precision of parameter estimation with simulations.
Submitted 19 August, 2024;
originally announced August 2024.
-
Prescribed-time Convergent Distributed Multiobjective Optimization with Dynamic Event-triggered Communication
Authors:
Tengyang Gong,
Zhongguo Li,
Yiqiao Xu,
Zhengtao Ding
Abstract:
This paper addresses distributed constrained multiobjective resource allocation problems (DCMRAPs) within multi-agent networks, where each agent has multiple, potentially conflicting local objectives and is subject to both local and global constraints. By reformulating the DCMRAP as a single-objective weighted $L_p$ problem, we enable a distributed solution that eliminates the need for the predetermined weighting factors or centralized decision-making of traditional methods. Leveraging prescribed-time control and dynamic event-triggered mechanisms (ETMs), novel distributed algorithms are proposed to achieve Pareto optimality within a prescribed settling time through sampled communication. Using generalized time-based generators (TBGs), these algorithms provide more flexibility in optimizing accuracy and control smoothness, without constraints on initial conditions. Novel dynamic ETMs, which adjust to both local error metrics and network-based disagreements, are designed to work with generalized TBGs to improve communication efficiency. Zeno behavior is excluded. Validated by Lyapunov analysis and simulations, our method demonstrates superior control performance and efficiency compared with existing methods, advancing distributed optimization in complex environments.
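A minimal, centralized sketch of weighted L_p scalarization, assuming the common form that minimizes (Σ_i w_i |f_i(x) - z_i*|^p)^(1/p), the weighted L_p distance of the objective vector to an ideal point z*. The two quadratic objectives, the grid search, and the hand-picked weights are hypothetical; the abstract states that the paper's approach avoids predetermined weighting factors, and its distributed, prescribed-time, event-triggered algorithms are not reproduced here.

```python
import numpy as np


def weighted_lp(fvals, ideal, weights, p):
    """Weighted L_p distance between an objective vector and the ideal point."""
    diff = np.abs(np.asarray(fvals, dtype=float) - np.asarray(ideal, dtype=float))
    return float(np.sum(np.asarray(weights, dtype=float) * diff**p) ** (1.0 / p))


def f1(x):
    return (x - 1.0) ** 2   # hypothetical objective, minimized at x = 1


def f2(x):
    return (x - 3.0) ** 2   # hypothetical objective, minimized at x = 3


ideal = [0.0, 0.0]                    # minimum value of each objective on its own
grid = np.linspace(0.0, 4.0, 4001)    # brute-force search over a scalar decision variable
scores = [weighted_lp([f1(x), f2(x)], ideal, [0.5, 0.5], p=2) for x in grid]
x_star = float(grid[int(np.argmin(scores))])
print(x_star)   # a compromise between the two individual minimizers
```

With equal weights the compromise sits midway between the two individual minimizers; shifting weight toward f1 moves it toward x = 1, and every such minimizer lies on the Pareto front of this toy problem.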
Submitted 7 February, 2025; v1 submitted 18 August, 2024;
originally announced August 2024.
-
V3rified: Revelation vs Non-Revelation Mechanisms for Decentralized Verifiable Computation
Authors:
Tiantian Gong,
Aniket Kate,
Alexandros Psomas,
Athina Terzoglou
Abstract:
In the era of Web3, decentralized technologies have emerged as the cornerstone of a new digital paradigm. Backed by a decentralized blockchain architecture, the Web3 space aims to democratize all aspects of the web. From data-sharing to learning models, outsourcing computation is an established, prevalent practice. Verifiable computation makes this practice trustworthy, as clients/users can now efficiently validate the integrity of a computation. As verifiable computation is considered for applications in the Web3 space, decentralization is crucial for system reliability, ensuring that no single entity can suppress clients. At the same time, however, decentralization needs to be balanced with efficiency: clients want their computations done as quickly as possible.
Motivated by these issues, we study the trade-off between decentralization and efficiency when outsourcing computational tasks to strategic, rational solution providers. Specifically, we examine this trade-off when the client employs (1) revelation mechanisms, i.e., auctions, where solution providers bid their desired reward for completing the task by a specific deadline and then the client selects which of them will do the task and how much they will be rewarded, and (2) simple, non-revelation mechanisms, where the client commits to the set of rules she will use to map solutions at specific times to rewards and then solution providers decide whether they want to do the task or not. We completely characterize the power and limitations of revelation and non-revelation mechanisms in our model.
Submitted 13 August, 2024;
originally announced August 2024.
-
Distributed Feedback-Feedforward Algorithms for Time-Varying Resource Allocation
Authors:
Yiqiao Xu,
Tengyang Gong,
Zhengtao Ding,
Alessandra Parisio
Abstract:
In this paper, we address the distributed Time-Varying Resource Allocation (TVRA) problem, where the local cost functions, global equality constraint, and Local Feasibility Constraints (LFCs) vary with time. To track the optimal trajectories, we propose algorithms that mimic the structure of feedback-feedforward control systems. We begin with their conceptual design in the absence of LFCs, developing a feedback-feedforward algorithm that is fixed-time convergent. For cases with LFCs, existing approaches predominantly rely on constructing a time-dependent barrier function, which may impede the design of fixed-time convergent algorithms. Therefore, by exploring the connection between projection and penalty functions, we tailor switched feedforward laws to handle LFCs in conjunction with projection. Based on this, we develop a projection-based feedback-feedforward algorithm that converges to the exact optimal trajectories, possibly with a number of switching instants, while exhibiting fixed-time convergence between consecutive switching instants. Numerical experiments verify the effectiveness of the proposed algorithms.
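The abstract relies on projection to handle feasibility constraints. As a hedged illustration of that building block only (not the paper's switched feedforward laws), the Euclidean projection onto the static set {x : Σ x_i = d, l_i ≤ x_i ≤ u_i} can be computed by bisection on a scalar multiplier ν, because Σ clip(y_i - ν, l_i, u_i) is non-increasing in ν. The function name, bounds, and demand below are hypothetical.

```python
import numpy as np


def project_resource_set(y, demand, lower, upper, tol=1e-12, max_iter=200):
    """Euclidean projection of y onto {x : sum(x) = demand, lower <= x <= upper}.

    Bisection on the multiplier nu: x(nu) = clip(y - nu, lower, upper), whose sum is
    non-increasing in nu. Requires sum(lower) <= demand <= sum(upper).
    """
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    if not lower.sum() <= demand <= upper.sum():
        raise ValueError("infeasible: demand outside [sum(lower), sum(upper)]")
    lo = np.min(y - upper)   # at this nu every x_i = upper_i, so sum(x) >= demand
    hi = np.max(y - lower)   # at this nu every x_i = lower_i, so sum(x) <= demand
    for _ in range(max_iter):
        nu = 0.5 * (lo + hi)
        if np.clip(y - nu, lower, upper).sum() > demand:
            lo = nu          # allocation too large: increase the multiplier
        else:
            hi = nu          # allocation too small (or exact): decrease it
        if hi - lo < tol:
            break
    return np.clip(y - 0.5 * (lo + hi), lower, upper)


x = project_resource_set([2.0, 0.5, -1.0], demand=3.0, lower=[0, 0, 0], upper=[2, 2, 2])
print(x, x.sum())
```

In a time-varying setting such a projection would be re-evaluated as d(t), l(t), and u(t) change; how the paper combines it with feedforward terms is not shown here.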
Submitted 7 August, 2024;
originally announced August 2024.
-
Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization
Authors:
Changtao Miao,
Qi Chu,
Tao Gong,
Zhentao Tan,
Zhenchao Jin,
Wanyi Zhuang,
Man Luo,
Honggang Hu,
Nenghai Yu
Abstract:
With the advancement of face manipulation technology, forged images in multi-face scenarios are becoming an increasingly complex and realistic challenge. Despite this, detection and localization methods for such multi-face manipulations remain underdeveloped. Traditional manipulation localization methods either derive detection results indirectly from localization masks, resulting in limited detection performance, or employ a naive two-branch structure to obtain detection and localization results simultaneously, which cannot effectively benefit localization because of the limited interaction between the two tasks. This paper proposes a new framework, MoNFAP, specifically tailored for multi-face manipulation detection and localization. MoNFAP introduces two novel modules: the Forgery-aware Unified Predictor (FUP) Module and the Mixture-of-Noises Module (MNM). The FUP integrates the detection and localization tasks using a token learning strategy and multiple forgery-aware transformers, which allows classification information to be used to enhance localization. In addition, motivated by the crucial role of noise information in forgery detection, the MNM leverages multiple noise extractors based on the concept of a mixture of experts to enhance the general RGB features, further boosting the performance of our framework. Finally, we establish a comprehensive benchmark for multi-face detection and localization, on which the proposed MoNFAP achieves strong performance. The code will be made available.
Submitted 5 August, 2024;
originally announced August 2024.
-
Three-dimensional solitons supported by the spin-orbit coupling and Rydberg-Rydberg interactions in PT-symmetric potentials
Authors:
Yuan Zhao,
Qihong Huang,
Tixian Gong,
Siliu Xu,
Zeping Li,
Boris A. Malomed
Abstract:
Excited states (ESs) of two- and three-dimensional (2D and 3D) solitons of the semivortex (SV) and mixed-mode (MM) types, supported by the interplay of spin-orbit coupling (SOC) and local nonlinearity in binary Bose-Einstein condensates, are unstable, in contrast to the stability of the SV and MM solitons in their fundamental states. We propose a strategy for stabilizing these states in 3D that combines SOC and long-range Rydberg-Rydberg interactions (RRI) in the presence of a spatially periodic potential, which may include a parity-time (PT)-symmetric component. ESs of the SV solitons, which carry integer vorticities S and S+1 in their two components, remain robust up to S = 4. ESs of MM solitons feature an interwoven necklace-like structure, with the components carrying opposite fractional values of the orbital angular momentum. Regions of effective stability of the 3D solitons of the SV and MM types (both fundamental ones and ESs) are identified as functions of the imaginary component of the PT-symmetric potential and the strengths of the SOC and RRI terms.
Submitted 28 July, 2024;
originally announced July 2024.
-
Joint Active and Passive Beamforming Design for IRS-aided MIMO ISAC Based on Sensing Mutual Information
Authors:
Jin Li,
Gui Zhou,
Tantao Gong,
Nan Liu,
Rui Zhang
Abstract:
In this paper, we investigate an intelligent reflecting surface (IRS)/reconfigurable intelligent surface (RIS)-aided integrated sensing and communication (ISAC) system based on sensing mutual information (MI). Specifically, the base station (BS) perceives the sensing target via the sensing signal reflected by the IRS, while simultaneously communicating with the users. Our aim is to maximize the sensing MI subject to quality of service (QoS) constraints for all communication users, the transmit power constraint at the BS, and the unit-modulus constraint on the IRS's passive reflection. We solve this problem in two cases: a simplified case that assumes a line-of-sight (LoS) channel between the BS and IRS and no clutter interference to sensing, and a generalized case that considers a Rician fading channel for the BS-IRS link and the presence of clutter interference to sensing. For the first case, we prove that a dedicated sensing beamformer is unnecessary for improving sensing MI and develop a low-complexity iterative algorithm to jointly optimize the BS active and IRS passive beamformers. Then, for the second case, we propose an alternative iterative algorithm, which can also be applied to the first case, to solve the beamforming design problem under the general setup. Numerical results are provided to validate the performance of the proposed algorithms compared with various benchmark schemes.
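The unit-modulus constraint on IRS reflection coefficients is often handled inside iterative beamforming algorithms by projecting an unconstrained update onto the constraint set, element-wise θ_n = v_n/|v_n|. The sketch below shows only this generic step, assuming it is applied to some intermediate vector v; the function name is hypothetical and it is not the paper's joint beamforming algorithm.

```python
import numpy as np


def project_unit_modulus(v):
    """Nearest point with |theta_n| = 1 in every entry; keeps each phase (zero entries get phase 0)."""
    v = np.asarray(v, dtype=complex)
    return np.exp(1j * np.angle(v))


theta = project_unit_modulus([3 + 4j, -2.0, 0.0, 0.1j])
print(np.abs(theta))     # every entry has unit modulus
print(np.angle(theta))   # phases of the inputs are preserved
```

Because only the phases survive, the magnitude information in v is discarded at each iteration, which is one reason such algorithms alternate this step with updates of the active BS beamformers.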
Submitted 2 April, 2025; v1 submitted 23 July, 2024;
originally announced July 2024.
-
Homotopy Types Of Toric Orbifolds From Weyl Polytopes
Authors:
Tao Gong
Abstract:
A reduced crystallographic root system with a fixed simple system determines a Weyl group $W$, parabolic subgroups $W_K$, and a polytope $P$, namely the convex hull of the $W$-orbit of a dominant weight. The quotient $P/W_K$ can be identified with a polytope. The polytopes $P$ and $P/W_K$ determine toric varieties $X_P$ and $X_{P/W_K}$, respectively. It turns out that the underlying topological spaces $X_P/W_K$ and $X_{P/W_K}$ are homotopy equivalent when the polytopes are considered in the real span of either the root lattice or the weight lattice.
Submitted 22 July, 2024;
originally announced July 2024.
-
By My Eyes: Grounding Multimodal Large Language Models with Sensor Data via Visual Prompting
Authors:
Hyungjun Yoon,
Biniyam Aschalew Tolera,
Taesik Gong,
Kimin Lee,
Sung-Ju Lee
Abstract:
Large language models (LLMs) have demonstrated exceptional abilities across various domains. However, utilizing LLMs for ubiquitous sensing applications remains challenging, as existing text-prompt methods show significant performance degradation when handling long sensor data sequences. We propose a visual prompting approach for sensor data using multimodal LLMs (MLLMs). We design a visual prompt that directs MLLMs to utilize visualized sensor data alongside the target sensory task descriptions. Additionally, we introduce a visualization generator that automates the creation of optimal visualizations tailored to a given sensory task, eliminating the need for prior task-specific knowledge. We evaluated our approach on nine sensory tasks involving four sensing modalities, achieving on average 10% higher accuracy than text-based prompts and reducing token costs by a factor of 15.8. Our findings highlight the effectiveness and cost-efficiency of visual prompts with MLLMs for various sensory tasks. The source code is available at https://github.com/diamond264/ByMyEyes.
Submitted 29 September, 2024; v1 submitted 14 July, 2024;
originally announced July 2024.
-
Microstructure.jl: a Julia Package for Probabilistic Microstructure Model Fitting with Diffusion MRI
Authors:
Ting Gong,
Anastasia Yendiki
Abstract:
Microstructure.jl is a Julia package designed for probabilistic estimation of tissue microstructural parameters from diffusion or combined diffusion-relaxometry MRI data. It provides a flexible and extensible framework for defining compartment models and includes robust and unified estimators for parameter fitting and uncertainty quantification. The package incorporates several established models from the literature, such as the spherical mean technique and soma and neurite density imaging (SANDI), along with their extensions for analyzing combined diffusion and T2 mapping data acquired at multiple echo times. For parameter estimation, it features methods like Markov Chain Monte Carlo (MCMC) sampling and Monte Carlo dropout with neural networks, which provide probabilistic estimates by approximating the posterior distributions of model parameters. In this study, we introduce the major modules, functionality, and design of this package. We demonstrate its usage in optimizing acquisition protocols and evaluating model fitting performance with synthesized datasets. We also showcase practical applications with publicly available datasets. Microstructure.jl is applicable to in vivo and ex vivo imaging data acquired with typical research, high-performance, or pre-clinical scanners.
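To illustrate what "probabilistic estimates by approximating the posterior distributions" means in practice, here is a minimal random-walk Metropolis sampler for a single-compartment signal S(b)/S0 = exp(-b·D) with Gaussian noise. This is a Python sketch, not the Microstructure.jl (Julia) API; the model, the flat prior restricted to D > 0, the noise level, the step size, and all function names are assumptions. Units: b in ms/μm², D in μm²/ms.

```python
import numpy as np


def log_likelihood(d, bvals, signal, sigma):
    """Gaussian log-likelihood (up to a constant) of S/S0 = exp(-b*D)."""
    pred = np.exp(-bvals * d)
    return -0.5 * np.sum((signal - pred) ** 2) / sigma**2


def metropolis_d(bvals, signal, sigma, n_samples=5000, step=0.02, d0=1.0, seed=0):
    """Random-walk Metropolis over D with a flat prior restricted to D > 0."""
    rng = np.random.default_rng(seed)
    d, ll = d0, log_likelihood(d0, bvals, signal, sigma)
    samples = np.empty(n_samples)
    for i in range(n_samples):
        proposal = d + step * rng.standard_normal()
        if proposal > 0:   # proposals outside the prior support are always rejected
            ll_new = log_likelihood(proposal, bvals, signal, sigma)
            if np.log(rng.uniform()) < ll_new - ll:   # Metropolis acceptance test
                d, ll = proposal, ll_new
        samples[i] = d
    return samples


# Synthetic normalized data: true D = 0.8 um^2/ms, noise sigma = 0.01.
rng = np.random.default_rng(1)
bvals = np.array([0.0, 0.5, 1.0, 2.0, 3.0])
signal = np.exp(-bvals * 0.8) + 0.01 * rng.standard_normal(bvals.size)
post = metropolis_d(bvals, signal, sigma=0.01)[1000:]   # discard burn-in
print(post.mean(), post.std())
```

The posterior standard deviation gives the kind of per-parameter uncertainty the package description refers to; multi-compartment models add dimensions but follow the same accept/reject logic.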
Submitted 29 April, 2025; v1 submitted 8 July, 2024;
originally announced July 2024.
-
Interplay between MRI-based axon diameter and myelination estimates in macaque and human brain
Authors:
Ting Gong,
Chiara Maffei,
Evan Dann,
Hong-Hsi Lee,
Hansol Lee,
Jean C. Augustinack,
Susie Y. Huang,
Suzanne N. Haber,
Anastasia Yendiki
Abstract:
Axon diameter and myelin thickness affect the conduction velocity of action potentials in the nervous system. Imaging them non-invasively with MRI-based methods is thus valuable for studying brain microstructure and function. Electron microscopy studies suggest that axon diameter and myelin thickness are closely related to each other. However, the relationship between MRI-based estimates of these microstructural measures, which are known to be relative indices, has not been investigated across the brain, mainly due to methodological limitations. In recent years, studies using ultra-high gradient strength diffusion MRI (dMRI) have demonstrated improved estimation of the axon diameter index across white-matter (WM) tracts in the human brain, making such investigations feasible. In this study, we aim to investigate relationships between tissue microstructure properties across WM tracts, as estimated with MRI-based methods. We collected dMRI with ultra-high gradient strength and multi-echo spin-echo MRI on ex vivo macaque and human brain samples on a preclinical scanner. From these data, we found that the correlations between the axon diameter index and other microstructural imaging parameters were weak but consistent across WM tracts in samples with sufficient signal-to-noise ratio. In well-myelinated regions, tissue voxels with larger axon diameter indices were associated with lower packing density, lower myelin water fraction (MWF), and a tendency toward higher g-ratio. We also found that intra-axonal signal fractions and MWF were not consistently correlated when assessed in different samples. Overall, the findings suggest that MRI-based axon geometry and myelination measures can provide complementary information about fiber morphology, and that the relationships between these measures agree with prior electron microscopy studies with smaller fields of view.
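For context on how the g-ratio connects to the other measures, the aggregate MRI g-ratio is often computed from a myelin volume fraction (MVF) and an axon volume fraction (AVF) as g = sqrt(1 - MVF/(MVF + AVF)). The abstract does not state which g-ratio formulation the authors used, so this sketch assumes that common aggregate form; the input fractions and function name are hypothetical, and converting MWF to MVF requires a calibration not shown here.

```python
import math


def aggregate_g_ratio(mvf, avf):
    """Aggregate g-ratio from myelin and axon volume fractions (both in [0, 1])."""
    fvf = mvf + avf                        # fiber volume fraction
    if fvf <= 0:
        raise ValueError("fiber volume fraction must be positive")
    return math.sqrt(1.0 - mvf / fvf)      # equivalently sqrt(avf / fvf)


print(aggregate_g_ratio(mvf=0.25, avf=0.45))
```

For a single cylindrical fiber with inner radius r and outer radius R, AVF and MVF are proportional to r² and R² - r², so the formula reduces to the classical ratio r/R.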
Submitted 29 April, 2025; v1 submitted 2 July, 2024;
originally announced July 2024.
-
Distribution-Free Online Change Detection for Low-Rank Images
Authors:
Tingnan Gong,
Seong-Hee Kim,
Yao Xie
Abstract:
We present a distribution-free CUSUM procedure designed for online change detection in a time series of low-rank images, particularly when the change causes a mean shift. We represent images as matrix data and allow for temporal dependence, in addition to inherent spatial dependence, before and after the change. The marginal distributions are assumed to be general, not limited to any specific parametric distribution. We propose new monitoring statistics that utilize the low-rank structure of the in-control mean matrix. Additionally, we study the properties of the proposed detection procedure, assessing whether the monitoring statistics effectively capture a mean shift and evaluating the rate of increase in the average run length relative to the control limit in both the in-control and out-of-control cases. The effectiveness of our procedure is demonstrated through simulated and real data experiments.
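The generic one-sided CUSUM recursion behind such procedures is S_t = max(0, S_{t-1} + x_t - k), with an alarm at the first t where S_t > h. The sketch below applies it to a hypothetical scalar statistic computed from each image: the squared Frobenius norm of the deviation X_t - M0 projected onto the rank-r row and column subspaces of the in-control mean M0 (assumed known here). The statistic, the reference value k, the limit h, and the function names are illustrative assumptions; the paper's distribution-free statistics and control-limit calibration differ.

```python
import numpy as np


def low_rank_basis(mean_img, r):
    """Top-r left/right singular vectors of the in-control mean matrix."""
    u, _, vt = np.linalg.svd(mean_img, full_matrices=False)
    return u[:, :r], vt[:r, :].T


def cusum_alarm(images, mean_img, r, k, h):
    """Return the first index t at which the one-sided CUSUM exceeds h, or None."""
    u, v = low_rank_basis(mean_img, r)
    s = 0.0
    for t, x in enumerate(images):
        stat = np.linalg.norm(u.T @ (x - mean_img) @ v) ** 2   # hypothetical statistic
        s = max(0.0, s + stat - k)
        if s > h:
            return t
    return None


rng = np.random.default_rng(0)
n, r, sigma = 20, 2, 0.1
m0 = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))   # rank-2 in-control mean
stream = [m0 + sigma * rng.standard_normal((n, n)) for _ in range(50)]          # in control
stream += [1.1 * m0 + sigma * rng.standard_normal((n, n)) for _ in range(50)]   # mean shift
print(cusum_alarm(stream, m0, r, k=0.1, h=1.0))
```

Projecting onto the low-rank subspace keeps only r² of the n² noisy entries per image, so the in-control statistic stays small while a shift aligned with the mean structure is amplified; raising h lengthens the in-control average run length at the cost of slower detection.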
Submitted 27 February, 2025; v1 submitted 23 June, 2024;
originally announced June 2024.
-
How Does Distribution Matching Help Domain Generalization: An Information-theoretic Analysis
Authors:
Yuxin Dong,
Tieliang Gong,
Hong Chen,
Shuangyong Song,
Weizhan Zhang,
Chen Li
Abstract:
Domain generalization aims to learn invariance across multiple training domains, thereby enhancing generalization against out-of-distribution data. While gradient or representation matching algorithms have achieved remarkable success, these methods generally lack generalization guarantees or depend on strong assumptions, leaving a gap in understanding the underlying mechanism of distribution matching. In this work, we formulate domain generalization from a novel probabilistic perspective, ensuring robustness while avoiding overly conservative solutions. Through comprehensive information-theoretic analysis, we provide key insights into the roles of gradient and representation matching in promoting generalization. Our results reveal the complementary relationship between these two components, indicating that existing works focusing solely on either gradient or representation alignment are insufficient to solve the domain generalization problem. In light of these theoretical findings, we introduce IDM to simultaneously align the inter-domain gradients and representations. Integrated with the proposed PDM method for complex distribution matching, IDM achieves superior performance over various baseline methods.
Submitted 14 June, 2024;
originally announced June 2024.
-
Electromagnetic Information Theory for Holographic MIMO Communications
Authors:
Li Wei,
Tierui Gong,
Chongwen Huang,
Zhaoyang Zhang,
Wei E. I. Sha,
Zhi Ning Chen,
Linglong Dai,
Merouane Debbah,
Chau Yuen
Abstract:
Holographic multiple-input multiple-output (HMIMO) utilizes a compact antenna array to form a nearly continuous aperture, thereby enabling higher capacity and more flexible configurations than conventional MIMO systems, which makes it attractive in current scientific research. Key questions naturally arise regarding the potential of HMIMO to surpass Shannon's theoretical limits and how far its capabilities can be extended. However, traditional Shannon information theory falls short in addressing these questions because it focuses only on the information itself while neglecting the underlying carrier, namely electromagnetic (EM) waves, and environmental interactions. To bridge the gap between theoretical analysis and practical application for HMIMO systems, we introduce electromagnetic information theory (EIT) in this paper. This paper begins by laying the foundation for HMIMO-oriented EIT, encompassing EM wave equations and communication regions. In the context of HMIMO systems, the resultant physical limitations are presented, including Chu's limit, Harrington's limit, Hannan's limit, and the evaluation of coupling effects. Field sampling and HMIMO-assisted oversampling are also discussed to guide optimal HMIMO design within the EIT framework. To comprehensively depict the EM-compliant propagation process, we present approximate and exact channel modeling approaches in the near-field and far-field zones. Furthermore, we discuss both traditional Shannon information theory, which employs probabilistic methods, and Kolmogorov information theory, which utilizes functional analysis, for HMIMO-oriented EIT systems.
Submitted 12 December, 2024; v1 submitted 16 May, 2024;
originally announced May 2024.