-
GeoLocSFT: Efficient Visual Geolocation via Supervised Fine-Tuning of Multimodal Foundation Models
Authors:
Qiang Yi,
Lianlei Shan
Abstract:
Accurately determining the geographic location where a single image was taken, visual geolocation, remains a formidable challenge due to the planet's vastness and the deceptive similarity among distant locations. We introduce GeoLocSFT, a framework that demonstrates how targeted supervised fine-tuning (SFT) of a large multimodal foundation model (Gemma 3) using a small, high-quality dataset can yield highly competitive geolocation performance. GeoLocSFT is trained with only 2700 carefully selected image-GPS pairs from our geographically diverse MR600k dataset. Despite this limited data, our SFT-centric approach substantially improves over baseline models and achieves robust results on standard benchmarks such as Im2GPS-3k and YFCC-4k, as well as on our newly proposed and challenging MR40k benchmark, aimed specifically at sparsely populated regions. Further, we explore multi-candidate inference and aggregation strategies but find that the core gains are already realized at the SFT stage. Our findings highlight the power of high-quality supervision and efficient SFT for planet-scale image geolocation, especially when compared to prior methods that require massive databases or complex pipelines. To foster further research, we publicly release the MR40k benchmark dataset.
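Geolocation benchmarks such as Im2GPS-3k and YFCC-4k are usually scored by great-circle error, reported as the fraction of predictions within fixed distance thresholds (1, 25, 200, 750 and 2500 km are the ones commonly used). The abstract does not say which multi-candidate aggregation strategies were explored. The sketch below assumes one simple rule: keep the candidate whose summed distance to the other candidates is smallest (a medoid). The names `haversine_km`, `medoid_candidate` and `accuracy_at_thresholds` are hypothetical and are not the authors' code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against a > 1 from floating-point rounding.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def medoid_candidate(candidates):
    """Hypothetical aggregation: the GPS guess closest (in total) to all other guesses."""
    return min(candidates, key=lambda c: sum(haversine_km(*c, *o) for o in candidates))


def accuracy_at_thresholds(preds, gts, thresholds_km=(1, 25, 200, 750, 2500)):
    """Fraction of predictions whose error is within each threshold (in km)."""
    dists = [haversine_km(*p, *g) for p, g in zip(preds, gts)]
    return {t: sum(d <= t for d in dists) / len(dists) for t in thresholds_km}
```

A medoid is robust to a single outlying candidate, since one far-away guess adds distance to every other candidate's total but cannot itself win.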
Submitted 1 June, 2025;
originally announced June 2025.
-
EffiVLM-BENCH: A Comprehensive Benchmark for Evaluating Training-Free Acceleration in Large Vision-Language Models
Authors:
Zekun Wang,
Minghua Ma,
Zexin Wang,
Rongchuan Mu,
Liping Shan,
Ming Liu,
Bing Qin
Abstract:
Large Vision-Language Models (LVLMs) have achieved remarkable success, yet their significant computational demands hinder practical deployment. While efforts to improve LVLM efficiency are growing, existing methods lack comprehensive evaluation across diverse backbones, benchmarks, and metrics. In this work, we systematically evaluate mainstream acceleration techniques for LVLMs, categorized into token and parameter compression. We introduce EffiVLM-Bench, a unified framework for assessing not only absolute performance but also generalization and loyalty, while exploring Pareto-optimal trade-offs. Our extensive experiments and in-depth analyses offer insights into optimal strategies for accelerating LVLMs. We open-source code and recipes for EffiVLM-Bench to foster future research.
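The notion of a Pareto-optimal trade-off can be made concrete with a small helper. This is a sketch, not the EffiVLM-Bench code: it assumes each acceleration setting is summarized as a `(cost, score)` pair, where cost might be relative FLOPs or latency (lower is better) and score a benchmark metric (higher is better). The function name and the numbers in the test are hypothetical.

```python
def pareto_front(points):
    """Return the (cost, score) pairs that no other pair dominates, sorted by cost.

    A pair dominates another if its cost is <= and its score is >=,
    with at least one of the two inequalities strict.
    """
    front = []
    for i, (c_i, s_i) in enumerate(points):
        dominated = any(
            c_j <= c_i and s_j >= s_i and (c_j < c_i or s_j > s_i)
            for j, (c_j, s_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((c_i, s_i))
    return sorted(front)
```

The quadratic scan is fine for the handful of configurations a benchmark table compares; sorting by cost first would give an O(n log n) version.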
Submitted 31 May, 2025;
originally announced June 2025.
-
GeoGramBench: Benchmarking the Geometric Program Reasoning in Modern LLMs
Authors:
Shixian Luo,
Zezhou Zhu,
Yu Yuan,
Yuncheng Yang,
Lianlei Shan,
Yong Wu
Abstract:
Geometric spatial reasoning forms the foundation of many applications in artificial intelligence, yet the ability of large language models (LLMs) to operate over geometric spatial information expressed in procedural code remains underexplored. In this paper, we address this gap by formalizing the Program-to-Geometry task, which challenges models to translate programmatic drawing code into accurate and abstract geometric reasoning. To evaluate this capability, we present GeoGramBench, a benchmark of 500 carefully refined problems organized by a tailored three-level taxonomy that considers geometric complexity rather than traditional mathematical reasoning complexity. Our comprehensive evaluation of 17 frontier LLMs reveals consistent and pronounced deficiencies: even the most advanced models achieve less than 50% accuracy at the highest abstraction level. These results highlight the unique challenges posed by program-driven spatial reasoning and establish GeoGramBench as a valuable resource for advancing research in symbolic-to-spatial geometric reasoning. Project page: https://github.com/LiAuto-DSR/GeoGramBench.
Submitted 23 May, 2025;
originally announced May 2025.
-
GMM-Based Comprehensive Feature Extraction and Relative Distance Preservation For Few-Shot Cross-Modal Retrieval
Authors:
Chengsong Sun,
Weiping Li,
Xiang Li,
Yuankun Liu,
Lianlei Shan
Abstract:
Few-shot cross-modal retrieval focuses on learning cross-modal representations with limited training samples, enabling the model to handle unseen classes during inference. Unlike traditional cross-modal retrieval tasks, which assume that both training and testing data share the same class distribution, few-shot retrieval involves data with sparse representations across modalities. Existing methods often fail to adequately model the multi-peak distribution of few-shot cross-modal data, resulting in two main biases in the latent semantic space: intra-modal bias, where sparse samples fail to capture intra-class diversity, and inter-modal bias, where misalignments between image and text distributions exacerbate the semantic gap. These biases hinder retrieval accuracy. To address these issues, we propose a novel method, GCRDP, for few-shot cross-modal retrieval. This approach effectively captures the complex multi-peak distribution of data using a Gaussian Mixture Model (GMM) and incorporates a multi-positive sample contrastive learning mechanism for comprehensive feature modeling. Additionally, we introduce a new strategy for cross-modal semantic alignment, which constrains the relative distances between image and text feature distributions, thereby improving the accuracy of cross-modal representations. We validate our approach through extensive experiments on four benchmark datasets, demonstrating superior performance over six state-of-the-art methods.
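A multi-positive contrastive objective treats every other sample with the same class label as a positive, instead of a single augmented view. The sketch below is a generic supervised-contrastive-style loss in NumPy, written under the assumption that GCRDP's mechanism resembles it; the paper's exact formulation, its GMM component and its relative-distance alignment term are not reproduced. The function name and `tau` default are hypothetical.

```python
import numpy as np


def multi_positive_contrastive_loss(z, labels, tau=0.1):
    """Average over anchors of -mean_{p in P(i)} log softmax_i(p), excluding self-pairs."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarities
    labels = np.asarray(labels)
    n = len(labels)
    sim = z @ z.T / tau
    not_self = ~np.eye(n, dtype=bool)
    # Stable log of the softmax denominator over all non-self entries of each row.
    sim_masked = np.where(not_self, sim, -np.inf)
    row_max = sim_masked.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(sim_masked - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom
    # Positives: same label, different sample.
    pos = (labels[:, None] == labels[None, :]) & not_self
    per_anchor = -(log_prob * pos).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
    # Anchors without any positive carry no signal and are skipped.
    return per_anchor[pos.any(axis=1)].mean()
```

When embeddings of the same class already coincide the loss is near zero; when the labels disagree with the geometry it is large, which is the pull-together/push-apart behavior the abstract relies on.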
Submitted 19 May, 2025;
originally announced May 2025.
-
LLM-CoT Enhanced Graph Neural Recommendation with Harmonized Group Policy Optimization
Authors:
Hailong Luo,
Bin Wu,
Hongyong Jia,
Qingqing Zhu,
Lianlei Shan
Abstract:
Graph neural networks (GNNs) have advanced recommender systems by modeling interaction relationships. However, existing graph-based recommenders rely on sparse ID features and do not fully exploit textual information, resulting in low information density within representations. Furthermore, graph contrastive learning faces two challenges: random negative sampling can introduce false negative samples, while fixed temperature coefficients cannot adapt to the heterogeneity of different nodes. In addition, current efforts to enhance recommendations with large language models (LLMs) have not fully utilized their Chain-of-Thought (CoT) reasoning capabilities to guide representation learning. To address these limitations, we introduce LGHRec (LLM-CoT Enhanced Graph Neural Recommendation with Harmonized Group Policy Optimization). This framework leverages the CoT reasoning ability of LLMs to generate semantic IDs, enriching reasoning processes and improving the information density and semantic quality of representations. Moreover, we design a reinforcement learning algorithm, Harmonized Group Policy Optimization (HGPO), to optimize negative sampling strategies and temperature coefficients in contrastive learning. This approach enhances long-tail recommendation performance and ensures optimization consistency across different groups. Experimental results on three datasets demonstrate that LGHRec improves representation quality through semantic IDs generated by the CoT reasoning of LLMs and effectively boosts contrastive learning with HGPO. Our method outperforms several baseline models. The code is available at: https://anonymous.4open.science/r/LLM-Rec.
Submitted 18 May, 2025;
originally announced May 2025.
-
AD-GPT: Large Language Models in Alzheimer's Disease
Authors:
Ziyu Liu,
Lintao Tang,
Zeliang Sun,
Zhengliang Liu,
Yanjun Lyu,
Wei Ruan,
Yangshuang Xu,
Liang Shan,
Jiyoon Shin,
Xiaohe Chen,
Dajiang Zhu,
Tianming Liu,
Rongjie Liu,
Chao Huang
Abstract:
Large language models (LLMs) have emerged as powerful tools for medical information retrieval, yet their accuracy and depth remain limited in specialized domains such as Alzheimer's disease (AD), a growing global health challenge. To address this gap, we introduce AD-GPT, a domain-specific generative pre-trained transformer designed to enhance the retrieval and analysis of AD-related genetic and neurobiological information. AD-GPT integrates diverse biomedical data sources, including potential AD-associated genes, molecular genetic information, and key gene variants linked to brain regions. We develop a stacked LLM architecture combining Llama3 and BERT, optimized for four critical tasks in AD research: (1) genetic information retrieval, (2) gene-brain region relationship assessment, (3) gene-AD relationship analysis, and (4) brain region-AD relationship mapping. Comparative evaluations against state-of-the-art LLMs demonstrate AD-GPT's superior precision and reliability across these tasks, underscoring its potential as a robust and specialized AI tool for advancing AD research and biomarker discovery.
Submitted 3 April, 2025;
originally announced April 2025.
-
Computing High-dimensional Confidence Sets for Arbitrary Distributions
Authors:
Chao Gao,
Liren Shan,
Vaidehi Srinivas,
Aravindan Vijayaraghavan
Abstract:
We study the problem of learning a high-density region of an arbitrary distribution over $\mathbb{R}^d$. Given a target coverage parameter $δ$, and sample access to an arbitrary distribution $D$, we want to output a confidence set $S \subset \mathbb{R}^d$ such that $S$ achieves $δ$ coverage of $D$, i.e., $\mathbb{P}_{y \sim D} \left[ y \in S \right] \ge δ$, and the volume of $S$ is as small as possible. This is a central problem in high-dimensional statistics with applications in finding confidence sets, uncertainty quantification, and support estimation.
In the most general setting, this problem is statistically intractable, so we restrict our attention to competing with sets from a concept class $C$ with bounded VC-dimension. An algorithm is competitive with class $C$ if, given samples from an arbitrary distribution $D$, it outputs in polynomial time a set that achieves $δ$ coverage of $D$, and whose volume is competitive with the smallest set in $C$ with the required coverage $δ$. This problem is computationally challenging even in the basic setting when $C$ is the set of all Euclidean balls. Existing algorithms based on coresets find in polynomial time a ball whose volume is $\exp(\tilde{O}( d/ \log d))$-factor competitive with the volume of the best ball.
Our main result is an algorithm that finds a confidence set whose volume is $\exp(\tilde{O}(d^{1/2}))$-factor competitive with the optimal ball having the desired coverage. The algorithm is improper (it outputs an ellipsoid). Combined with our computational intractability result for properly learning balls within an $\exp(\tilde{O}(d^{1-o(1)}))$ approximation factor in volume, our results provide an interesting separation between proper and improper learning of confidence sets.
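The coverage constraint $\mathbb{P}_{y \sim D}[y \in S] \ge δ$ can be illustrated with the most naive ball one could output from samples. This sketch is a hypothetical baseline, not the paper's algorithm: it centers the ball at the coordinate-wise median and picks the smallest radius that contains at least a $δ$ fraction of the given samples. It only checks empirical coverage on those samples and carries none of the paper's volume guarantees; both function names are hypothetical.

```python
import math

import numpy as np


def empirical_confidence_ball(samples, delta):
    """Naive ball: median center, radius = ceil(delta * n)-th smallest distance."""
    x = np.asarray(samples, dtype=float)
    center = np.median(x, axis=0)
    dists = np.sort(np.linalg.norm(x - center, axis=1))
    k = max(1, math.ceil(delta * len(dists)))  # samples that must lie inside
    return center, dists[k - 1]


def empirical_coverage(samples, center, radius):
    """Fraction of samples inside the closed ball."""
    x = np.asarray(samples, dtype=float)
    return float(np.mean(np.linalg.norm(x - center, axis=1) <= radius))
```

Because the radius is the $\lceil δ n \rceil$-th order statistic of the distances, at least that many samples fall inside, so empirical coverage is always at least $δ$; the hard part the paper addresses is keeping the volume close to that of the best ball.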
Submitted 12 May, 2025; v1 submitted 3 April, 2025;
originally announced April 2025.
-
Cognitive Memory in Large Language Models
Authors:
Lianlei Shan,
Shixian Luo,
Zezhou Zhu,
Yu Yuan,
Yong Wu
Abstract:
This paper examines memory mechanisms in Large Language Models (LLMs), emphasizing their importance for context-rich responses, reduced hallucinations, and improved efficiency. It categorizes memory into sensory, short-term, and long-term, with sensory memory corresponding to input prompts, short-term memory processing immediate context, and long-term memory implemented via external databases or structures. The text-based memory section covers acquisition (selection and summarization), management (updating, accessing, storing, and resolving conflicts), and utilization (full-text search, SQL queries, semantic search). The KV cache-based memory section discusses selection methods (regularity-based summarization, score-based approaches, special token embeddings) and compression techniques (low-rank compression, KV merging, multimodal compression), along with management strategies like offloading and shared attention mechanisms. Parameter-based memory methods (LoRA, TTT, MoE) transform memories into model parameters to enhance efficiency, while hidden-state-based memory approaches (chunk mechanisms, recurrent transformers, Mamba model) improve long-text processing by combining RNN hidden states with current methods. Overall, the paper offers a comprehensive analysis of LLM memory mechanisms, highlighting their significance and future research directions.
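Score-based KV-cache selection keeps the cached positions that have received the most attention so far and evicts the rest. The sketch below is one generic policy of this kind, not a specific method surveyed in the paper. It assumes the score is the attention mass a key position has accumulated over past queries, and that a fixed window of the most recent positions is always kept. `select_kv_positions`, `budget` and `recent` are hypothetical names.

```python
import numpy as np


def select_kv_positions(attn, budget, recent):
    """Return sorted indices of key/value positions to keep in the cache.

    attn:   (num_queries, num_keys) attention weights observed so far.
    budget: total number of positions to keep.
    recent: number of most recent positions that are always kept.
    """
    num_keys = attn.shape[1]
    if budget >= num_keys:
        return np.arange(num_keys)
    recent = min(recent, budget)
    recent_idx = np.arange(num_keys - recent, num_keys)
    # Score older positions by their accumulated attention mass.
    scores = attn[:, : num_keys - recent].sum(axis=0)
    top_idx = np.argsort(scores)[::-1][: budget - recent]
    return np.sort(np.concatenate([top_idx, recent_idx]))
```

Keeping a recent window protects local context, which accumulated scores undervalue because new tokens have not yet been attended to by many queries.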
Submitted 23 April, 2025; v1 submitted 3 April, 2025;
originally announced April 2025.
-
Enhancing Deep Learning Based Structured Illumination Microscopy Reconstruction with Light Field Awareness
Authors:
Long-Kun Shan,
Ze-Hao Wang,
Tong-Tian Weng,
Xiang-Dong Chen,
Fang-Wen Sun
Abstract:
Structured illumination microscopy (SIM) is a pivotal technique for dynamic subcellular imaging in live cells. Conventional SIM reconstruction algorithms depend on accurately estimating the illumination pattern and can introduce artefacts when this estimation is imprecise. Although recent deep learning-based SIM reconstruction methods have improved speed, accuracy, and robustness, they often struggle with out-of-distribution data. To address this limitation, we propose an Awareness-of-Light-field SIM (AL-SIM) reconstruction approach that directly estimates the actual light field to correct for errors arising from data distribution shifts. Through comprehensive experiments on both simulated filament structures and live BSC1 cells, our method demonstrates a 7% reduction in the normalized root mean square error (NRMSE) and substantially lowers reconstruction artefacts. By minimizing these artefacts and improving overall accuracy, AL-SIM broadens the applicability of SIM for complex biological systems.
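The abstract reports a 7% reduction in NRMSE without stating how the RMSE is normalized or whether the reduction is relative or absolute. The sketch below assumes normalization by the dynamic range of the ground-truth image and a relative reduction; both are hypothetical choices, and other conventions (for example normalizing by the mean or by the norm of the target) give different numbers.

```python
import numpy as np


def nrmse(pred, target):
    """RMSE divided by the dynamic range of the target (one common convention)."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    rmse = np.sqrt(np.mean((pred - target) ** 2))
    return rmse / (target.max() - target.min())


def relative_reduction(baseline, improved):
    """Fractional improvement of `improved` over `baseline` (0.07 means 7%)."""
    return (baseline - improved) / baseline
```

Range normalization makes the metric comparable across images with different intensity scales, which matters when pooling results over simulated filaments and live-cell data.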
Submitted 14 March, 2025;
originally announced March 2025.
-
DynRsl-VLM: Enhancing Autonomous Driving Perception with Dynamic Resolution Vision-Language Models
Authors:
Xirui Zhou,
Lianlei Shan,
Xiaolin Gui
Abstract:
Visual Question Answering (VQA) models, which fall under the category of vision-language models, conventionally execute multiple downsampling processes on image inputs to strike a balance between computational efficiency and model performance. Although this approach aids in concentrating on salient features and diminishing computational burden, it incurs the loss of vital detailed information, a drawback that is particularly damaging in end-to-end autonomous driving scenarios. Downsampling can lead to an inadequate capture of distant or small objects such as pedestrians, road signs, or obstacles, all of which are crucial for safe navigation. This loss of features negatively impacts an autonomous driving system's capacity to accurately perceive the environment, potentially escalating the risk of accidents. To tackle this problem, we put forward the Dynamic Resolution Vision Language Model (DynRsl-VLM). DynRsl-VLM incorporates a dynamic resolution image input processing approach that captures all entity feature information within an image while ensuring that the image input remains computationally tractable for the Vision Transformer (ViT). Moreover, we devise a novel image-text alignment module to replace the Q-Former, enabling simple and efficient alignment with text when dealing with dynamic resolution image inputs. Our method enhances the environmental perception capabilities of autonomous driving systems without overstepping computational constraints.
Submitted 14 March, 2025;
originally announced March 2025.
-
Synthetic Lung X-ray Generation through Cross-Attention and Affinity Transformation
Authors:
Ruochen Pi,
Lianlei Shan
Abstract:
Collecting and annotating medical images is a time-consuming and resource-intensive task. Generating synthetic data with diffusion models offers a cost-effective alternative. This paper introduces a new method for the automatic generation of accurate semantic masks from synthetic lung X-ray images based on a stable diffusion model trained on text-image pairs. This method uses cross-attention mapping between text and image to extend text-driven image synthesis to semantic mask generation. It employs text-guided cross-attention information to identify specific areas in an image and combines this with innovative techniques to produce high-resolution, class-differentiated pixel masks. This approach significantly reduces the costs associated with data collection and annotation. The experimental results demonstrate that segmentation models trained on synthetic data generated using the method are comparable to, and in some cases even better than, models trained on real datasets. This shows the effectiveness of the method and its potential to revolutionize medical image analysis.
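Turning text-guided cross-attention into class-differentiated masks requires, at minimum, bringing low-resolution attention maps up to image resolution and deciding which class (if any) each pixel belongs to. The abstract does not describe its additional techniques (the title mentions an affinity transformation), so this sketch assumes only nearest-neighbor upsampling, per-class min-max normalization and a single confidence threshold. `attention_to_mask` and `threshold` are hypothetical.

```python
import numpy as np


def attention_to_mask(attn_maps, image_size, threshold=0.5):
    """Convert per-class cross-attention maps into an integer class mask.

    attn_maps:  (num_classes, h, w) non-negative attention for each class token.
    image_size: (H, W), assumed to be integer multiples of (h, w).
    Returns an (H, W) int array: 0 = background, c + 1 = class c.
    """
    _, h, w = attn_maps.shape
    H, W = image_size
    # Nearest-neighbor upsampling to image resolution.
    up = attn_maps.repeat(H // h, axis=1).repeat(W // w, axis=2)
    # Min-max normalize each class map so one threshold applies to all classes.
    mins = up.min(axis=(1, 2), keepdims=True)
    maxs = up.max(axis=(1, 2), keepdims=True)
    norm = (up - mins) / np.maximum(maxs - mins, 1e-8)
    best = norm.argmax(axis=0)
    confident = norm.max(axis=0) >= threshold
    return np.where(confident, best + 1, 0)
```

In practice one would likely replace nearest-neighbor upsampling with bilinear interpolation and refine boundaries, which is presumably where the paper's affinity step comes in.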
Submitted 10 March, 2025;
originally announced March 2025.
-
Symmetry-Broken Kondo Screening and Zero-Energy Mode in the Kagome Superconductor CsV3Sb5
Authors:
Yubing Tu,
Zongyuan Zhang,
Wenjian Lu,
Tao Han,
Run Lv,
Zhuying Wang,
Zekun Zhou,
Xinyuan Hou,
Ning Hao,
Zhenyu Wang,
Xianhui Chen,
Lei Shan
Abstract:
The quantum states of matter reorganize themselves in response to defects, giving rise to emergent local excitations that imprint unique characteristics of the host states. While magnetic impurities are known to generate Kondo screening in a Fermi liquid and Yu-Shiba-Rusinov (YSR) states in a conventional superconductor, it remains unclear whether they can evoke distinct phenomena in the kagome superconductor AV3Sb5 (where A is K, Rb or Cs), which may host an orbital-antiferromagnetic charge density wave (CDW) state and an unconventional superconducting state driven by the convergence of topology, geometric frustration and electron correlations. In this work, we visualize the local density of states induced near various types of impurities in both the CDW and superconducting phases of CsV3-xMxSb5 (M = Ta, Cr) using scanning tunneling microscopy. We observe Kondo resonance states near magnetic Cr dopants. Notably, unlike in any known metal or CDW compound, the spatial pattern of Kondo screening breaks all in-plane mirror symmetries of the kagome lattice, suggesting an electronic chirality due to putative orbital loop currents. While Cooper pairs show relative insensitivity to nonmagnetic impurities, native V vacancies with weak magnetic moments induce a pronounced zero-bias conductance peak (ZBCP). This ZBCP coexists with trivial YSR states within the superconducting gap and does not split in energy with increasing tunneling transmission, tending instead to saturate. This behavior is reminiscent of the signature of Majorana zero modes, which could be trapped by a sign-change boundary in the superconducting order parameter near a V vacancy, consistent with a surface topological superconducting state. Our findings provide a new approach to exploring novel quantum states on kagome lattices.
Submitted 28 February, 2025;
originally announced February 2025.
-
Unified Kernel-Segregated Transpose Convolution Operation
Authors:
Vijay Srinivas Tida,
Md Imran Hossen,
Liqun Shan,
Sai Venkatesh Chilukoti,
Sonya Hsu,
Xiali Hei
Abstract:
The optimization of the transpose convolution layer for deep learning applications is achieved with the kernel segregation mechanism. However, kernel segregation has disadvantages, such as computing extra elements to obtain the output feature map with odd dimensions while launching a thread. To mitigate this problem, we introduce a unified kernel segregation approach that limits the usage of memory and computational resources by employing one unified kernel to execute four sub-kernels. The findings reveal that the suggested approach achieves an average computational speedup of 2.03x (3.89x) when tested on specific datasets with an RTX 2070 GPU (Intel Xeon CPU). The ablation study shows an average computational speedup of 3.5x when evaluating the transpose convolution layers from well-known Generative Adversarial Networks (GANs). The implementation of the proposed method for the transpose convolution layers in the EB-GAN model demonstrates significant memory savings of up to 35 MB.
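Kernel segregation rests on a simple identity: in a stride-2 transposed convolution, output pixel $(2m+p,\, 2n+q)$ with $p, q \in \{0, 1\}$ only receives contributions from kernel taps $k[p+2t,\, q+2r]$. The transposed convolution therefore splits into four ordinary stride-1 convolutions with the sub-kernels `k[p::2, q::2]`, and no inserted zeros are ever multiplied. The NumPy sketch below checks this against a naive scatter implementation, assuming a single channel, stride 2 and no padding. It does not reproduce the paper's contribution, a unified GPU kernel that executes all four sub-kernels in one launch; here they are simply a Python loop, and all function names are hypothetical.

```python
import numpy as np


def full_conv2d(x, k):
    """'Full' stride-1 2-D convolution: out[m, n] = sum_{t,r} k[t, r] * x[m - t, n - r]."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H + kh - 1, W + kw - 1))
    for t in range(kh):
        for r in range(kw):
            out[t:t + H, r:r + W] += k[t, r] * x
    return out


def transpose_conv2d_naive(x, k, stride=2):
    """Reference transposed convolution: scatter each input pixel times the kernel."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros(((H - 1) * stride + kh, (W - 1) * stride + kw))
    for i in range(H):
        for j in range(W):
            out[i * stride:i * stride + kh, j * stride:j * stride + kw] += x[i, j] * k
    return out


def transpose_conv2d_segregated(x, k):
    """Stride-2 transposed convolution computed as four sub-kernel convolutions."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros(((H - 1) * 2 + kh, (W - 1) * 2 + kw))
    for p in (0, 1):
        for q in (0, 1):
            # Output pixels with row parity p and column parity q only see k[p::2, q::2].
            out[p::2, q::2] = full_conv2d(x, k[p::2, q::2])
    return out
```

With an odd kernel size the four parity sub-outputs have different shapes; the strided slice assignment `out[p::2, q::2]` writes each one exactly, so no extra elements are computed and then discarded.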
Submitted 27 February, 2025;
originally announced February 2025.
-
Volume Optimality in Conformal Prediction with Structured Prediction Sets
Authors:
Chao Gao,
Liren Shan,
Vaidehi Srinivas,
Aravindan Vijayaraghavan
Abstract:
Conformal Prediction is a widely studied technique to construct prediction sets of future observations. Most conformal prediction methods focus on achieving the necessary coverage guarantees, but do not provide formal guarantees on the size (volume) of the prediction sets. We first prove an impossibility of volume optimality where any distribution-free method can only find a trivial solution. We then introduce a new notion of volume optimality by restricting the prediction sets to belong to a set family (of finite VC-dimension), specifically a union of $k$-intervals. Our main contribution is an efficient distribution-free algorithm based on dynamic programming (DP) to find a union of $k$-intervals that is guaranteed for any distribution to have near-optimal volume among all unions of $k$-intervals satisfying the desired coverage property. By adopting the framework of distributional conformal prediction (Chernozhukov et al., 2021), the new DP based conformity score can also be applied to achieve approximate conditional coverage and conditional restricted volume optimality, as long as a reasonable estimator of the conditional CDF is available. While the theoretical results already establish volume-optimality guarantees, they are complemented by experiments that demonstrate that our method can significantly outperform existing methods in many settings.
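A dynamic program over sorted points conveys why unions of $k$ intervals are tractable. As a simplifying assumption, the sketch below solves only the finite-sample core: choose at most $k$ closed intervals covering at least $m$ of $n$ points with minimum total length. It is not the authors' algorithm or their DP-based conformity score. It relies on the fact that an optimal interval can be shrunk to span a contiguous run of sorted points, so intervals can be indexed by their first and last covered point. The function name is hypothetical.

```python
import math


def shortest_k_interval_union(points, k, m):
    """Minimum total length of at most k closed intervals covering >= m points.

    Returns (total_length, intervals) with intervals as (left, right) tuples.
    Runs in O(n^2 * k * m) time, which is fine for small illustrative inputs.
    """
    xs = sorted(points)
    n = len(xs)
    if not 0 <= m <= n:
        raise ValueError("m must be between 0 and len(points)")
    inf = (math.inf, ())
    # f[i][c][j]: best (length, intervals) after deciding the first i sorted points,
    # having used c intervals and covered min(j, m) points.
    f = [[[inf] * (m + 1) for _ in range(k + 1)] for _ in range(n + 1)]
    f[0][0][0] = (0.0, ())
    for i in range(n):
        for c in range(k + 1):
            for j in range(m + 1):
                length, ivs = f[i][c][j]
                if length == math.inf:
                    continue
                # Option 1: leave point i uncovered.
                if length < f[i + 1][c][j][0]:
                    f[i + 1][c][j] = (length, ivs)
                # Option 2: open a new interval covering xs[i..r].
                if c < k:
                    for r in range(i, n):
                        jj = min(m, j + (r - i + 1))
                        cand = length + (xs[r] - xs[i])
                        if cand < f[r + 1][c + 1][jj][0]:
                            f[r + 1][c + 1][jj] = (cand, ivs + ((xs[i], xs[r]),))
    return min((f[n][c][m] for c in range(k + 1)), key=lambda t: t[0])
```

For example, with two well-separated clusters and one outlier, two short intervals around the clusters beat any single interval that must stretch across the gap.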
Submitted 23 February, 2025;
originally announced February 2025.
-
Verifying Classification with Limited Disclosure
Authors:
Siddharth Bhandari,
Liren Shan
Abstract:
We consider the multi-party classification problem introduced by Dong, Hartline, and Vijayaraghavan (2022) motivated by electronic discovery. In this problem, our goal is to design a protocol that guarantees the requesting party receives nearly all responsive documents while minimizing the disclosure of nonresponsive documents. We develop verification protocols that certify the correctness of a classifier by disclosing a few nonresponsive documents.
We introduce a combinatorial notion called the Leave-One-Out dimension of a family of classifiers and show that the number of nonresponsive documents disclosed by our protocol is at most this dimension in the realizable setting, where a perfect classifier exists in this family. For linear classifiers with a margin, we characterize the trade-off between the margin and the number of nonresponsive documents that must be disclosed for verification. Specifically, we establish a trichotomy in this requirement: for $d$-dimensional instances, when the margin exceeds $1/3$, verification can be achieved by revealing only $O(1)$ nonresponsive documents; when the margin is exactly $1/3$, in the worst case, at least $Ω(d)$ nonresponsive documents must be disclosed; when the margin is smaller than $1/3$, verification requires $Ω(e^d)$ nonresponsive documents. We believe this result is of independent interest with applications to coding theory and combinatorial geometry. We further extend our protocols to the nonrealizable setting, defining an analogous combinatorial quantity, the robust Leave-One-Out dimension, and to scenarios where the protocol is tolerant to misclassification errors by Alice.
Submitted 22 February, 2025;
originally announced February 2025.
-
Multi-dimensional Test Design
Authors:
Xiaoyun Qiu,
Liren Shan
Abstract:
How should one jointly design tests and the arrangement of agencies to administer these tests (testing procedure)? To answer this question, we analyze a model where a principal must use multiple tests to screen an agent with a multi-dimensional type, knowing that the agent can change his type at a cost. We identify a new tradeoff between setting difficult tests and using a difficult testing procedure. We compare two settings: (1) the agent only misrepresents his type (manipulation) and (2) the agent improves his actual type (investment). Examples include interviews, regulations, and data classification. We show that in the manipulation setting, stringent tests combined with an easy procedure, i.e., offering tests sequentially in a fixed order, is optimal. In contrast, in the investment setting, non-stringent tests with a difficult procedure, i.e., offering tests simultaneously, is optimal; however, under mild conditions offering them sequentially in a random order may be as good. Our results suggest that whether the agent manipulates or invests in his type determines which arrangement of agencies is optimal.
Submitted 17 February, 2025;
originally announced February 2025.
-
Differentially private fine-tuned NF-Net to predict GI cancer type
Authors:
Sai Venkatesh Chilukoti,
Imran Hossen Md,
Liqun Shan,
Vijay Srinivas Tida,
Xiali Hei
Abstract:
Based on global genomic status, cancer tumors are classified as Microsatellite Instable (MSI) or Microsatellite Stable (MSS). Immunotherapy is used to treat MSI tumors, whereas radiation and chemotherapy are used for MSS tumors. It is therefore important to classify a gastrointestinal (GI) cancer tumor as MSI or MSS so that appropriate treatment can be provided. The existing literature shows that deep learning (DL) can directly predict the class of GI cancer tumors from histological images. However, DL models are susceptible to various threats, including membership inference attacks and model extraction attacks. These attacks make DL models impractical in real-world scenarios. To make DL models useful while maintaining privacy, we integrate differential privacy (DP) with DL. In particular, this paper aims to predict GI cancer status while preserving the privacy of sensitive data. We fine-tuned the Normalizer-Free Net (NF-Net) model and obtained an accuracy of 88.98% without DP in predicting GI cancer status. When we fine-tuned NF-Net using DP-AdamW and adaptive DP-AdamW, we obtained accuracies of 74.58% and 76.48%, respectively. Moreover, we investigate a Weighted Random Sampler (WRS) and class weighting (CW) to address data imbalance. We also evaluate and analyze the DP algorithms in different settings.
Submitted 16 February, 2025;
originally announced February 2025.
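The NF-Net abstract relies on DP optimizers such as DP-AdamW. These optimizers privatize each minibatch gradient before the Adam update. Below is a minimal sketch of that step: clip every per-sample gradient to a fixed L2 norm, sum the clipped gradients, add Gaussian noise scaled to the clip norm, and average. It is not the authors' code. The function name `privatize_gradients` and the `clip_norm` and `noise_multiplier` values are illustrative assumptions, and the Adam moment updates are omitted.

```python
import math
import random


def privatize_gradients(per_sample_grads, clip_norm, noise_multiplier, rng):
    """Hypothetical DP gradient step: per-sample L2 clipping, sum, Gaussian noise, average."""
    dim = len(per_sample_grads[0])
    total = [0.0] * dim
    for g in per_sample_grads:
        norm = math.sqrt(sum(x * x for x in g))
        # Scale down only gradients whose norm exceeds clip_norm.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for j in range(dim):
            total[j] += g[j] * scale
    # The noise standard deviation is proportional to the per-sample sensitivity (clip_norm).
    sigma = noise_multiplier * clip_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_sample_grads) for v in noisy]


rng = random.Random(0)
print(privatize_gradients([[3.0, 4.0], [0.3, 0.4]], clip_norm=1.0, noise_multiplier=1.1, rng=rng))
```

With `noise_multiplier=0` the function reduces to clipped averaging. That setting is useful for checking the clipping logic, but it provides no privacy.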
-
Heterogeneous Multi-agent Multi-armed Bandits on Stochastic Block Models
Authors:
Mengfan Xu,
Liren Shan,
Fatemeh Ghaffari,
Xuchuang Wang,
Xutong Liu,
Mohammad Hajiesmaili
Abstract:
We study a novel heterogeneous multi-agent multi-armed bandit problem with a cluster structure induced by stochastic block models, which influences not only the graph topology but also reward heterogeneity. Specifically, agents are distributed on random graphs based on stochastic block models, a generalization of the Erdős–Rényi model with heterogeneous edge probabilities: agents are grouped into clusters (known or unknown), and edge probabilities for agents within the same cluster differ from those across clusters. In addition, the cluster structure of the stochastic block model also determines the heterogeneous rewards. Reward distributions of the same arm vary across agents in different clusters but remain consistent within a cluster, which unifies homogeneous and heterogeneous settings with varying degrees of heterogeneity; rewards are independent samples from these distributions. The objective is to minimize system-wide regret across all agents. To address this, we propose a novel algorithm applicable to both known and unknown cluster settings. The algorithm combines an averaging-based consensus approach with a newly introduced information aggregation and weighting technique, resulting in a UCB-type strategy. It accounts for graph randomness, leverages both intra-cluster (homogeneous) and inter-cluster (heterogeneous) information from rewards and graphs, and incorporates cluster detection for unknown cluster settings. We derive optimal instance-dependent regret upper bounds of order $\log{T}$ under sub-Gaussian rewards. Importantly, our regret bounds capture the degree of heterogeneity in the system (an additional layer of complexity), exhibit smaller constants, scale better for large systems, and impose significantly relaxed assumptions on edge probabilities. In contrast, prior works do not account for this refined problem complexity, rely on more stringent assumptions, and exhibit limited scalability.
Submitted 11 February, 2025;
originally announced February 2025.
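The bandit abstract describes a UCB-type strategy that aggregates information from an agent's cluster. The sketch below shows only the generic building blocks. First, it pools each arm's reward sums and pull counts over an agent and its same-cluster neighbours. Second, it computes a standard UCB1-style index from the pooled statistics. The paper's weighting of inter-cluster information, its consensus averaging, and its cluster detection are not reproduced. The helper names and the `alpha` exploration constant are hypothetical.

```python
import math


def pooled_stats(stats, members):
    """Sum per-arm (reward_sum, count) over a hypothetical list of same-cluster agents."""
    k = len(stats[members[0]][0])
    sums, counts = [0.0] * k, [0] * k
    for agent in members:
        agent_sums, agent_counts = stats[agent]
        for arm in range(k):
            sums[arm] += agent_sums[arm]
            counts[arm] += agent_counts[arm]
    return sums, counts


def ucb_indices(reward_sums, counts, t, alpha=2.0):
    """UCB1-style index: empirical mean + sqrt(alpha * ln t / n); unpulled arms get +inf."""
    indices = []
    for s, n in zip(reward_sums, counts):
        if n == 0:
            indices.append(math.inf)
        else:
            indices.append(s / n + math.sqrt(alpha * math.log(t) / n))
    return indices


stats = {0: ([1.0, 2.0], [2, 2]), 1: ([3.0, 0.0], [2, 0])}
sums, counts = pooled_stats(stats, [0, 1])
idx = ucb_indices(sums, counts, t=10)
chosen_arm = max(range(len(idx)), key=idx.__getitem__)
```

Pooling makes the confidence radius shrink faster than it would with an agent's own pulls alone. Pooling is only safe within a cluster, where the source says reward distributions coincide.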
-
Structure of weakly collisional shock waves of multicomponent plasmas inside hohlraums of indirect inertial confinement fusions
Authors:
Tianyi Liang,
Dong Wu,
Lifeng Wang,
Lianqiang Shan,
Zongqiang Yuan,
Hongbo Cai,
Yuqiu Gu,
Zhengmao Sheng,
Xiantu He
Abstract:
In laser-driven indirect inertial confinement fusion (ICF), a hohlraum, a cavity constructed from high-Z materials, converts laser energy into thermal x-ray energy. This process involves the interaction of low-density ablated plasmas, which can give rise to weakly collisional shock waves characterized by a Knudsen number $K_n$ on the order of 1. The Knudsen number (the ratio of the mean free path to a characteristic length scale) measures the relative importance of collisional interactions. Preliminary experiments and simulations have shown that the kinetic effects associated with weakly collisional shock waves significantly affect the efficiency of the implosion process. A comprehensive understanding of the physics underlying weakly collisional shock waves is therefore essential. This research explores the formation and fundamental structure of weakly collisional shock waves within a hohlraum, as well as ion mixing and ion separation in multicomponent plasmas. Weakly collisional shocks occupy a transition regime between collisional shock waves ($K_n \ll 1$) and collisionless shock waves ($K_n \gg 1$), and therefore exhibit both kinetic effects and hydrodynamic behavior. These shock waves are primarily governed by an electrostatic field, which drives significant electrostatic sheath acceleration and ion reflection acceleration. Because different ion species have different charge-to-mass ratios, they respond differently to the electrostatic field, and their densities, velocities, temperatures, and concentrations separate. The presence of weakly collisional shock waves within the hohlraum is expected to affect the transfer of laser energy and the overall efficiency of the implosion process.
Submitted 17 November, 2024;
originally announced November 2024.
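The hohlraum abstract sorts shocks by Knudsen number, $K_n = \lambda_{\mathrm{mfp}} / L$. The helper below makes that classification concrete. The cut-offs `low=0.1` and `high=10.0` are illustrative assumptions, because the abstract only says $K_n \ll 1$, $K_n \sim 1$, and $K_n \gg 1$ without giving numeric thresholds.

```python
def knudsen_number(mean_free_path, length_scale):
    """K_n = mean free path / characteristic length (both in the same units)."""
    return mean_free_path / length_scale


def shock_regime(kn, low=0.1, high=10.0):
    """Classify a shock by K_n; the low/high cut-offs are assumed, not from the paper."""
    if kn < low:
        return "collisional"
    if kn > high:
        return "collisionless"
    return "weakly collisional"


print(shock_regime(knudsen_number(50e-6, 100e-6)))  # K_n = 0.5
```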
-
First Proof of Principle Experiment for Muon Production with Ultrashort High Intensity Laser
Authors:
Feng Zhang,
Li Deng,
Yanjie Ge,
Jiaxing Wen,
Bo Cui,
Ke Feng,
Hao Wang,
Chen Wu,
Ziwen Pan,
Hongjie Liu,
Zhigang Deng,
Zongxin Zhang,
Liangwen Chen,
Duo Yan,
Lianqiang Shan,
Zongqiang Yuan,
Chao Tian,
Jiayi Qian,
Jiacheng Zhu,
Yi Xu,
Yuhong Yu,
Xueheng Zhang,
Lei Yang,
Weimin Zhou,
Yuqiu Gu
, et al. (4 additional authors not shown)
Abstract:
Muons, which play a crucial role in both fundamental and applied physics, have traditionally been produced with proton accelerators or obtained from cosmic rays. With the advent of ultra-short high-intensity lasers capable of accelerating electrons to GeV energies, it has become possible to generate muons in laser laboratories. In this work, we report the first proof-of-principle experiment for muon production with an ultra-short, high-intensity laser, in which a GeV electron beam bombards a lead converter target. The muon signal is confirmed by measuring its lifetime, which is the first clear demonstration of laser-produced muons. Geant4 simulations were used to investigate the contributions of photo-production, electro-production, and the Bethe-Heitler process to muon generation, as well as the subsequent detection. The results show that the muons come predominantly from photo-production and electro-production, and that a yield of up to 0.01 $\mu$/$e^-$ out of the converter target could be achieved. This laser-driven muon source is compact and delivers ultra-short pulses with high flux. Moreover, it can be implemented relatively easily in a small laser laboratory, significantly lowering the barrier to entry for research in areas such as muonic X-ray elemental analysis and muon spin spectroscopy.
Submitted 31 October, 2024;
originally announced October 2024.
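The laser-muon abstract quotes its yield per electron (up to 0.01 $\mu$/$e^-$). Turning that figure into muons per shot requires the number of electrons in a bunch, $N_e = Q / e$. The snippet does that arithmetic. The 100 pC bunch charge is a hypothetical value chosen for illustration and is not taken from the paper.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # exact SI value of e, in coulombs


def electrons_in_bunch(charge_c):
    """Number of electrons carried by a bunch of the given charge."""
    return charge_c / ELEMENTARY_CHARGE_C


def muon_yield(charge_c, muons_per_electron):
    """Muons per shot = electrons per bunch * muons per electron."""
    return electrons_in_bunch(charge_c) * muons_per_electron


print(f"{muon_yield(100e-12, 0.01):.3g} muons per shot")  # hypothetical 100 pC bunch
```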
-
On the Oscillations in Cournot Games with Best Response Strategies
Authors:
Zhengyang Liu,
Haolin Lu,
Liang Shan,
Zihe Wang
Abstract:
In this paper, we consider dynamic oscillation in the Cournot oligopoly model, in which multiple firms produce homogeneous products. To explore oscillation under best-response updates, we focus on linear price functions. In this setting, we establish the existence of oscillations. In particular, we show that when firms have different costs, best-response dynamics converge to either a unique equilibrium or a two-period oscillation. We further characterize the oscillations and propose linear-time algorithms for finding all types of two-period oscillations. To the best of our knowledge, our work is the first step toward fully analyzing periodic oscillation in the Cournot oligopoly model.
Submitted 6 May, 2025; v1 submitted 12 October, 2024;
originally announced October 2024.
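The Cournot abstract concerns best-response dynamics under a linear price function. The sketch below assumes inverse demand $P = a - bQ$ and constant marginal costs $c_i$. Under those assumptions, firm $i$'s best response is $q_i = \max\{0, (a - c_i - bQ_{-i}) / (2b)\}$. It also assumes that all firms update simultaneously. The abstract does not say which update order the paper uses, so simultaneous updating is an assumption here.

```python
def best_response(a, b, costs, q, i):
    """Firm i's best response to rivals' quantities under inverse demand P = a - b * Q."""
    rivals = sum(q) - q[i]
    return max(0.0, (a - costs[i] - b * rivals) / (2.0 * b))


def simultaneous_best_response(a, b, costs, q0, steps):
    """All firms update at once from the previous profile; returns the whole trajectory."""
    q = list(q0)
    trajectory = [tuple(q)]
    for _ in range(steps):
        q = [best_response(a, b, costs, q, i) for i in range(len(q))]
        trajectory.append(tuple(q))
    return trajectory


three = simultaneous_best_response(10.0, 1.0, [1.0] * 3, [1.0] * 3, steps=4)
two = simultaneous_best_response(10.0, 1.0, [1.0] * 2, [1.0] * 2, steps=60)
```

With three identical firms ($a = 10$, $b = 1$, $c = 1$), the symmetric update is $q' = 4.5 - q$. The profile therefore flips between 1.0 and 3.5 indefinitely instead of settling at the equilibrium $(a - c)/(4b) = 2.25$. With two firms, the update $q' = 4.5 - q/2$ contracts to the equilibrium 3.0.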
-
CFSP: An Efficient Structured Pruning Framework for LLMs with Coarse-to-Fine Activation Information
Authors:
Yuxin Wang,
Minghua Ma,
Zekun Wang,
Jingchang Chen,
Huiming Fan,
Liping Shan,
Qing Yang,
Dongliang Xu,
Ming Liu,
Bing Qin
Abstract:
The enormous parameter counts and computational overhead of Large Language Models (LLMs) hinder their real-world application. Network pruning, which targets unstructured or structured sparsity by removing redundant parameters, has recently been explored for LLM acceleration. Most existing LLM pruning work focuses on unstructured pruning, which typically requires special hardware support for a practical speed-up. In contrast, structured pruning can reduce latency on general devices. However, it remains challenging to perform structured pruning efficiently while maintaining performance, especially at high sparsity ratios. To this end, we introduce CFSP, an efficient structured pruning framework that leverages both coarse-grained (inter-block) and fine-grained (intra-block) activation information as an importance criterion to guide pruning. The pruning is highly efficient, requiring only one forward pass to compute feature activations. Specifically, we first allocate the sparsity budget across blocks based on their importance and then retain the important weights within each block. In addition, we introduce a recovery fine-tuning strategy that adaptively allocates training overhead based on coarse-grained importance to further improve performance. Experimental results demonstrate that CFSP outperforms existing methods on diverse models across various sparsity budgets. Our code will be available at https://github.com/wyxscir/CFSP.
Submitted 9 December, 2024; v1 submitted 20 September, 2024;
originally announced September 2024.
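The CFSP abstract describes a two-level procedure. First, a sparsity budget is distributed across blocks by block importance. Then the most important weights inside each block are kept. The sketch below illustrates that coarse-to-fine shape with a hypothetical allocation rule. Each block's sparsity is taken proportional to the inverse of its importance and rescaled so the mean matches the target. Channels are then ranked by a score such as mean absolute activation. This rule and all function names are assumptions for illustration; the abstract does not give CFSP's actual allocation formula.

```python
def allocate_block_sparsity(block_importance, target_sparsity, max_sparsity=0.9):
    """Coarse step (hypothetical rule): less important blocks get more of the budget.

    s_i is proportional to 1 / importance_i and scaled so the mean equals target_sparsity;
    values are then clipped to max_sparsity, which can lower the mean.
    """
    inverse = [1.0 / imp for imp in block_importance]
    scale = target_sparsity * len(inverse) / sum(inverse)
    return [min(max_sparsity, scale * v) for v in inverse]


def keep_top_channels(channel_scores, sparsity):
    """Fine step: return indices of the highest-scoring channels to keep in one block."""
    n_keep = round(len(channel_scores) * (1.0 - sparsity))
    ranked = sorted(range(len(channel_scores)), key=lambda i: channel_scores[i], reverse=True)
    return sorted(ranked[:n_keep])


per_block = allocate_block_sparsity([1.0, 2.0, 4.0], target_sparsity=0.5)
kept = keep_top_channels([0.1, 0.9, 0.5, 0.3], sparsity=0.5)
```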
-
Probing a light long-lived pseudo-scalar from Higgs decay via displaced taus at the LHC
Authors:
Lianyou Shan,
Lei Wang,
Jin Min Yang,
Rui Zhu
Abstract:
A light (GeV-mass), long-lived ($c\tau$ of dozens of millimeters) CP-odd scalar is readily predicted in new physics models. In this work, we investigate Higgs decay into such a light scalar plus a $Z$ boson, taking the aligned two-Higgs-doublet model (2HDM) as an example. This light long-lived scalar, which decays dominantly to tau leptons, travels some distance from its production point and produces a displaced vertex in the inner detector of a general-purpose experiment such as ATLAS or CMS. We focus on the LHC and perform Monte Carlo simulations of the signal and backgrounds. For several benchmark points of the aligned 2HDM, we find the signal to be detectable once the integrated luminosity reaches 300 fb$^{-1}$. Our study therefore motivates an experimental search for this process at the ongoing LHC.
Submitted 3 March, 2025; v1 submitted 14 August, 2024;
originally announced August 2024.
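Whether the pseudo-scalar in the displaced-tau abstract leaves a visible displaced vertex depends on its lab-frame decay length, $L = \beta\gamma c\tau$ with $\beta\gamma = p/m$. The probability of a decay between flight distances $r_1$ and $r_2$ is then $e^{-r_1/L} - e^{-r_2/L}$. The snippet evaluates both quantities. The momentum, mass, and radial window in the example are hypothetical values, not the paper's benchmark points.

```python
import math


def mean_decay_length(momentum_gev, mass_gev, ctau_mm):
    """Lab-frame mean decay length L = beta*gamma*c*tau, using beta*gamma = p / m."""
    return (momentum_gev / mass_gev) * ctau_mm


def decay_probability(r_min_mm, r_max_mm, decay_length_mm):
    """Probability that an exponentially distributed decay falls between r_min and r_max."""
    return math.exp(-r_min_mm / decay_length_mm) - math.exp(-r_max_mm / decay_length_mm)


L = mean_decay_length(momentum_gev=20.0, mass_gev=10.0, ctau_mm=30.0)  # hypothetical inputs
p_window = decay_probability(1.0, 300.0, L)  # hypothetical 1-300 mm window along the flight path
```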
-
LiD-FL: Towards List-Decodable Federated Learning
Authors:
Hong Liu,
Liren Shan,
Han Bao,
Ronghui You,
Yuhao Yi,
Jiancheng Lv
Abstract:
Federated learning is often used in environments with many unverified participants, so federated learning under adversarial attacks has received significant attention. This paper proposes an algorithmic framework for list-decodable federated learning, in which a central server maintains a list of models, at least one of which is guaranteed to perform well. The framework places no strict restriction on the fraction of honest workers, extending Byzantine federated learning to scenarios in which more than half of the workers are adversarial. Under proper assumptions on the loss function, we prove a convergence theorem for our method. Experimental results on image classification tasks with both convex and non-convex losses demonstrate that the proposed algorithm can withstand a malicious majority under various attacks.
Submitted 26 February, 2025; v1 submitted 9 August, 2024;
originally announced August 2024.
-
Organizing Background to Explore Latent Classes for Incremental Few-shot Semantic Segmentation
Authors:
Lianlei Shan,
Wenzhang Zhou,
Wei Li,
Xingyu Ding
Abstract:
The goal of incremental few-shot semantic segmentation (iFSS) is to extend pre-trained segmentation models to new classes using a few annotated images, without access to old training data. While novel classes are learned incrementally, the data distribution of old classes is destroyed, leading to catastrophic forgetting. Meanwhile, the novel classes have only a few samples, which makes it impossible for models to learn satisfactory representations of them. For the iFSS problem, we propose OINet, the background embedding space Organization and prototype Inherit Network. Specifically, when training base classes, OINet uses multiple classification heads for the background and sets multiple sub-class prototypes to reserve embedding space for latent novel classes. When novel classes are learned incrementally, we propose a strategy that selects the sub-class prototypes best matching the novel classes currently being learned and lets those novel classes inherit the embedding space of the selected prototypes. This operation allows the novel classes to be registered in the embedding space using few samples without affecting the distribution of the base classes. Results on Pascal-VOC and COCO show that OINet achieves a new state of the art.
Submitted 29 May, 2024;
originally announced May 2024.
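In the OINet abstract, each novel class inherits the reserved background sub-class prototype that best matches it. The sketch below assumes that "best match" means highest cosine similarity between a prototype and the mean feature of the novel class's few samples. It also assumes that prototypes are assigned greedily, so that no two novel classes share one. Both are assumptions, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def assign_prototypes(background_prototypes, novel_class_means):
    """Greedily give each novel class the most similar still-unused background sub-prototype.

    Requires at least as many prototypes as novel classes.
    """
    available = set(range(len(background_prototypes)))
    assignment = {}
    for cls, mean in novel_class_means.items():
        best = max(available, key=lambda k: cosine(background_prototypes[k], mean))
        assignment[cls] = best
        available.remove(best)
    return assignment


protos = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(assign_prototypes(protos, {"cat": [0.1, 1.0], "dog": [1.0, 0.05]}))
```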
-
Lifelong Learning and Selective Forgetting via Contrastive Strategy
Authors:
Lianlei Shan,
Wenzhang Zhou,
Wei Li,
Xingyu Ding
Abstract:
Lifelong learning aims to train a model that performs well on new tasks while retaining its capability on previous tasks. However, some practical scenarios require the system to forget undesirable knowledge for privacy reasons, which is called selective forgetting. The joint task is dubbed Learning with Selective Forgetting (LSF). In this paper, we propose a new framework for LSF based on a contrastive strategy. Specifically, for the preserved classes (tasks), we make the features extracted from different samples of the same class compact. For the deleted classes, we make the features from different samples of the same class dispersed and irregular, i.e., the network has no regular response to samples from a specific deleted class, as if it had never been trained at all. By maintaining or disturbing the feature distribution, the forgetting and remembering of different classes can be made independent of each other. Experiments are conducted on four benchmark datasets, and our method achieves a new state of the art.
Submitted 28 May, 2024;
originally announced May 2024.
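The LSF abstract's goal is to keep preserved-class features compact and make deleted-class features dispersed. One simple way to quantify that goal is the mean pairwise cosine similarity within each class. A training objective would then lower it for deleted classes and raise it for preserved ones. The sketch below computes such a score. It is a toy proxy for illustration only, not the paper's contrastive loss, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean_pairwise_cosine(features):
    """Average cosine similarity over all distinct pairs of features in one class."""
    sims = [cosine(features[i], features[j])
            for i in range(len(features)) for j in range(i + 1, len(features))]
    return sum(sims) / len(sims)


def lsf_proxy_objective(preserved_by_class, deleted_by_class):
    """Lower is better: deleted classes should be dispersed, preserved classes compact."""
    compact = sum(mean_pairwise_cosine(f) for f in preserved_by_class.values()) / len(preserved_by_class)
    dispersed = sum(mean_pairwise_cosine(f) for f in deleted_by_class.values()) / len(deleted_by_class)
    return dispersed - compact


score = lsf_proxy_objective({"kept": [[1.0, 0.0], [1.0, 0.0]]}, {"forgotten": [[1.0, 0.0], [-1.0, 0.0]]})
```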
-
Edge-guided and Class-balanced Active Learning for Semantic Segmentation of Aerial Images
Authors:
Lianlei Shan,
Weiqiang Wang,
Ke Lv,
Bin Luo
Abstract:
Semantic segmentation requires pixel-level annotation, which is time-consuming. Active learning (AL) is a promising way to reduce annotation costs. Because of the gap between aerial and natural images, previous AL methods are not ideal for aerial images, mainly because of unsuitable labeling units and the neglect of class imbalance. Previous labeling units are based on images or regions, which does not account for the characteristics of segmentation tasks and aerial images: the segmentation network often makes mistakes in edge regions, and edges in aerial images are often interlaced and irregular. Therefore, we propose an edge-guided labeling unit as an additional unit. On the other hand, class imbalance is severe and manifests in two ways: aerial images themselves are highly imbalanced, and AL strategies do not fully consider class balance. Both seriously degrade the performance of AL on aerial images. We ensure class balance at every step where imbalance may arise, including initial labeled data, subsequent labeled data, and pseudo-labels. With these two improvements, our method achieves gains of more than 11.2% over state-of-the-art methods on three benchmark datasets (Deepglobe, Potsdam, and Vaihingen) and more than 18.6% over the baseline. Extensive ablation studies show that every module is indispensable. Furthermore, we establish a fair and strong benchmark for future research on AL for aerial image segmentation.
Submitted 28 May, 2024;
originally announced May 2024.
-
The Binary Quantized Neural Network for Dense Prediction via Specially Designed Upsampling and Attention
Authors:
Xingyu Ding,
Lianlei Shan,
Guiqin Zhao,
Meiqi Wu,
Wenzhang Zhou,
Wei Li
Abstract:
Deep learning-based information processing is time-consuming and requires huge computing resources, especially for dense prediction tasks that require an output for each pixel, such as semantic segmentation and salient object detection. Quantizing networks for dense prediction poses two main challenges. First, directly applying the upsampling operation that dense prediction tasks require is extremely crude and causes an unacceptable drop in accuracy. Second, the complex structure of dense prediction networks makes it difficult to maintain both high speed and high accuracy under quantization. In this paper, we propose an effective upsampling method and an efficient attention computation strategy to transfer the success of binary neural networks (BNNs) from single-prediction tasks to dense prediction tasks. First, we design a simple and robust multi-branch parallel upsampling structure to achieve high accuracy. Then we optimize the attention method, which plays an important role in segmentation but has high computational complexity. Our attention method reduces computational complexity by a factor of one hundred while retaining the original effect. Experiments on Cityscapes, KITTI road, and ECSSD demonstrate the effectiveness of our work.
Submitted 27 May, 2024;
originally announced May 2024.
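The binary-network abstract builds on the standard BNN trick of replacing multiply-accumulate with bitwise operations. For two $\pm 1$ vectors of length $n$, the dot product equals $n - 2\,\mathrm{popcount}(a \oplus b)$. The snippet below illustrates this general BNN arithmetic using Python integers as bit vectors. It does not implement the paper's upsampling or attention modules, and it assumes that a value of 0 is binarized to +1.

```python
def pack_signs(values):
    """Encode a real vector as a bitmask of signs: bit i is set when values[i] >= 0 (+1)."""
    bits = 0
    for i, v in enumerate(values):
        if v >= 0:
            bits |= 1 << i
    return bits


def binary_dot(a_bits, b_bits, n):
    """Dot product of two +/-1 vectors: (matches - mismatches) = n - 2 * popcount(a XOR b)."""
    mismatches = bin(a_bits ^ b_bits).count("1")
    return n - 2 * mismatches


a = [0.7, -1.2, 0.3, 2.0]
b = [1.5, 0.4, -0.9, 0.1]
print(binary_dot(pack_signs(a), pack_signs(b), len(a)))
```

Here `bin(...).count("1")` is used instead of `int.bit_count()` so that the snippet also runs on Python versions older than 3.10.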
-
Double Backdoored: Converting Code Large Language Model Backdoors to Traditional Malware via Adversarial Instruction Tuning Attacks
Authors:
Md Imran Hossen,
Sai Venkatesh Chilukoti,
Liqun Shan,
Sheng Chen,
Yinzhi Cao,
Xiali Hei
Abstract:
Instruction-tuned Large Language Models designed for coding tasks are increasingly employed as AI coding assistants. However, the cybersecurity vulnerabilities and implications arising from the widespread integration of these models are not yet fully understood because of limited research in this domain. This work investigates novel techniques for transferring backdoors from the AI/ML domain to traditional computer malware, shedding light on the critical intersection of AI and cyber/software security. To explore this intersection, we present MalInstructCoder, a framework designed to comprehensively assess the cybersecurity vulnerabilities of instruction-tuned Code LLMs. MalInstructCoder introduces an automated data poisoning pipeline that injects malicious code snippets into benign code, poisoning instruction fine-tuning data while maintaining functional validity. It presents two practical adversarial instruction tuning attacks with real-world security implications: the clean prompt poisoning attack and the backdoor attack. These attacks manipulate Code LLMs into generating code with malicious or harmful functionality under specific attack scenarios while preserving the intended functionality. We conduct a comprehensive investigation of the exploitability of the code-specific instruction tuning process for three state-of-the-art Code LLMs: CodeLlama, DeepSeek-Coder, and StarCoder2. Our findings reveal that these models are highly vulnerable to our attacks. Specifically, the clean prompt poisoning attack achieves an ASR@1 ranging from over 75% to 86% by poisoning only 1% (162 samples) of the instruction fine-tuning dataset. Similarly, the backdoor attack achieves an ASR@1 ranging from 76% to 86% with a 0.5% poisoning rate. Our study sheds light on the critical cybersecurity risks posed by instruction-tuned Code LLMs and highlights the urgent need for robust defense mechanisms.
Submitted 6 March, 2025; v1 submitted 29 April, 2024;
originally announced April 2024.
-
LaERC-S: Improving LLM-based Emotion Recognition in Conversation with Speaker Characteristics
Authors:
Yumeng Fu,
Junjie Wu,
Zhongjie Wang,
Meishan Zhang,
Lili Shan,
Yulin Wu,
Bingquan Li
Abstract:
Emotion recognition in conversation (ERC), the task of identifying the emotion of each utterance within a conversation, has garnered significant attention in human-computer interaction systems. Previous ERC studies focus on speaker-specific information derived predominantly from relationships among utterances, which lacks sufficient information about the broader conversational context. Recent ERC research has sought to exploit pre-trained large language models (LLMs) with speaker modeling to understand emotional states. Although these methods have achieved encouraging results, the extracted speaker-specific information struggles to capture emotional dynamics. In this paper, motivated by the facts that speaker characteristics play a crucial role and that LLMs have rich world knowledge, we present LaERC-S, a novel framework that stimulates LLMs to explore speaker characteristics, including the mental state and behavior of interlocutors, for accurate emotion prediction. To endow LLMs with this knowledge, we adopt two-stage learning so that the models reason about speaker characteristics and track the speaker's emotion in complex conversation scenarios. Extensive experiments on three benchmark datasets demonstrate the superiority of LaERC-S, which achieves new state-of-the-art results.
Submitted 3 March, 2025; v1 submitted 11 March, 2024;
originally announced March 2024.
-
On Truthful Item-Acquiring Mechanisms for Reward Maximization
Authors:
Liang Shan,
Shuo Zhang,
Jie Zhang,
Zihe Wang
Abstract:
In this research, we study a problem in which a collector acquires items from an owner based on the item qualities the owner declares and an independent appraiser's assessments. The owner wants to maximize the probability that the collector acquires the items and is the only one who knows the items' true quality. The appraiser performs her duties impartially, but her assessment may be subject to random noise, so it may not accurately reflect the true quality of the items. The main challenge lies in devising mechanisms that prompt the owner to reveal accurate information, thereby optimizing the collector's expected reward. We consider the menu size of mechanisms as a measure of their practicability and study its impact on the attainable expected reward. For the single-item setting, we design optimal mechanisms with a monotonically increasing menu size. Although the reward gap between the simplest and optimal mechanisms is bounded, we show that simple mechanisms with a small menu size cannot guarantee any positive fraction of the optimal reward of mechanisms with a larger menu size. For the multi-item setting, we show that an ordinal mechanism that takes only the owner's ordering of the items as input is not incentive-compatible. We then propose a set of Union mechanisms that combine single-item mechanisms. Moreover, we run experiments to examine these mechanisms' robustness with respect to the independent appraiser's assessment accuracy and the items' acquiring rate.
Submitted 22 February, 2024;
originally announced February 2024.
-
Connection-Aware P2P Trading: Simultaneous Trading and Peer Selection
Authors:
Cheng Feng,
Kedi Zheng,
Lanqing Shan,
Hani Alers,
Qixin Chen,
Lampros Stergioulas,
Hongye Guo
Abstract:
Peer-to-peer (P2P) trading is seen as a viable solution for handling the growing number of distributed energy resources in distribution networks. However, trading among a large number of consumers raises several challenges. One is limited communication capability. Another is that prosumers may have specific trading preferences. Both can cause serious asynchrony in P2P trading, potentially reducing the effectiveness of negotiations and preventing convergence before the market closes. This paper introduces a connection-aware P2P trading algorithm designed for large-scale prosumer trading. The algorithm supports asynchronous trading while respecting prosumers' autonomy in selecting trading peers, an often overlooked aspect of traditional models. In addition, to make the best use of limited connection opportunities, a smart trading-peer selection strategy is developed to guide consumers to communicate strategically and accelerate convergence. A theoretical convergence guarantee is provided for the connection-aware P2P trading algorithm, which further details how smart selection strategies improve convergence efficiency. Numerical studies validate the effectiveness of the connection-aware algorithm and the ability of smart selection strategies to reduce the overall convergence time.
Submitted 28 October, 2024; v1 submitted 18 February, 2024;
originally announced February 2024.
-
Error-Tolerant E-Discovery Protocols
Authors:
Jinshuo Dong,
Jason D. Hartline,
Liren Shan,
Aravindan Vijayaraghavan
Abstract:
We consider the multi-party classification problem introduced by Dong, Hartline, and Vijayaraghavan (2022) in the context of electronic discovery (e-discovery). Based on a request for production from the requesting party, the responding party is required to provide documents that are responsive to the request, except for those that are legally privileged. Our goal is to find a protocol that verifies that the responding party sends almost all responsive documents while minimizing the disclosure of non-responsive documents. We provide protocols in the challenging non-realizable setting, where the instance may not be perfectly separated by a linear classifier. We demonstrate empirically that our protocol finds almost all relevant documents while incurring only a small disclosure of non-responsive documents. We complement this with a theoretical analysis of our protocol in the single-dimensional setting and with further experiments on simulated data, which suggest that the non-responsive disclosure incurred by our protocol may be unavoidable.
Submitted 31 January, 2024;
originally announced January 2024.
-
Facebook Report on Privacy of fNIRS data
Authors:
Md Imran Hossen,
Sai Venkatesh Chilukoti,
Liqun Shan,
Vijay Srinivas Tida,
Xiali Hei
Abstract:
The primary goal of this project is to develop privacy-preserving machine learning model training techniques for fNIRS data. This project will build a local model in a centralized setting with both differential privacy (DP) and certified robustness. It will also explore collaborative federated learning to train a shared model across multiple clients without sharing their local fNIRS datasets. To prevent unintentional leakage of private information from these clients' datasets, we will also implement DP in the federated learning setting.
Submitted 1 January, 2024;
originally announced January 2024.
-
DP-SGD-Global-Adapt-V2-S: Triad Improvements of Privacy, Accuracy and Fairness via Step Decay Noise Multiplier and Step Decay Upper Clipping Threshold
Authors:
Sai Venkatesh Chilukoti,
Md Imran Hossen,
Liqun Shan,
Vijay Srinivas Tida,
Mahathir Mohammad Bappy,
Wenmeng Tian,
Xiai Hei
Abstract:
Differentially Private Stochastic Gradient Descent (DP-SGD) has become a widely used technique for safeguarding sensitive information in deep learning applications. Unfortunately, DP-SGD's per-sample gradient clipping and uniform noise addition during training can significantly degrade model utility and fairness. We observe that the average gradient norm of the latest method, DP-SGD-Global-Adapt, remains the same throughout training. Even when it is combined with the existing linear decay noise multiplier, it offers little or no advantage. Moreover, we notice that its upper clipping threshold increases exponentially towards the end of training, potentially impairing the model's convergence. Other algorithms, DP-PSAC, Auto-S, DP-SGD-Global, and DP-F, have utility and fairness similar to or worse than DP-SGD, as demonstrated in experiments. To overcome these problems and improve utility and fairness, we developed DP-SGD-Global-Adapt-V2-S. It uses a step-decay noise multiplier and an upper clipping threshold that is also decayed step-wise. With a privacy budget ($ε$) of 1, DP-SGD-Global-Adapt-V2-S improves accuracy by 0.9795%, 0.6786%, and 4.0130% on MNIST, CIFAR10, and CIFAR100, respectively. It also reduces the privacy cost gap ($π$) by 89.8332% and 60.5541% on the unbalanced MNIST and Thinwall datasets, respectively. Finally, we develop mathematical expressions to compute the privacy budget using truncated concentrated differential privacy (tCDP) for DP-SGD-Global-Adapt-V2-T and DP-SGD-Global-Adapt-V2-S.
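The two ingredients named above, step-wise decay of the noise multiplier and of the clipping threshold, can be illustrated with a minimal sketch. It assumes a generic step-decay schedule (the decay factor, the decay interval, and the initial values are hypothetical, not the paper's settings) plugged into a standard DP-SGD aggregation step with per-sample clipping and Gaussian noise. It does not reproduce DP-SGD-Global-Adapt-V2-S's adaptive clipping rule or its tCDP accounting.

```python
import math
import random
from typing import List, Sequence


def step_decay(initial: float, epoch: int, decay: float = 0.5, every: int = 10) -> float:
    """Multiply `initial` by `decay` once every `every` epochs (hypothetical defaults)."""
    return initial * decay ** (epoch // every)


def clip(grad: Sequence[float], threshold: float) -> List[float]:
    """Rescale a per-sample gradient so that its L2 norm is at most `threshold`."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, threshold / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def noisy_mean_gradient(per_sample_grads: Sequence[Sequence[float]], threshold: float,
                        noise_multiplier: float, rng: random.Random) -> List[float]:
    """Clip each per-sample gradient, sum them, add Gaussian noise with standard
    deviation noise_multiplier * threshold per coordinate, then divide by batch size."""
    total = [0.0] * len(per_sample_grads[0])
    for grad in per_sample_grads:
        for i, g in enumerate(clip(grad, threshold)):
            total[i] += g
    std = noise_multiplier * threshold
    return [(t + rng.gauss(0.0, std)) / len(per_sample_grads) for t in total]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]
    for epoch in (0, 10, 20):
        sigma = step_decay(1.0, epoch)  # hypothetical initial noise multiplier 1.0
        c_max = step_decay(2.0, epoch)  # hypothetical initial clipping threshold 2.0
        print(epoch, sigma, c_max, noisy_mean_gradient(grads, c_max, sigma, rng))
```

With these hypothetical defaults, `step_decay(1.0, 25)` returns 0.25, so both the noise level and the clipping bound shrink in discrete steps rather than continuously.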
Submitted 5 February, 2025; v1 submitted 4 December, 2023;
originally announced December 2023.
-
Continual Learning for Image Segmentation with Dynamic Query
Authors:
Weijia Wu,
Yuzhong Zhao,
Zhuang Li,
Lianlei Shan,
Hong Zhou,
Mike Zheng Shou
Abstract:
Image segmentation based on continual learning suffers a critical drop in performance, mainly due to catastrophic forgetting and background shift, as models must incorporate new classes continually. In this paper, we propose a simple yet effective Continual Image Segmentation method with incremental Dynamic Query (CISDQ), which decouples the representation learning of old and new knowledge using lightweight query embeddings. CISDQ makes three main contributions: 1) We define dynamic queries with an adaptive background class to exploit past knowledge and learn future classes naturally. 2) CISDQ proposes a class/instance-aware Query Guided Knowledge Distillation strategy to overcome catastrophic forgetting by capturing inter-class diversity and intra-class identity. 3) Beyond semantic segmentation, CISDQ introduces continual learning for instance segmentation, in which instance-wise labeling and supervision are considered. Extensive experiments on three datasets for two tasks (i.e., continual semantic and instance segmentation) demonstrate that CISDQ achieves state-of-the-art performance, specifically obtaining 4.4% and 2.9% mIoU improvements in the ADE 100-10 (6 steps) and ADE 100-5 (11 steps) settings.
Submitted 29 November, 2023;
originally announced November 2023.
-
Large-scale Kinetic Simulations of Colliding Plasmas within a Hohlraum of Indirect Drive Inertial Confinement Fusions
Authors:
Tianyi Liang,
Dong Wu,
Xiaochuan Ning,
Lianqiang Shan,
Zongqiang Yuan,
Hongbo Cai,
Zhengmao Sheng,
Xiantu He
Abstract:
The National Ignition Facility has recently achieved burning plasma and ignition using the inertial confinement fusion (ICF) approach. However, many fundamental physical phenomena are still not well understood, including the kinetic processes in the hohlraum. Shan et al. [Phys. Rev. Lett. 120, 195001, 2018] used the energy spectra of neutrons to investigate the kinetic colliding plasma in a hohlraum of indirect-drive ICF. However, because of the large spatial and temporal scales involved, this experiment could not be simulated well with the codes available at that time. Using our advanced high-order implicit PIC code, LAPINS, we successfully reproduced the experiment at large spatial and temporal scales, increasing the original computational scale by approximately 7 to 8 orders of magnitude. When gold plasmas expand into deuterium plasmas, a kinetic shock is generated and propagates within the deuterium plasmas. The simulations allow us to observe the entire progression of a strong shock wave, including its initial formation and steady propagation. Although both electrons and gold ions are collisional (on a small scale compared to the shock wave), deuterium ions appear to be collisionless. This is because a quasi-monoenergetic spectrum of deuterium ions can be generated by ions reflected from the shock front, which then leads to the production of neutrons with unusual broadening due to beam-target nuclear reactions. This work presents an unprecedented kinetic analysis of an existing experiment, shedding light on the mechanisms behind shock wave formation. It also serves as a reference for benchmark simulations of upcoming simulation codes and may be relevant for future research on mixtures and entropy increments at plasma interfaces.
Submitted 20 September, 2023;
originally announced September 2023.
-
A General Approach to Proving Properties of Fibonacci Representations via Automata Theory
Authors:
Jeffrey Shallit,
Sonja Linghui Shan
Abstract:
We provide a method, based on automata theory, to mechanically prove the correctness of many numeration systems based on Fibonacci numbers. With it, long case-based and induction-based proofs of correctness can be replaced by simply constructing a regular expression (or finite automaton) specifying the rules for valid representations, followed by a short computation. Examples of the systems that can be handled using our technique include Brown's lazy representation (1965), the far-difference representation developed by Alpert (2009), and three representations proposed by Hajnal (2023). We also provide three additional systems and prove their validity.
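To illustrate what "a regular expression specifying the rules for valid representations" looks like, the sketch below uses the classical Zeckendorf representation (Fibonacci weights 1, 2, 3, 5, 8, ..., most significant digit first), whose valid strings for n ≥ 1 are exactly those matching `1(0|01)*`: a leading 1 and no two adjacent 1s. This is only a finite brute-force check over a range of n, not the automata-theoretic proof the abstract describes, which establishes correctness for all n; the function names are illustrative.

```python
import re

# Zeckendorf validity rule: a leading 1, then blocks "0" or "01" (so no "11").
VALID = re.compile(r"1(0|01)*")


def zeckendorf(n: int) -> str:
    """Greedy Zeckendorf representation of n >= 1, most significant digit first."""
    if n < 1:
        raise ValueError("n must be positive")
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    digits = []
    # Walk the weights <= n from largest to smallest, taking each one that fits.
    for f in reversed([f for f in fibs if f <= n]):
        if f <= n:
            digits.append("1")
            n -= f
        else:
            digits.append("0")
    return "".join(digits)


def evaluate(rep: str) -> int:
    """Value of a digit string under weights 1, 2, 3, 5, ... (least significant digit last)."""
    fibs = [1, 2]
    while len(fibs) < len(rep):
        fibs.append(fibs[-1] + fibs[-2])
    return sum(f for f, d in zip(fibs, reversed(rep)) if d == "1")


def check_up_to(limit: int) -> bool:
    """Check that every n in [1, limit] gets a valid string that evaluates back to n."""
    return all(
        VALID.fullmatch(rep) is not None and evaluate(rep) == n
        for n, rep in ((n, zeckendorf(n)) for n in range(1, limit + 1))
    )
```

For example, `zeckendorf(12)` yields `"10101"` (8 + 3 + 1). The automata approach replaces the finite loop in `check_up_to` with a decision procedure over all n.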
Submitted 6 September, 2023;
originally announced September 2023.
-
Dark Matter Interactions From An Extra U(1) gauge symmetry with kinetic mixing and Higgs charge
Authors:
Lianyou Shan,
Zhao-Huan Yu
Abstract:
We investigate fermionic dark matter interactions with standard model particles arising from an additional $\mathrm{U}(1)_\mathrm{X}$ gauge symmetry, assuming kinetic mixing between the $\mathrm{U}(1)_\mathrm{X}$ and $\mathrm{U}(1)_\mathrm{Y}$ gauge fields as well as a nonzero $\mathrm{U}(1)_\mathrm{X}$ charge of the Higgs doublet. To ensure gauge-invariant Yukawa interactions and the cancellation of gauge anomalies, the standard model fermions are assigned $Y$-sequential $\mathrm{U}(1)_\mathrm{X}$ charges proportional to the Higgs charge. Although the Higgs charge must be small due to collider constraints, it can decrease the effective cross section of dark matter scattering off nucleons by two orders of magnitude, making it easier to evade direct detection bounds. Through numerical scans of the parameter space, we find that introducing the Higgs charge can also enhance the dark matter relic density by at least two orders of magnitude. When the observed relic density and the direct detection constraints are in tension, as in the case where the resonance effect is important for dark matter freeze-out, the Higgs charge can widen the viable parameter windows to some extent by relieving this tension.
Submitted 4 September, 2023; v1 submitted 24 August, 2023;
originally announced August 2023.
-
Higher-Order Cheeger Inequality for Partitioning with Buffers
Authors:
Konstantin Makarychev,
Yury Makarychev,
Liren Shan,
Aravindan Vijayaraghavan
Abstract:
We prove a new generalization of the higher-order Cheeger inequality for partitioning with buffers. Consider a graph $G=(V,E)$. The buffered expansion of a set $S \subseteq V$ with a buffer $B \subseteq V \setminus S$ is the edge expansion of $S$ after removing all the edges from $S$ to its buffer $B$. An $\varepsilon$-buffered $k$-partitioning is a partitioning of a graph into disjoint components $P_i$ and buffers $B_i$, in which the size of buffer $B_i$ for $P_i$ is small relative to the size of $P_i$: $|B_i| \le \varepsilon |P_i|$. The buffered expansion of a buffered partition is the maximum of the buffered expansions of the $k$ sets $P_i$ with buffers $B_i$. Let $h^{k,\varepsilon}_G$ be the buffered expansion of the optimal $\varepsilon$-buffered $k$-partitioning; then for every $\delta>0$, $$h_G^{k,\varepsilon} \le O_\delta(1) \cdot \Big( \frac{\log k}{ \varepsilon}\Big) \cdot \lambda_{\lfloor (1+\delta) k\rfloor},$$ where $\lambda_{\lfloor (1+\delta)k\rfloor}$ is the $\lfloor (1+\delta)k\rfloor$-th smallest eigenvalue of the normalized Laplacian of $G$.
Our inequality is constructive and avoids the "square-root loss" that is present in the standard Cheeger inequalities (even for $k=2$). We also provide a complementary lower bound and a novel generalization to the setting with arbitrary vertex weights and edge costs. Moreover, our result implies and generalizes the standard higher-order Cheeger inequalities and another recent Cheeger-type inequality by Kwok, Lau, and Lee (2017) involving robust vertex expansion.
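The buffered-expansion definition above can be made concrete with a small sketch. It assumes the edge expansion of $S$ is the number of edges leaving $S$ divided by $\mathrm{vol}(S)$, the sum of degrees, which matches the normalized-Laplacian setting; other normalizations (such as dividing by $|S|$) also appear in the literature, so treat this choice as an assumption. Edges from $S$ into its buffer $B$ are simply not counted, and the partition-level quantity is the maximum over the $k$ pairs $(P_i, B_i)$ after checking $|B_i| \le \varepsilon |P_i|$. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Set, Tuple

Edge = Tuple[int, int]


def degrees(edges: Iterable[Edge]) -> Dict[int, int]:
    deg: Dict[int, int] = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def buffered_expansion(edges: List[Edge], S: Set[int], B: Set[int]) -> float:
    """Edges leaving S whose far endpoint is outside the buffer B, divided by vol(S).

    Assumes S has at least one incident edge (otherwise vol(S) = 0).
    """
    assert not (S & B), "buffer must be disjoint from S"
    deg = degrees(edges)
    vol = sum(deg[v] for v in S)
    crossing = sum(
        1 for u, v in edges
        if (u in S) != (v in S)              # the edge crosses the boundary of S
        and (v if u in S else u) not in B    # and does not end in the buffer
    )
    return crossing / vol


def partition_buffered_expansion(edges: List[Edge], parts: Sequence[Set[int]],
                                 buffers: Sequence[Set[int]], eps: float) -> float:
    """Max buffered expansion over the pairs (P_i, B_i), enforcing |B_i| <= eps * |P_i|."""
    for P, B in zip(parts, buffers):
        if len(B) > eps * len(P):
            raise ValueError("buffer too large for an eps-buffered partitioning")
    return max(buffered_expansion(edges, P, B) for P, B in zip(parts, buffers))
```

On the path 0-1-2-3-4-5, the set {0, 1, 2} has one crossing edge and volume 5, so its expansion is 0.2; putting vertex 3 in its buffer drops the expansion to 0, which is the effect the inequality exploits.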
Submitted 20 August, 2023;
originally announced August 2023.
-
Approximation Algorithms for Norm Multiway Cut
Authors:
Charlie Carlson,
Jafar Jafarov,
Konstantin Makarychev,
Yury Makarychev,
Liren Shan
Abstract:
We consider variants of the classic Multiway Cut problem. Multiway Cut asks to partition a graph $G$ into $k$ parts so as to separate $k$ given terminals. Recently, Chandrasekaran and Wang (ESA 2021) introduced $\ell_p$-norm Multiway Cut, a generalization of the problem in which the goal is to minimize the $\ell_p$ norm of the edge boundaries of the $k$ parts. We provide an $O(\log^{1/2} n\log^{1/2+1/p} k)$ approximation algorithm for this problem, improving upon the approximation guarantee of $O(\log^{3/2} n \log^{1/2} k)$ due to Chandrasekaran and Wang.
We also introduce and study Norm Multiway Cut, a further generalization of Multiway Cut. We assume that we are given access to an oracle that answers certain queries about the norm. We present an $O(\log^{1/2} n \log^{7/2} k)$ approximation algorithm with a weaker oracle and an $O(\log^{1/2} n \log^{5/2} k)$ approximation algorithm with a stronger oracle. Additionally, we show that without any oracle access, there is no $n^{1/4-\varepsilon}$-approximation algorithm for any $\varepsilon > 0$, assuming the Hypergraph Dense-vs-Random Conjecture.
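The $\ell_p$-norm objective can be made concrete by evaluating it for a given partition. The sketch below is an assumption-level illustration, not the paper's approximation algorithm: the edge boundary of part $i$ is the total weight of edges with exactly one endpoint in that part, and the cost is $\big(\sum_i w(\delta(P_i))^p\big)^{1/p}$, with $p = \infty$ giving the maximum boundary. For $p = 1$ every cut edge is counted twice, so the cost is twice the classic Multiway Cut value. Names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

WeightedEdge = Tuple[int, int, float]


def boundary_weights(edges: Sequence[WeightedEdge], labels: Sequence[int], k: int) -> List[float]:
    """Total weight of edges leaving each of the k parts (labels[v] is v's part)."""
    w = [0.0] * k
    for u, v, weight in edges:
        if labels[u] != labels[v]:
            w[labels[u]] += weight
            w[labels[v]] += weight
    return w


def lp_multiway_cost(edges: Sequence[WeightedEdge], labels: Sequence[int],
                     terminals: Sequence[int], p: float) -> float:
    """l_p norm of the parts' edge boundaries; terminal i must lie in part i."""
    for i, t in enumerate(terminals):
        if labels[t] != i:
            raise ValueError(f"terminal {t} is not in part {i}")
    w = boundary_weights(edges, labels, len(terminals))
    if math.isinf(p):
        return max(w)
    return sum(x ** p for x in w) ** (1.0 / p)
```

For a triangle of terminals 0, 1, 2 plus a vertex 3 joined to all of them (unit weights) and placed in part 0, the boundaries are [4, 3, 3], giving cost 10 for $p=1$, $\sqrt{34}$ for $p=2$, and 4 for $p=\infty$.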
Submitted 16 August, 2023;
originally announced August 2023.
-
End-to-end Remote Sensing Change Detection of Unregistered Bi-temporal Images for Natural Disasters
Authors:
Guiqin Zhao,
Lianlei Shan,
Weiqiang Wang
Abstract:
Change detection based on remote sensing images has been a prominent area of interest in remote sensing. Deep networks have achieved significant success in detecting changes in bi-temporal remote sensing images and have found applications in various fields. Given the degradation of natural environments and the frequent occurrence of natural disasters, accurately and swiftly identifying damaged buildings in disaster-stricken areas from remote sensing images is of great importance. This paper investigates change detection specifically for natural disasters. Existing public change detection datasets are registered, which does not match the practical scenario in which bi-temporal images are not aligned; this paper therefore introduces an unregistered end-to-end change detection synthetic dataset called xBD-E2ECD. Furthermore, we propose an end-to-end change detection network named E2ECDNet, which takes an unregistered bi-temporal image pair as input and simultaneously generates the flow field prediction and the change detection prediction. Notably, E2ECDNet also supports change detection for registered image pairs, as registration can be seen as a special case of non-registration. Additionally, this paper redefines the criteria for correctly predicting a positive case and introduces neighborhood-based change detection evaluation metrics. Experimental results demonstrate significant improvements.
Submitted 16 August, 2023; v1 submitted 27 July, 2023;
originally announced July 2023.
-
SmartTrim: Adaptive Tokens and Attention Pruning for Efficient Vision-Language Models
Authors:
Zekun Wang,
Jingchang Chen,
Wangchunshu Zhou,
Haichao Zhu,
Jiafeng Liang,
Liping Shan,
Ming Liu,
Dongliang Xu,
Qing Yang,
Bing Qin
Abstract:
Despite achieving remarkable performance on various vision-language tasks, Transformer-based Vision-Language Models (VLMs) suffer from redundancy in inputs and parameters, which significantly hampers their efficiency in real-world applications. Moreover, the degree of redundancy in token representations and model parameters, such as attention heads, varies significantly across inputs. In light of these challenges, we propose SmartTrim, an adaptive acceleration framework for VLMs that adjusts the computational overhead per instance. Specifically, we integrate lightweight modules into the original backbone to identify and prune redundant token representations and attention heads within each layer. Furthermore, we devise a self-distillation strategy to enhance the consistency between the predictions of the pruned model and its full-capacity counterpart. Experimental results across various vision-language tasks consistently demonstrate that SmartTrim accelerates the original model by 2-3 times with minimal performance degradation, highlighting its effectiveness and efficiency compared to previous approaches. Code will be available at https://github.com/kugwzk/SmartTrim.
Submitted 26 February, 2024; v1 submitted 24 May, 2023;
originally announced May 2023.
-
Learning imaging mechanism directly from optical microscopy observations
Authors:
Ze-Hao Wang,
Long-Kun Shan,
Tong-Tian Weng,
Tian-Long Chen,
Qi-Yu Wang,
Xiang-Dong Chen,
Zhang-Yang Wang,
Guang-Can Guo,
Fang-Wen Sun
Abstract:
Optical microscopy imaging plays an important role in scientific research by directly visualizing the nanoworld, where the imaging mechanism is described as the convolution of the point spread function (PSF) with the emitters. With a priori knowledge of the PSF, or an equivalent PSF, the nanoworld can be explored more precisely. However, directly extracting the PSF from microscopy images remains an outstanding challenge. Here, using self-supervised learning, we propose a physics-informed masked autoencoder (PiMAE) that learns to estimate the PSF and emitters directly from raw microscopy images. We demonstrate our method on synthetic data and in real-world experiments, showing significant accuracy and noise robustness. On synthetic data tasks, PiMAE outperforms DeepSTORM and the Richardson-Lucy algorithm with average improvements of 19.6% and 50.7% (35 tasks), respectively, as measured by the normalized root mean square error (NRMSE). This is achieved without prior knowledge of the PSF, in contrast to the supervised approach used by DeepSTORM and the known-PSF assumption of the Richardson-Lucy algorithm. PiMAE provides a feasible scheme for uncovering the hidden imaging mechanism in optical microscopy and has the potential to learn hidden mechanisms in many other systems.
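The forward model described above (image = PSF convolved with emitters) and the NRMSE metric can be sketched in one dimension. Two assumptions are labeled here: the PSF is taken to be a fixed, normalized Gaussian, whereas real PSFs are 2-D or 3-D and PiMAE learns the PSF rather than assuming it; and NRMSE is normalized by the range of the ground truth, one common convention, since the abstract does not state which normalization is used. Names are illustrative.

```python
import math
from typing import List, Sequence


def gaussian_psf(sigma: float, radius: int) -> List[float]:
    """Sampled 1-D Gaussian kernel of length 2 * radius + 1, normalized to sum to 1."""
    kernel = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(kernel)
    return [k / total for k in kernel]


def convolve_same(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Discrete 1-D convolution with zero padding; output has the length of `signal`."""
    r = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = i + r - j  # flipped kernel index, centered on sample i
            if 0 <= idx < len(signal):
                acc += signal[idx] * k
        out.append(acc)
    return out


def nrmse(estimate: Sequence[float], truth: Sequence[float]) -> float:
    """Root mean square error divided by the range of the ground truth (assumed convention)."""
    span = max(truth) - min(truth)
    if span == 0:
        raise ValueError("ground truth is constant; range normalization is undefined")
    mse = sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth)
    return math.sqrt(mse) / span
```

A single emitter at index 10 of a 21-sample line, convolved with `gaussian_psf(1.5, 4)`, reproduces the PSF centered at index 10; recovering the emitters and the PSF from such an image alone is the inverse problem PiMAE targets.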
Submitted 25 April, 2023;
originally announced April 2023.
-
Random Cuts are Optimal for Explainable k-Medians
Authors:
Konstantin Makarychev,
Liren Shan
Abstract:
We show that the RandomCoordinateCut algorithm achieves the optimal competitive ratio for explainable k-medians in $\ell_1$. The problem of explainable k-medians was introduced by Dasgupta, Frost, Moshkovitz, and Rashtchian in 2020. Several groups of authors independently proposed a simple polynomial-time randomized algorithm for the problem and showed that it is $O(\log k \log\log k)$-competitive. We provide a tight analysis of the algorithm and prove that its competitive ratio is upper bounded by $2\ln k + 2$. This bound matches the $\Omega(\log k)$ lower bound of Dasgupta et al. (2020).
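A minimal sketch of the random-cut idea, as we read it: repeatedly draw a uniformly random coordinate and a uniformly random threshold, and split every leaf of the threshold tree whose centers that cut separates, until each leaf holds exactly one center; each point is then served by its leaf's center. It assumes distinct centers with coordinates in $[0,1]^d$ so that uniform thresholds in $[0,1]$ separate them with positive probability; the sampling details of the published algorithm may differ, and all names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def l1(a: Point, b: Point) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def random_coordinate_cut(points: Sequence[Point], centers: Sequence[Point],
                          rng: random.Random, max_rounds: int = 100_000) -> List[int]:
    """Build a threshold tree by random axis-aligned cuts; return each point's leaf center."""
    d = len(centers[0])
    # A leaf is (center indices, point indices); the root holds everything.
    leaves: List[Tuple[List[int], List[int]]] = [
        (list(range(len(centers))), list(range(len(points))))
    ]
    rounds = 0
    while any(len(cs) > 1 for cs, _ in leaves):
        rounds += 1
        if rounds > max_rounds:
            raise RuntimeError("centers never separated; are they distinct and in [0, 1]?")
        i, theta = rng.randrange(d), rng.random()  # uniform coordinate and threshold
        next_leaves = []
        for cs, ps in leaves:
            left_c = [c for c in cs if centers[c][i] <= theta]
            if 0 < len(left_c) < len(cs):  # the cut separates this leaf's centers: split it
                right_c = [c for c in cs if centers[c][i] > theta]
                left_p = [p for p in ps if points[p][i] <= theta]
                right_p = [p for p in ps if points[p][i] > theta]
                next_leaves += [(left_c, left_p), (right_c, right_p)]
            else:  # no separation here: keep the leaf unchanged
                next_leaves.append((cs, ps))
        leaves = next_leaves
    assignment = [0] * len(points)
    for cs, ps in leaves:
        for p in ps:
            assignment[p] = cs[0]
    return assignment


def tree_cost(points: Sequence[Point], centers: Sequence[Point], assignment: Sequence[int]) -> float:
    return sum(l1(p, centers[a]) for p, a in zip(points, assignment))


def nearest_center_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Unconstrained k-medians cost of the same centers (each point to its closest center)."""
    return sum(min(l1(p, c) for c in centers) for p in points)
```

When `centers` are an optimal k-medians solution, `nearest_center_cost` is the optimal cost, and the ratio `tree_cost / nearest_center_cost` is the price of explainability for one random tree; the result above bounds its expectation by $2\ln k + 2$.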
Submitted 18 April, 2023;
originally announced April 2023.
-
Optimal Pricing Schemes for Identical Items with Time-Sensitive Buyers
Authors:
Zhengyang Liu,
Liang Shan,
Zihe Wang
Abstract:
Time or money? That is the question! In this paper, we consider this dilemma in the pricing regime, where we seek the optimal pricing scheme for identical items with heterogeneous time-sensitive buyers. We characterize the revenue-optimal solution and propose an efficient algorithm to find it in a Bayesian setting. Our results also establish the tight ratio between the value of wasted time and the seller's revenue, as well as the corresponding ratios for two commonly used pricing schemes, the k-step function and fixed pricing. To explore the nature of the optimal scheme in the general setting, we present closed forms over the product distribution and show by examples that positive correlation between the valuation of the item and the cost per unit time can help increase revenue. To the best of our knowledge, this is the first step towards understanding, from a computational perspective, the impact of the time factor as part of the buyer's cost in pricing problems.
Submitted 17 April, 2023;
originally announced April 2023.
-
Quantum enhanced radio detection and ranging with solid spins
Authors:
Xiang-Dong Chen,
En-Hui Wang,
Long-Kun Shan,
Shao-Chun Zhang,
Ce Feng,
Yu Zheng,
Yang Dong,
Guang-Can Guo,
Fang-Wen Sun
Abstract:
Accurate radio frequency (RF) ranging and localization of objects has benefited research areas including autonomous driving, the Internet of Things, and manufacturing. Quantum receivers have been proposed to detect radio signals with capabilities that can outperform conventional measurements. As one of the most promising candidates, solid spins offer superior robustness, high spatial resolution, and miniaturization. However, challenges arise from their moderate response to high-frequency RF signals. Here, by exploiting the coherent interaction between the quantum sensor and the RF field, we demonstrate quantum enhanced radio detection and ranging. The RF magnetic sensitivity is improved by three orders of magnitude to 21 $\mathrm{pT}/\sqrt{\mathrm{Hz}}$, based on nanoscale quantum sensing and RF focusing. By further enhancing the response of the spins to the target's position through multi-photon excitation, a ranging accuracy of 16 μm is achieved with a GHz RF signal. The results pave the way for exploring quantum enhanced radar and communications with solid spins.
Submitted 2 March, 2023;
originally announced March 2023.
-
Unidirectional electron-phonon coupling as a "fingerprint" of the nematic state in a kagome superconductor
Authors:
Ping Wu,
Yubing Tu,
Zhuying Wang,
Shuikang Yu,
Hongyu Li,
Wanru Ma,
Zuowei Liang,
Yunmei Zhang,
Xuechen Zhang,
Zeyu Li,
Ye Yang,
Zhenhua Qiao,
Jianjun Ying,
Tao Wu,
Lei Shan,
Ziji Xiang,
Zhenyu Wang,
Xianhui Chen
Abstract:
Electronic nematicity has been commonly observed in juxtaposition with unconventional superconductivity. Understanding the nature of the nematic state, as well as its consequences for the electronic band structure and superconductivity, has become a pivotal focus in condensed matter physics. Here we use spectroscopic imaging-scanning tunneling microscopy to visualize how the interacting quasiparticles organize themselves in the nematic state of the kagome superconductor CsV$_{3-x}$Ti$_x$Sb$_5$, in which twofold-symmetric (C$_2$) quasiparticle scattering interference of the vanadium kagome bands emerges below the bulk nematic transition temperature (T$_{nem}$). Surprisingly, we find that coupling to collective modes, i.e., phonons, dramatically alters the electron self-energy and renormalizes the Fermi velocity of the in-plane vanadium d$_{xy/x^2-y^2}$ bands only along the C$_2$ direction, making the low-energy dispersion and electron dynamics highly nonequivalent along the three lattice directions. The anti-correlation between T$_{nem}$ and the superconducting transition temperature upon Ti substitution further suggests a possible competition between superconductivity and electronic nematicity in this series, with a principal superconducting gap opening on the same V bands once the nematic state is fully suppressed. The organizing principle of these quasiparticles provides essential information for understanding the interplay between charge density wave order and superconductivity in these kagome superconductors, and also reveals a previously unexplored form that expands the landscape for modelling electronic nematicity in systems where electron correlations and lattice degrees of freedom act in concert.
Submitted 10 February, 2023;
originally announced February 2023.
-
Simultaneous magnetic and electric Purcell enhancement in a hybrid metal-dielectric nanostructure
Authors:
Lingxiao Shan,
Qi Liu,
Yun Ma,
Yali Jia,
Hai Lin,
Guowei Lu,
Qihuang Gong,
Ying Gu
Abstract:
Hybrid metal-dielectric structures, which combine the advantages of both metal and dielectric materials, support highly confined but low-loss magnetic and electric resonances when deliberately arranged. However, their potential for enhancing magnetic emission has not been explored. Here, we study the simultaneous magnetic and electric Purcell enhancement supported by a hybrid structure consisting of a dielectric nanoring and a silver nanorod. Such a structure enables low Ohmic loss and a highly confined field through the hybridization of the magnetic resonances of the nanoring and the electric resonances of the nanorod in the optical communication band. As a result, a 60-fold magnetic Purcell enhancement and a 45-fold electric Purcell enhancement can be achieved simultaneously, with $>95\%$ of the radiation transmitted to the far field. The emitter position has a tolerance of several tens of nanometers for sufficiently large Purcell enhancement, which eases experimental fabrication. Moreover, an array formed by this hybrid nanostructure can further enhance the magnetic Purcell factors. These findings provide a way to selectively excite magnetic and electric emission in integrated photonic circuits. They may also facilitate brighter magnetic emission sources and light-emitting metasurfaces in a simpler arrangement.
Submitted 30 January, 2023;
originally announced January 2023.
-
Emergent superconducting fluctuations in a compressed kagome superconductor
Authors:
Xikai Wen,
Fanghang Yu,
Zhigang Gui,
Yuqing Zhang,
Xingyuan Hou,
Lei Shan,
Tao Wu,
Ziji Xiang,
Zhenyu Wang,
Jianjun Ying,
Xianhui Chen
Abstract:
The recent discovery of superconductivity (SC) and charge density wave (CDW) in kagome metals AV3Sb5 (A = K, Rb, Cs) provides an ideal playground for the study of emergent electronic orders. Application of moderate pressure leads to a two-dome-shaped SC phase regime in CsV3Sb5, accompanied by destabilization of the CDW phase; such unconventional evolution of SC may involve the pressure-induced formation of a new stripe-like CDW order resembling that in La-214 cuprate superconductors. Nonetheless, the nature of this pressure-tuned SC state and its interplay with the stripe order are yet to be explored. Here, we perform soft point-contact spectroscopy (SPCS) measurements on CsV3Sb5 to investigate the evolution of the superconducting order parameter with pressure. Surprisingly, we find that the superconducting gap is significantly enhanced in the pressure range between the two SC domes, where the zero-resistance temperature is suppressed and the transition is remarkably broadened. Moreover, the temperature dependence of the SC gap in this pressure range deviates severely from conventional BCS behavior, providing evidence for strong Cooper-pair phase fluctuations. These findings reveal the complex intertwining of the stripe-like CDW with SC in compressed CsV3Sb5, suggesting a striking parallel to the cuprate superconductor La2-xBaxCuO4. Our results point to the essential role of the charge degree of freedom in the development of intertwining electronic orders, thus providing new constraints for theories.
Submitted 28 November, 2022;
originally announced November 2022.