-
How does the velocity anisotropy of halo stars, dark matter and satellite galaxies depend on host halo properties?
Authors:
Jiaxin He,
Wenting Wang,
Zhaozhou Li,
Jiaxin Han,
Vicente Rodriguez-Gomez,
Donghai Zhao,
Xianguang Meng,
Yipeng Jing,
Shi Shao,
Rui Shi,
Zhenlin Tan
Abstract:
We investigate the mass ($M_{200}$) and concentration ($c_{200}$) dependencies of the velocity anisotropy ($β$) profiles for different components in the dark matter halo, including halo stars, dark matter and subhalos, using systems from the IllustrisTNG simulations. Beyond a critical radius, $β$ becomes more radial with the increase of $M_{200}$, reflecting more prominent radial accretion around massive halos. The critical radius is $r\sim r_s$, $0.3~r_s$ and $r_s$ for halo stars, dark matter and subhalos, respectively, with $r_s$ the scale radius of host halos. This dependence on $M_{200}$ is the strongest for subhalos, and the weakest for halo stars. In central regions, $β$ of halo stars and dark matter particles gets more isotropic with the increase of $M_{200}$ in TNG300 due to baryons. By contrast, $β$ of dark matter from the dark-matter-only TNG300-Dark run shows much weaker dependence on $M_{200}$ within $r_s$. Dark matter in TNG300 is slightly more isotropic than in TNG300-Dark at $0.2~r_s<r<10~r_s$ and $\log_{10}M_{200}/M_\odot<13.8$. Halo stars and dark matter also become more radial with the increase in $c_{200}$, at fixed $M_{200}$. Beyond $r_s$, the $β$ profile of halo stars is more radial than that of dark matter by an approximately constant offset. Dark matter particles are more radial than subhalos. These differences can be understood as follows: subhalos on more radial orbits are more easily stripped, contributing more stars and dark matter to the diffuse components. We provide a fitting formula for the difference between the $β$ of halo stars and of dark matter at $r>r_s$: $β_\mathrm{star}-β_\mathrm{DM}=(-0.028 \pm 0.008)\log_{10}M_{200}/M_\odot + (0.690\pm0.010)$.
Submitted 20 July, 2024;
originally announced July 2024.
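The fitting formula quoted in the abstract above can be evaluated directly. The following is a minimal sketch, not the authors' code: it uses only the central values of the quoted coefficients and ignores their uncertainties. The function names are hypothetical, and the definition of $β$ used in `velocity_anisotropy` is the standard textbook one, which the abstract itself does not spell out.

```python
import math

# Central values of the coefficients quoted in the abstract; the quoted
# uncertainties (+/-0.008 and +/-0.010) are ignored in this sketch.
SLOPE = -0.028
INTERCEPT = 0.690


def beta_offset(m200_msun: float) -> float:
    """Fitted beta_star - beta_DM at r > r_s for a host of mass M200 (solar masses)."""
    return SLOPE * math.log10(m200_msun) + INTERCEPT


def velocity_anisotropy(sigma_r: float, sigma_theta: float, sigma_phi: float) -> float:
    """Standard definition (assumed here): beta = 1 - (sigma_theta^2 + sigma_phi^2) / (2 sigma_r^2).

    beta = 0 is isotropic, beta -> 1 is purely radial, beta < 0 is tangential.
    """
    return 1.0 - (sigma_theta**2 + sigma_phi**2) / (2.0 * sigma_r**2)


if __name__ == "__main__":
    for log_m in (12.0, 13.0, 14.0):
        print(f"log10(M200/Msun) = {log_m}: offset = {beta_offset(10.0**log_m):.3f}")
```

Because the slope is negative, the fitted offset between halo stars and dark matter shrinks slowly as host halo mass increases.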
-
A counterexample on multiple convergence without commutativity
Authors:
Wen Huang,
Song Shao,
Xiangdong Ye
Abstract:
It is shown that there exist a probability space $(X,{\mathcal X},μ)$, two ergodic measure preserving transformations $T,S$ acting on $(X,{\mathcal X},μ)$ with $h_μ(X,T)=h_μ(X,S)=0$, and $f, g \in L^\infty(X,μ)$ such that the limit \begin{equation*}
\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1} f(T^{n}x)g(S^{n}x) \end{equation*} does not exist in $L^2(X,μ)$.
Submitted 15 July, 2024;
originally announced July 2024.
-
FreeCG: Free the Design Space of Clebsch-Gordan Transform for Machine Learning Force Fields
Authors:
Shihao Shao,
Haoran Geng,
Zun Wang,
Qinghua Cui
Abstract:
Machine Learning Force Fields (MLFFs) are of great importance for chemistry, physics, materials science, and many other related fields. The Clebsch-Gordan Transform (CG transform) effectively encodes many-body interactions and is thus an important building block for many models of MLFFs. However, the permutation-equivariance requirement of MLFFs limits the design space of CG transform, that is, intensive CG transform has to be conducted for each neighboring edge and the operations should be performed in the same manner for all edges. This constraint results in reduced expressiveness of the model while simultaneously increasing computational demands. To overcome this challenge, we first implement the CG transform layer on the permutation-invariant abstract edges generated from real edge information. We show that this approach allows complete freedom in the design of the layer without compromising the crucial symmetry. Developing on this free design space, we further propose group CG transform with sparse path, abstract edges shuffling, and attention enhancer to form a powerful and efficient CG transform layer. Our method, known as FreeCG, achieves state-of-the-art (SOTA) results in force prediction for MD17, rMD17, MD22, and is well extended to property prediction in QM9 datasets with several improvements greater than 15% and the maximum beyond 20%. The extensive real-world applications showcase high practicality. FreeCG introduces a novel paradigm for carrying out efficient and expressive CG transform in future geometric neural network designs. To demonstrate this, the recent SOTA, QuinNet, is also enhanced under our paradigm. Code will be publicly available.
Submitted 9 September, 2024; v1 submitted 2 July, 2024;
originally announced July 2024.
-
A review of feature selection strategies utilizing graph data structures and knowledge graphs
Authors:
Sisi Shao,
Pedro Henrique Ribeiro,
Christina Ramirez,
Jason H. Moore
Abstract:
Feature selection in Knowledge Graphs (KGs) is increasingly utilized in diverse domains, including biomedical research, Natural Language Processing (NLP), and personalized recommendation systems. This paper delves into the methodologies for feature selection within KGs, emphasizing their roles in enhancing machine learning (ML) model efficacy, hypothesis generation, and interpretability. Through this comprehensive review, we aim to catalyze further innovation in feature selection for KGs, paving the way for more insightful, efficient, and interpretable analytical models across various domains. Our exploration reveals the critical importance of scalability, accuracy, and interpretability in feature selection techniques, advocating for the integration of domain knowledge to refine the selection process. We highlight the burgeoning potential of multi-objective optimization and interdisciplinary collaboration in advancing KG feature selection, underscoring the transformative impact of such methodologies on precision medicine, among other fields. The paper concludes by charting future directions, including the development of scalable, dynamic feature selection algorithms and the integration of explainable AI principles to foster transparency and trust in KG-driven models.
Submitted 21 June, 2024;
originally announced June 2024.
-
Van-Hove annihilation and nematic instability on a Kagome lattice
Authors:
Yu-Xiao Jiang,
Sen Shao,
Wei Xia,
M. Michael Denner,
Julian Ingham,
Md Shafayat Hossain,
Qingzheng Qiu,
Xiquan Zheng,
Hongyu Chen,
Zi-Jia Cheng,
Xian P. Yang,
Byunghoon Kim,
Jia-Xin Yin,
Songbo Zhang,
Maksim Litskevich,
Qi Zhang,
Tyler A. Cochran,
Yingying Peng,
Guoqing Chang,
Yanfeng Guo,
Ronny Thomale,
Titus Neupert,
M. Zahid Hasan
Abstract:
Novel states of matter arise in quantum materials due to strong interactions among electrons. A nematic phase breaks the point group symmetry of the crystal lattice and is known to emerge in correlated materials. Here we report the observation of an intra-unit-cell nematic order and signatures of Pomeranchuk instability in the Kagome metal ScV6Sn6. Using scanning tunneling microscopy and spectroscopy, we reveal a stripe-like nematic order breaking the crystal rotational symmetry within the Kagome lattice itself. Moreover, we identify a set of van Hove singularities adhering to the Kagome layer electrons, which appear along one direction of the Brillouin zone while being annihilated along other high-symmetry directions, revealing a rotational symmetry breaking. Via detailed spectroscopic maps, we further observe an elliptical deformation of Fermi surface, which provides direct evidence for an electronically mediated nematic order. Our work not only bridges the gap between electronic nematicity and Kagome physics, but also sheds light on the potential mechanism for realizing symmetry-broken phases in correlated electron systems.
Submitted 17 July, 2024; v1 submitted 19 June, 2024;
originally announced June 2024.
-
Tensor networks for non-invertible symmetries in 3+1d and beyond
Authors:
Pranay Gorantla,
Shu-Heng Shao,
Nathanan Tantivasadakarn
Abstract:
Tensor networks provide a natural language for non-invertible symmetries in general Hamiltonian lattice models. We use ZX-diagrams, which are tensor network presentations of quantum circuits, to define a non-invertible operator implementing the Wegner duality in 3+1d lattice $\mathbb{Z}_2$ gauge theory. The non-invertible algebra, which mixes with lattice translations, can be efficiently computed using ZX-calculus. We further deform the $\mathbb{Z}_2$ gauge theory while preserving the duality and find a model with nine exactly degenerate ground states on a torus, consistent with the Lieb-Schultz-Mattis-type constraint imposed by the symmetry. Finally, we provide a ZX-diagram presentation of the non-invertible duality operators (including non-invertible parity/reflection symmetries) of generalized Ising models based on graphs, encompassing the 1+1d Ising model, the three-spin Ising model, the Ashkin-Teller model, and the 2+1d plaquette Ising model. The mixing (or lack thereof) with spatial symmetries is understood from a unifying perspective based on graph theory.
Submitted 18 June, 2024;
originally announced June 2024.
-
Global Clipper: Enhancing Safety and Reliability of Transformer-based Object Detection Models
Authors:
Qutub Syed Sha,
Michael Paulitsch,
Karthik Pattabiraman,
Korbinian Hagn,
Fabian Oboril,
Cornelius Buerkle,
Kay-Ulrich Scholl,
Gereon Hinz,
Alois Knoll
Abstract:
As transformer-based object detection models progress, their impact in critical sectors like autonomous vehicles and aviation is expected to grow. Soft errors causing bit flips during inference have significantly impacted DNN performance, altering predictions. Traditional range restriction solutions for CNNs fall short for transformers. This study introduces the Global Clipper and Global Hybrid Clipper, effective mitigation strategies specifically designed for transformer-based models. These strategies significantly enhance the models' resilience to soft errors and reduce faulty inferences to ~0%. We also detail extensive testing across more than 64 scenarios involving two transformer models (DINO-DETR and Lite-DETR) and two CNN models (YOLOv3 and SSD) using three datasets, totalling approximately 3.3 million inferences, to assess model robustness comprehensively. Moreover, the paper explores unique aspects of attention blocks in transformers and their operational differences from CNNs.
Submitted 9 July, 2024; v1 submitted 5 June, 2024;
originally announced June 2024.
-
Luna: An Evaluation Foundation Model to Catch Language Model Hallucinations with High Accuracy and Low Cost
Authors:
Masha Belyi,
Robert Friel,
Shuai Shao,
Atindriyo Sanyal
Abstract:
Retriever Augmented Generation (RAG) systems have become pivotal in enhancing the capabilities of language models by incorporating external knowledge retrieval mechanisms. However, a significant challenge in deploying these systems in industry applications is the detection and mitigation of hallucinations: instances where the model generates information that is not grounded in the retrieved context. Addressing this issue is crucial for ensuring the reliability and accuracy of responses generated by large language models (LLMs) in diverse industry settings. Current hallucination detection techniques fail to deliver accuracy, low latency, and low cost simultaneously. We introduce Luna: a DeBERTA-large (440M) encoder, finetuned for hallucination detection in RAG settings. We demonstrate that Luna outperforms GPT-3.5 and commercial evaluation frameworks on the hallucination detection task, with 97% and 91% reduction in cost and latency, respectively. Luna is lightweight and generalizes across multiple industry verticals and out-of-domain data, making it an ideal candidate for industry LLM applications.
Submitted 5 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
"Forgetting" in Machine Learning and Beyond: A Survey
Authors:
Alyssa Shuang Sha,
Bernardo Pereira Nunes,
Armin Haller
Abstract:
This survey investigates the multifaceted nature of forgetting in machine learning, drawing insights from neuroscientific research that posits forgetting as an adaptive function rather than a defect, enhancing the learning process and preventing overfitting. This survey focuses on the benefits of forgetting and its applications across various machine learning sub-fields that can help improve model performance and enhance data privacy. Moreover, the paper discusses current challenges, future directions, and ethical considerations regarding the integration of forgetting mechanisms into machine learning models.
Submitted 31 May, 2024;
originally announced May 2024.
-
Beyond Isolated Frames: Enhancing Sensor-Based Human Activity Recognition through Intra- and Inter-Frame Attention
Authors:
Shuai Shao,
Yu Guan,
Victor Sanchez
Abstract:
Human Activity Recognition (HAR) has become increasingly popular with ubiquitous computing, driven by the popularity of wearable sensors in fields like healthcare and sports. While Convolutional Neural Networks (ConvNets) have significantly contributed to HAR, they often adopt a frame-by-frame analysis, concentrating on individual frames and potentially overlooking the broader temporal dynamics inherent in human activities. To address this, we propose the intra- and inter-frame attention model. This model captures both the nuances within individual frames and the broader contextual relationships across multiple frames, offering a comprehensive perspective on sequential data. We further enrich the temporal understanding by proposing a novel time-sequential batch learning strategy. This learning strategy preserves the chronological sequence of time-series data within each batch, ensuring the continuity and integrity of temporal patterns in sensor-based HAR.
Submitted 21 May, 2024;
originally announced May 2024.
-
A simple inverse power method for balanced graph cut
Authors:
Sihong Shao,
Chuan Yang
Abstract:
The existing inverse power ($\mathbf{IP}$) method for solving the balanced graph cut lacks local convergence and its inner subproblem requires a nonsmooth convex solver. To address these issues, we develop a simple inverse power ($\mathbf{SIP}$) method using a novel equivalent continuous formulation of the balanced graph cut, and its inner subproblem allows an explicit analytic solution, which is the biggest advantage over $\mathbf{IP}$ and constitutes the main reason why we call it $\mathit{simple}$. By fully exploiting the closed form of the inner subproblem solution, we design a boundary-detected subgradient selection with which $\mathbf{SIP}$ is proved to converge locally. We show that $\mathbf{SIP}$ is also applicable to a new ternary-valued $θ$-balanced cut which reduces to the balanced cut when $θ=1$. When $\mathbf{SIP}$ reaches its local optimum, we seamlessly switch to solving the $θ$-balanced cut within exactly the same iteration algorithm framework and thus obtain $\mathbf{SIP}$-$\mathbf{perturb}$, an efficient local breakout improvement of $\mathbf{SIP}$, which transforms some "partitioned" vertices back to the "un-partitioned" ones through the adjustable $θ$. Numerical experiments on G-set for Cheeger cut and Sparsest cut demonstrate that $\mathbf{SIP}$ is significantly faster than $\mathbf{IP}$ while maintaining approximate solutions of comparable quality, and $\mathbf{SIP}$-$\mathbf{perturb}$ outperforms $\mathtt{Gurobi}$ in terms of both computational cost and solution quality.
Submitted 28 May, 2024;
originally announced May 2024.
-
Effectiveness of halo and galaxy properties in reducing the scatter in the stellar-to-halo mass relation
Authors:
Wenxiang Pei,
Qi Guo,
Shi Shao,
Yi He,
Qing Gu
Abstract:
The stellar-to-halo mass relation (SHMR) is a fundamental relationship between galaxies and their host dark matter haloes. In this study, we examine the scatter in this relation for primary galaxies in the semi-analytic L-Galaxies model and two cosmological hydrodynamical simulations, EAGLE and TNG. We find that in low-mass haloes, more massive galaxies tend to reside in haloes with higher concentration, earlier formation time, greater environmental density, earlier major mergers, and to have older stellar populations, which is consistent with findings in various studies. Quantitative analysis reveals the varying significance of halo and galaxy properties in determining SHMR scatter across simulations and models. In EAGLE and TNG, halo concentration and formation time primarily influence SHMR scatter for haloes with $M_{\rm h}<10^{12}~\rm M_\odot$, but the influence diminishes at high mass. Baryonic processes play a more significant role in L-Galaxies. For haloes with $M_{\rm h} <10^{11}~\rm M_\odot$ and $10^{12}~\rm M_\odot<M_{\rm h}<10^{13}~\rm M_\odot$, the main drivers of scatter are galaxy SFR and age. In the $10^{11.5}~\rm M_\odot<M_{\rm h} <10^{12}~\rm M_\odot$ range, halo concentration and formation time are the primary factors. For haloes with $M_{\rm h} > 10^{13}~\rm M_\odot$, supermassive black hole mass becomes more important. Interestingly, it is found that AGN feedback may increase the amplitude of the scatter and decrease the dependence on halo properties at high masses.
Submitted 23 May, 2024;
originally announced May 2024.
-
The Boolean polynomial polytope with multiple choice constraints
Authors:
Sihong Shao,
Yishan Wu
Abstract:
We consider a class of $0$-$1$ polynomial programming termed multiple choice polynomial programming (MCPP) where the constraint requires exactly one component per subset of the partition to be $1$ after all the entries are partitioned. Compared to the unconstrained counterpart, there are few polyhedral studies of MCPP in general form. This paper serves as the first attempt to propose a polytope associated with a hypergraph to study MCPP, which is the convex hull of $0$-$1$ vectors satisfying multiple choice constraints and production constraints. With the help of the decomposability property, we obtain an explicit half-space representation of the MCPP polytope when the underlying hypergraph is $α$-acyclic by induction on the number of hyperedges, which is analogous to the acyclicity results on the multilinear polytope by Del Pia and Khajavirad (SIAM J Optim 28 (2018) 1049) when the hypergraph is $γ$-acyclic. We also present a necessary and sufficient condition for the inequalities lifted from the facet-inducing ones for the multilinear polytope to be still facet-inducing for the MCPP polytope. This result covers the particular cases by Bärmann, Martin and Schneider (SIAM J Optim 33 (2023) 2909).
Submitted 19 June, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
Non-invertible and higher-form symmetries in 2+1d lattice gauge theories
Authors:
Yichul Choi,
Yaman Sanghavi,
Shu-Heng Shao,
Yunqin Zheng
Abstract:
We explore exact generalized symmetries in the standard 2+1d lattice $\mathbb{Z}_2$ gauge theory coupled to the Ising model, and compare them with their continuum field theory counterparts. One model has a (non-anomalous) non-invertible symmetry, and we identify two distinct non-invertible symmetry protected topological phases. The non-invertible algebra involves a lattice condensation operator, which creates a toric code ground state from a product state. Another model has a mixed anomaly between a 1-form symmetry and an ordinary symmetry. This anomaly enforces a nontrivial transition in the phase diagram, consistent with the "Higgs=SPT" proposal. Finally, we discuss how the symmetries and anomalies in these two models are related by gauging, which is a 2+1d version of the Kennedy-Tasaki transformation.
Submitted 21 May, 2024;
originally announced May 2024.
-
Particle swarm optimization with Applications to Maximum Likelihood Estimation and Penalized Negative Binomial Regression
Authors:
Sisi Shao,
Junhyung Park,
Weng Kee Wong
Abstract:
General purpose optimization routines such as nlminb, optim (R) or nlmixed (SAS) are frequently used to estimate model parameters in nonstandard distributions. This paper presents Particle Swarm Optimization (PSO) as an alternative to many of the current algorithms used in statistics. We find that PSO can not only reproduce the same results as the above routines, but can also produce better results, or produce results when the other routines cannot converge. In the latter case, it can also identify the source of the problem or problems. We highlight the advantages of PSO using four examples, where: (1) PSO reveals that some parameters in a generalized distribution are unidentified, which is not apparent or computationally manifested using routines in R or SAS; (2) PSO can produce estimation results for the log-binomial regressions when current routines may not; (3) PSO provides flexibility in the link function for binomial regression with LASSO penalty, which is unsupported by standard packages like GLM and GENMOD in Stata and SAS, respectively; and (4) PSO provides superior MLE estimates for an EE-IW distribution compared with those from the traditional statistical methods that rely on moments.
Submitted 20 May, 2024;
originally announced May 2024.
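To illustrate the idea of using PSO for maximum likelihood estimation described in the abstract above, the following is a minimal sketch of a generic global-best PSO with an inertia weight. It is not the authors' implementation: the function names, the toy data, the exponential model, and the hyperparameters (`w`, `c1`, `c2`, swarm size, iteration count) are all illustrative assumptions.

```python
import math
import random


def pso_minimize(f, bounds, n_particles=30, n_iter=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Generic global-best PSO; hyperparameter defaults are illustrative assumptions."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Clamp the updated position to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Toy data (made up for illustration) modelled as exponential with unknown rate.
DATA = [0.5, 1.0, 1.5, 2.0]


def neg_log_likelihood(params):
    """Negative log-likelihood of an exponential model; params = [rate]."""
    rate = params[0]
    return -len(DATA) * math.log(rate) + rate * sum(DATA)


if __name__ == "__main__":
    estimate, nll = pso_minimize(neg_log_likelihood, bounds=[(0.01, 10.0)])
    print("PSO estimate of rate:", estimate[0])
    print("closed-form MLE (1 / sample mean):", len(DATA) / sum(DATA))
```

The exponential model is chosen only because its MLE has a closed form (the reciprocal of the sample mean), which gives a simple check on the swarm's answer. PSO needs only function evaluations, not gradients, which is one reason it can be applied where derivative-based routines struggle.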
-
GPS-IDS: An Anomaly-based GPS Spoofing Attack Detection Framework for Autonomous Vehicles
Authors:
Murad Mehrab Abrar,
Amal Youssef,
Raian Islam,
Shalaka Satam,
Banafsheh Saber Latibari,
Salim Hariri,
Sicong Shao,
Soheil Salehi,
Pratik Satam
Abstract:
Autonomous Vehicles (AVs) heavily rely on sensors and communication networks like Global Positioning System (GPS) to navigate autonomously. Prior research has indicated that networks like GPS are vulnerable to cyber-attacks such as spoofing and jamming, thus posing serious risks like navigation errors and system failures. These threats are expected to intensify with the widespread deployment of AVs, making it crucial to detect and mitigate such attacks. This paper proposes GPS Intrusion Detection System, or GPS-IDS, an Anomaly-based intrusion detection framework to detect GPS spoofing attacks on AVs. The framework uses a novel physics-based vehicle behavior model where a GPS navigation model is integrated into the conventional dynamic bicycle model for accurate AV behavior representation. Temporal features derived from this behavior model are analyzed using machine learning to detect normal and abnormal navigation behaviors. The performance of the GPS-IDS framework is evaluated on the AV-GPS-Dataset -- a GPS security dataset for AVs comprising real-world data collected using an AV testbed, and simulated data representing urban traffic environments. To the best of our knowledge, this dataset is the first of its kind and has been publicly released for the global research community to address such security challenges.
Submitted 17 December, 2024; v1 submitted 14 May, 2024;
originally announced May 2024.
-
Explanation as a Watermark: Towards Harmless and Multi-bit Model Ownership Verification via Watermarking Feature Attribution
Authors:
Shuo Shao,
Yiming Li,
Hongwei Yao,
Yiling He,
Zhan Qin,
Kui Ren
Abstract:
Ownership verification is currently the most critical and widely adopted post-hoc method to safeguard model copyright. In general, model owners exploit it to identify whether a given suspicious third-party model is stolen from them by examining whether it has particular properties `inherited' from their released models. Currently, backdoor-based model watermarks are the primary and cutting-edge methods to implant such properties in the released models. However, backdoor-based methods have two fatal drawbacks, including harmfulness and ambiguity. The former indicates that they introduce maliciously controllable misclassification behaviors ($i.e.$, backdoor) to the watermarked released models. The latter denotes that malicious users can easily pass the verification by finding other misclassified samples, leading to ownership ambiguity.
In this paper, we argue that both limitations stem from the `zero-bit' nature of existing watermarking schemes, where they exploit the status ($i.e.$, misclassified) of predictions for verification. Motivated by this understanding, we design a new watermarking paradigm, $i.e.$, Explanation as a Watermark (EaaW), that implants verification behaviors into the explanation of feature attribution instead of model predictions. Specifically, EaaW embeds a `multi-bit' watermark into the feature attribution explanation of specific trigger samples without changing the original prediction. We correspondingly design the watermark embedding and extraction algorithms inspired by explainable artificial intelligence. In particular, our approach can be used for different tasks ($e.g.$, image classification and text generation). Extensive experiments verify the effectiveness and harmlessness of our EaaW and its resistance to potential attacks.
Submitted 9 September, 2024; v1 submitted 8 May, 2024;
originally announced May 2024.
-
DiffuseLoco: Real-Time Legged Locomotion Control with Diffusion from Offline Datasets
Authors:
Xiaoyu Huang,
Yufeng Chi,
Ruofeng Wang,
Zhongyu Li,
Xue Bin Peng,
Sophia Shao,
Borivoje Nikolic,
Koushil Sreenath
Abstract:
This work introduces DiffuseLoco, a framework for training multi-skill diffusion-based policies for dynamic legged locomotion from offline datasets, enabling real-time control of diverse skills on robots in the real world. Offline learning at scale has led to breakthroughs in computer vision, natural language processing, and robotic manipulation domains. However, scaling up learning for legged robot locomotion, especially with multiple skills in a single policy, presents significant challenges for prior online reinforcement learning methods. To address this challenge, we propose a novel, scalable framework that leverages diffusion models to directly learn from offline multimodal datasets with a diverse set of locomotion skills. With design choices tailored for real-time control in dynamical systems, including receding horizon control and delayed inputs, DiffuseLoco is capable of reproducing multimodality in performing various locomotion skills, zero-shot transfer to real quadrupedal robots, and it can be deployed on edge computing devices. Furthermore, DiffuseLoco demonstrates free transitions between skills and robustness against environmental variations. Through extensive benchmarking in real-world experiments, DiffuseLoco exhibits better stability and velocity tracking performance compared to prior reinforcement learning and non-diffusion-based behavior cloning baselines. The design choices are validated via comprehensive ablation studies. This work opens new possibilities for scaling up learning-based legged locomotion controllers through the scaling of large, expressive models and diverse offline datasets.
Submitted 30 April, 2024;
originally announced April 2024.
-
Elucidating the Design Space of Dataset Condensation
Authors:
Shitong Shao,
Zikai Zhou,
Huanran Chen,
Zhiqiang Shen
Abstract:
Dataset condensation, a concept within data-centric learning, efficiently transfers critical attributes from an original dataset to a synthetic version, maintaining both diversity and realism. This approach significantly improves model training efficiency and is adaptable across multiple application areas. Previous methods in dataset condensation have faced challenges: some incur high computational costs which limit scalability to larger datasets (e.g., MTT, DREAM, and TESLA), while others are restricted to less optimal design spaces, which could hinder potential improvements, especially in smaller datasets (e.g., SRe2L, G-VBSM, and RDED). To address these limitations, we propose a comprehensive design framework that includes specific, effective strategies like implementing soft category-aware matching and adjusting the learning rate schedule. These strategies are grounded in empirical evidence and theoretical backing. Our resulting approach, Elucidate Dataset Condensation (EDC), establishes a benchmark for both small and large-scale dataset condensation. In our testing, EDC achieves state-of-the-art accuracy, reaching 48.6% on ImageNet-1k with a ResNet-18 model at an IPC of 10, which corresponds to a compression ratio of 0.78%. This performance exceeds those of SRe2L, G-VBSM, and RDED by margins of 27.3%, 17.2%, and 6.6%, respectively.
Submitted 17 January, 2025; v1 submitted 21 April, 2024;
originally announced April 2024.
-
Microwave seeding time crystal in Floquet driven Rydberg atoms
Authors:
Bang Liu,
Li-Hua Zhang,
Yu Ma,
Tian-Yu Han,
Qi-Feng Wang,
Jun Zhang,
Zheng-Yuan Zhang,
Shi-Yao Shao,
Qing Li,
Han-Chao Chen,
Ya-Jun Wang,
Jia-Dou Nan,
Yi-Ming Yin,
Dong-Sheng Ding,
Bao-Sen Shi
Abstract:
Crystal seeding enables a deeper understanding of phase behavior, leading to the development of methods for controlling and manipulating phase transitions in various applications such as materials synthesis, crystallization processes, and phase transformation engineering. How to seed crystalline order in the time domain is an open question of great significance, and answering it may provide an avenue to understand and control time-dependent quantum many-body physics. Here, we utilize a microwave pulse as a seed to induce the formation of a discrete time crystal in Floquet driven Rydberg atoms. In the experiment, the periodic driving on Rydberg states acts as a seeded crystalline order in subspace, which triggers the time-translation symmetry breaking across the entire ensemble. The behavior of the emergent time crystal is closely linked to alterations in the seed, such as the relative phase shift and the frequency difference, which result in phase-dependent seeding and a corresponding shift in the periodicity of the time crystal, leading to embryonic synchronization. This result opens up new possibilities for studying and harnessing time-dependent quantum many-body phenomena, offering insights into the behavior of complex many-body systems under seeding.
Submitted 18 April, 2024;
originally announced April 2024.
-
Digging into contrastive learning for robust depth estimation with diffusion models
Authors:
Jiyuan Wang,
Chunyu Lin,
Lang Nie,
Kang Liao,
Shuwei Shao,
Yao Zhao
Abstract:
Recently, diffusion-based depth estimation methods have drawn widespread attention due to their elegant denoising patterns and promising performance. However, they are typically unreliable under adverse conditions prevalent in real-world scenarios, such as rain and snow. In this paper, we propose a novel robust depth estimation method called D4RD, featuring a custom contrastive learning mode tailored for diffusion models to mitigate performance degradation in complex environments. Concretely, we integrate the strength of knowledge distillation into contrastive learning, building the `trinity' contrastive scheme. This scheme utilizes the sampled noise of the forward diffusion process as a natural reference, guiding the predicted noise in diverse scenes toward a more stable and precise optimum. Moreover, we extend noise-level trinity to encompass more generic feature and image levels, establishing a multi-level contrast to distribute the burden of robust perception across the overall network. Before addressing complex scenarios, we enhance the stability of the baseline diffusion model with three straightforward yet effective improvements, which facilitate convergence and remove depth outliers. Extensive experiments demonstrate that D4RD surpasses existing state-of-the-art solutions on synthetic corruption datasets and real-world weather conditions. Source code and data are available at https://github.com/wangjiyuan9/D4RD.
Submitted 22 September, 2024; v1 submitted 15 April, 2024;
originally announced April 2024.
-
Ultra-Wide Dual-band Rydberg Atomic Receiver Based on Space Division Multiplexing RF-Chip Modules
Authors:
Li-Hua Zhang,
Bang Liu,
Zong-Kai Liu,
Zheng-Yuan Zhang,
Shi-Yao Shao,
Qi-Feng Wang,
Yu Ma,
Tian-Yu Han,
Guang-Can Guo,
Dong-Sheng Ding,
Bao-Sen Shi
Abstract:
Detecting microwave signals over a wide frequency range has numerous advantages as it enables simultaneous transmission of a large amount of information and access to more spectrum resources. This capability is crucial for applications such as microwave communication, remote sensing, and radar. However, conventional microwave receiving systems are limited by amplifiers and band-pass filters that can only operate efficiently in a specific frequency range. Typically, these systems can only process signals within a three-fold frequency range, which limits the data transfer bandwidth of the microwave communication systems. Developing novel atom-integrated microwave sensors, for example, radio frequency (RF)-chip coupled Rydberg atomic receiver, provides opportunities for a large working bandwidth of microwave sensing at the atomic level. Here, an ultra-wide dual-band RF sensing scheme is demonstrated by space-division multiplexing two RF-chip-integrated atomic receiver modules. The system can simultaneously receive dual-band microwave signals that span a frequency range exceeding 6 octaves (300 MHz and 24 GHz). This work paves the way for multi-band microwave reception applications within an ultra-wide range by RF-chip-integrated Rydberg atomic sensor.
Submitted 16 April, 2024; v1 submitted 15 April, 2024;
originally announced April 2024.
-
Early warning signals of the tipping point in strongly interacting Rydberg atoms
Authors:
Jun Zhang,
Li-Hua Zhang,
Bang Liu,
Zheng-Yuan Zhang,
Shi-Yao Shao,
Qing Li,
Han-Chao Chen,
Zong-Kai Liu,
Yu Ma,
Tian-Yu Han,
Qi-Feng Wang,
C. Stuart Adams,
Bao-Sen Shi,
Dong-Sheng Ding
Abstract:
The identification of tipping points is essential for prediction of collapses or other sudden changes in complex systems. Applications include studies of ecology, thermodynamics, climatology, and epidemiology. However, detecting early signs of proximity to a tipping point is made challenging by complexity and non-linearity. Strongly interacting Rydberg atom gases provide model systems that offer both complexity and non-linearity, including phase transitions and critical slowing down. Here, via an external probe we observe prior warning of the proximity of a phase transition of Rydberg thermal gases. This warning signal is manifested as a deviation from linear growth of the variance with increasing probe intensity. We also observe the dynamics of the critical slowing down behavior for different time scales and atomic densities, thus providing insights into the study of a Rydberg atom system's critical behavior. Our experiment suggests that the full critical slowing down dynamics of strongly interacting Rydberg atoms can be probed systematically, thus providing a benchmark with which to identify critical phenomena in quantum many-body systems.
Submitted 4 October, 2024; v1 submitted 14 April, 2024;
originally announced April 2024.
-
Self-supervised Dataset Distillation: A Good Compression Is All You Need
Authors:
Muxin Zhou,
Zeyuan Yin,
Shitong Shao,
Zhiqiang Shen
Abstract:
Dataset distillation aims to compress information from a large-scale original dataset to a new compact dataset while striving to preserve the utmost degree of the original data informational essence. Previous studies have predominantly concentrated on aligning the intermediate statistics between the original and distilled data, such as weight trajectory, features, gradient, BatchNorm, etc. In this work, we consider addressing this task through the new lens of model informativeness in the compression stage on the original dataset pretraining. We observe that with the prior state-of-the-art SRe$^2$L, as model sizes increase, it becomes increasingly challenging for supervised pretrained models to recover learned information during data synthesis, as the channel-wise mean and variance inside the model flatten and become less informative. We further notice that larger variances in BN statistics from self-supervised models enable larger loss signals to update the recovered data by gradients, yielding more informativeness during synthesis. Building on this observation, we introduce SC-DD, a simple yet effective Self-supervised Compression framework for Dataset Distillation that facilitates diverse information compression and recovery compared to traditional supervised learning schemes, and further reaps the potential of large pretrained models with enhanced capabilities. Extensive experiments are conducted on CIFAR-100, Tiny-ImageNet and ImageNet-1K datasets to demonstrate the superiority of our proposed approach. The proposed SC-DD outperforms all previous state-of-the-art supervised dataset distillation methods when employing larger models, such as SRe$^2$L, MTT, TESLA, DC, CAFE, etc., by large margins under the same recovery and post-training budgets. Code is available at https://github.com/VILA-Lab/SRe2L/tree/main/SCDD/.
Submitted 11 April, 2024;
originally announced April 2024.
-
Adaptive Hyperbolic-cross-space Mapped Jacobi Method on Unbounded Domains with Applications to Solving Multidimensional Spatiotemporal Integrodifferential Equations
Authors:
Yunhong Deng,
Sihong Shao,
Alex Mogilner,
Mingtao Xia
Abstract:
In this paper, we develop a new adaptive hyperbolic-cross-space mapped Jacobi (AHMJ) method for solving multidimensional spatiotemporal integrodifferential equations in unbounded domains. By devising adaptive techniques for sparse mapped Jacobi spectral expansions defined in a hyperbolic cross space, our proposed AHMJ method can efficiently solve various spatiotemporal integrodifferential equations such as the anomalous diffusion model with reduced numbers of basis functions. Our analysis of the AHMJ method gives a uniform upper error bound for solving a class of spatiotemporal integrodifferential equations, leading to effective error control.
Submitted 11 April, 2024;
originally announced April 2024.
-
Floquet engineering Rydberg sub-THz frequency comb spectroscopy
Authors:
Li-Hua Zhang,
Zong-Kai Liu,
Bang Liu,
Qi-Feng Wang,
Yu Ma,
Tian-Yu Han,
Zheng-Yuan Zhang,
Han-Chao Chen,
Shi-Yao Shao,
Qing Li,
Jun Zhang,
Dong-Sheng Ding,
Bao-Sen Shi
Abstract:
Engineering terahertz (THz) frequency comb spectroscopy at the atomic level advances precise measurement in spectroscopy and sensing. Current progress on THz frequency combs relies on difference-frequency generation, optical parametric oscillation, and other methods. Generating a THz frequency comb poses challenges in source stability and in achieving a narrow bandwidth, both of which are difficult for traditional THz devices to achieve. Furthermore, accurately measuring the generated THz frequency comb necessitates a high-performance THz detector. Rydberg atoms are well-suited for electric field sensing due to their ultra-wide radio frequency transition energy levels, making them especially sensitive to external electric fields in the DC to THz bandwidth. However, there have been no reports of generating THz frequency comb spectroscopy at the atomic level until now. This work presents THz frequency comb spectroscopy with Rydberg atoms, in which a Floquet comb-like transition is engineered through a time-periodic drive field. Our approach simplifies the setup required for THz frequency comb spectroscopy while extending the working bandwidth for Rydberg atomic sensors. The THz frequency comb spectroscopy at the atomic level reported in this article shows great potential for various applications in astronomy, remote sensing, spectral detection of biological samples, and other related fields.
Submitted 10 April, 2024;
originally announced April 2024.
-
Cavity-enhanced Rydberg atom microwave receiver
Authors:
Bang Liu,
Li-Hua Zhang,
Zong-Kai Liu,
Qi-Feng Wang,
Yu Ma,
Tian-Yu Han,
Zheng-Yuan Zhang,
Shi-Yao Shao,
Jun Zhang,
Qing Li,
Han-Chao Chen,
Dong-Sheng Ding,
Bao-Sen Shi
Abstract:
Microwave electric field sensing based on Rydberg atoms has received significant attention due to its unique advantages. However, achieving effective coupling between Rydberg atoms and the microwave electric field in the sensing process is a challenging problem that greatly impacts the sensitivity. To address this, we propose the use of a microwave resonant cavity to enhance the effective coupling between the Rydberg atoms and the microwave electric field. In our experiment, we use a three-photon excitation scheme to prepare Rydberg atoms and measure electric fields with and without a microwave cavity enclosing the vapor cell. Through experimental testing, we achieve an 18 dB enhancement of power sensitivity. The experiment shows an effective enhancement in electric field pulse signal detection. This result provides a promising direction for enhancing the sensitivity of Rydberg atomic electric field sensors and paves the way for their application in precision electric field measurements.
Submitted 10 April, 2024;
originally announced April 2024.
-
AuditGPT: Auditing Smart Contracts with ChatGPT
Authors:
Shihao Xia,
Shuai Shao,
Mengting He,
Tingting Yu,
Linhai Song,
Yiying Zhang
Abstract:
To govern smart contracts running on Ethereum, multiple Ethereum Request for Comment (ERC) standards have been developed, each containing a set of rules to guide the behaviors of smart contracts. Violating the ERC rules could cause serious security issues and financial loss, signifying the importance of verifying smart contracts follow ERCs. Today's practices of such verification are to either manually audit each single contract or use expert-developed, limited-scope program-analysis tools, both of which are far from being effective in identifying ERC rule violations. This paper presents a tool named AuditGPT that leverages large language models (LLMs) to automatically and comprehensively verify ERC rules against smart contracts. To build AuditGPT, we first conduct an empirical study on 222 ERC rules specified in four popular ERCs to understand their content, their security impacts, their specification in natural language, and their implementation in Solidity. Guided by the study, we construct AuditGPT by separating the large, complex auditing process into small, manageable tasks and design prompts specialized for each ERC rule type to enhance LLMs' auditing performance. In the evaluation, AuditGPT successfully pinpoints 418 ERC rule violations and only reports 18 false positives, showcasing its effectiveness and accuracy. Moreover, AuditGPT beats an auditing service provided by security experts in effectiveness, accuracy, and cost, demonstrating its advancement over state-of-the-art smart-contract auditing practices.
Submitted 5 April, 2024;
originally announced April 2024.
-
Utilizing Computer Vision for Continuous Monitoring of Vaccine Side Effects in Experimental Mice
Authors:
Chuang Li,
Shuai Shao,
Willian Mikason,
Rubing Lin,
Yantong Liu
Abstract:
The demand for improved efficiency and accuracy in vaccine safety assessments is increasing. Here, we explore the application of computer vision technologies to automate the monitoring of experimental mice for potential side effects after vaccine administration. Traditional observation methods are labor-intensive and lack the capability for continuous monitoring. By deploying a computer vision system, our research aims to improve the efficiency and accuracy of vaccine safety assessments. The methodology involves training machine learning models on annotated video data of mouse behaviors pre- and post-vaccination. Preliminary results indicate that computer vision effectively identifies subtle changes that signal possible side effects. Therefore, our approach has the potential to significantly enhance the monitoring process in vaccine trials in animals, providing a practical solution to the limitations of human observation.
Submitted 3 April, 2024;
originally announced April 2024.
-
Cluster state as a non-invertible symmetry protected topological phase
Authors:
Sahand Seifnashri,
Shu-Heng Shao
Abstract:
We show that the standard 1+1d $\mathbb{Z}_2\times \mathbb{Z}_2$ cluster model has a non-invertible global symmetry, described by the fusion category Rep(D$_8$). Therefore, the cluster state is not only a $\mathbb{Z}_2\times \mathbb{Z}_2$ symmetry protected topological (SPT) phase, but also a non-invertible SPT phase. We further find two new commuting Pauli Hamiltonians for the other two Rep(D$_8$) SPT phases on a tensor product Hilbert space of qubits, matching the classification in field theory and mathematics. We identify the edge modes and the local projective algebras at the interfaces between these non-invertible SPT phases. Finally, we show that there does not exist a symmetric entangler that maps between these distinct SPT states.
Submitted 27 May, 2025; v1 submitted 1 April, 2024;
originally announced April 2024.
-
$\mathrm{F^2Depth}$: Self-supervised Indoor Monocular Depth Estimation via Optical Flow Consistency and Feature Map Synthesis
Authors:
Xiaotong Guo,
Huijie Zhao,
Shuwei Shao,
Xudong Li,
Baochang Zhang
Abstract:
Self-supervised monocular depth estimation methods have received increasing attention because they do not require large, labelled datasets. Such self-supervised methods require high-quality salient features and consequently suffer a severe performance drop in indoor scenes, where the dominant low-textured regions are almost indiscriminative. To address the issue, we propose a self-supervised indoor monocular depth estimation framework called $\mathrm{F^2Depth}$. A self-supervised optical flow estimation network is introduced to supervise depth learning. To improve optical flow estimation performance in low-textured areas, only some patches of points with more discriminative features are adopted for finetuning based on our well-designed patch-based photometric loss. The finetuned optical flow estimation network generates high-accuracy optical flow as a supervisory signal for depth estimation. Correspondingly, an optical flow consistency loss is designed. Multi-scale feature maps produced by the finetuned optical flow estimation network are warped to compute a feature map synthesis loss as another supervisory signal for depth learning. Experimental results on the NYU Depth V2 dataset demonstrate the effectiveness of the framework and our proposed losses. To evaluate the generalization ability of our $\mathrm{F^2Depth}$, we collect a Campus Indoor depth dataset composed of approximately 1500 points selected from 99 images in 18 scenes. Zero-shot generalization experiments on the 7-Scenes dataset and Campus Indoor achieve $δ_1$ accuracy of 75.8% and 76.0% respectively. The accuracy results show that our model can generalize well to monocular images captured in unknown indoor scenes.
Submitted 27 March, 2024;
originally announced March 2024.
-
Dynamic Client Clustering, Bandwidth Allocation, and Workload Optimization for Semi-synchronous Federated Learning
Authors:
Liangkun Yu,
Xiang Sun,
Rana Albelaihi,
Chaeeun Park,
Sihua Shao
Abstract:
Federated Learning (FL) revolutionizes collaborative machine learning among Internet of Things (IoT) devices by enabling them to train models collectively while preserving data privacy. FL algorithms fall into two primary categories: synchronous and asynchronous. While synchronous FL efficiently handles straggler devices, it can compromise convergence speed and model accuracy. In contrast, asynchronous FL allows all devices to participate but incurs high communication overhead and potential model staleness. To overcome these limitations, the semi-synchronous FL framework introduces client tiering based on computing and communication latencies. Clients in different tiers upload their local models at distinct frequencies, striking a balance between straggler mitigation and communication costs. Enter the DecantFed algorithm (Dynamic client clustering, bandwidth allocation, and local training for semi-synchronous Federated learning), a dynamic solution that optimizes client clustering, bandwidth allocation, and local training workloads to maximize data sample processing rates. Additionally, DecantFed adapts client learning rates according to their tiers, addressing the model staleness problem. The algorithm's performance shines in extensive simulations using benchmark datasets, including MNIST and CIFAR-10, under independent and identically distributed (IID) and non-IID scenarios. DecantFed outpaces FedAvg and FedProx in terms of convergence speed and delivers a remarkable minimum 28% boost in model accuracy compared to FedProx.
Submitted 11 March, 2024;
originally announced March 2024.
-
WindGP: Efficient Graph Partitioning on Heterogenous Machines
Authors:
Li Zeng,
Haohan Huang,
Binfan Zheng,
Kang Yang,
Shengcheng Shao,
Jinhua Zhou,
Jun Xie,
Rongqian Zhao,
Xin Chen
Abstract:
Graph Partitioning is widely used in many real-world applications such as fraud detection and social network analysis, in order to enable the distributed graph computing on large graphs. However, existing works fail to balance the computation cost and communication cost on machines with different power (including computing capability, network bandwidth and memory size), as they only consider replication factor and neglect the difference of machines in realistic data centers. In this paper, we propose a general graph partitioning algorithm WindGP, which can support fast and high-quality edge partitioning on heterogeneous machines. WindGP designs novel preprocessing techniques to simplify the metric and balance the computation cost according to the characteristics of graphs and machines. Also, best-first search is proposed instead of BFS and DFS, in order to generate clusters with high cohesion. Furthermore, WindGP adaptively tunes the partition results by sophisticated local search methods. Extensive experiments show that WindGP outperforms all state-of-the-art partition methods by 1.35 - 27 times on both dense and sparse distributed graph algorithms, and has good scalability with graph size and machine number.
Submitted 6 March, 2024; v1 submitted 1 March, 2024;
originally announced March 2024.
-
Higher-order and fractional discrete time crystals in Floquet-driven Rydberg atoms
Authors:
Bang Liu,
Li-Hua Zhang,
Qi-Feng Wang,
Yu Ma,
Tian-Yu Han,
Jun Zhang,
Zheng-Yuan Zhang,
Shi-Yao Shao,
Qing Li,
Han-Chao Chen,
Bao-Sen Shi,
Dong-Sheng Ding
Abstract:
Higher-order and fractional discrete time crystals (DTCs) are exotic phases of matter where the discrete time translation symmetry is broken into higher-order and non-integer category. Generation of these unique DTCs has been widely studied theoretically in different systems. However, no current experimental methods can probe these higher-order and fractional DTCs in any quantum many-body systems. We demonstrate an experimental approach to observe higher-order and fractional DTCs in Floquet-driven Rydberg atomic gases. We have discovered multiple $n$-DTCs with integer values of $n$ = 2, 3, and 4, and others ranging up to 14, along with fractional $n$-DTCs with $n$ values beyond the integers. The system response can transition between adjacent integer DTCs, during which the fractional DTCs are investigated. Study of higher-order and fractional DTCs expands fundamental knowledge of non-equilibrium dynamics and is promising for discovery of more complex temporal symmetries beyond the single discrete time translation symmetry.
Submitted 19 October, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
-
Bifurcation of time crystals in driven and dissipative Rydberg atomic gas
Authors:
Bang Liu,
Li-Hua Zhang,
Zong-Kai Liu,
Jun Zhang,
Zheng-Yuan Zhang,
Shi-Yao Shao,
Qing Li,
Han-Chao Chen,
Yu Ma,
Tian-Yu Han,
Qi-Feng Wang,
Dong-Sheng Ding,
Bao-Sen Shi
Abstract:
A time crystal is an exotic phase of matter where time-translational symmetry is broken; this phase differs from the spatial symmetry breaking induced in crystals in space. Lots of experiments report the transition from a thermal equilibrium phase to time crystal phase. However, there is no experimental method to probe the bifurcation effect of distinct time crystals in quantum many-body systems. Here, in a driven and dissipative many-body Rydberg atom system, we observe multiple continuous dissipative time crystals and emergence of more complex temporal symmetries beyond the single time crystal phase. Bifurcation of time crystals in strongly interacting Rydberg atoms is observed; the process manifests as a transition from a time crystal state of long temporal order to one of short temporal order, or vice versa. By manipulating the driving field parameters, we observe the time crystal's bistability and a hysteresis loop. These investigations indicate new possibilities for control and manipulation of the temporal symmetries of non-equilibrium systems.
Submitted 27 February, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
-
Beyond Voice Assistants: Exploring Advantages and Risks of an In-Car Social Robot in Real Driving Scenarios
Authors:
Yuanchao Li,
Lachlan Urquhart,
Nihan Karatas,
Shun Shao,
Hiroshi Ishiguro,
Xun Shen
Abstract:
In-car Voice Assistants (VAs) play an increasingly critical role in automotive user interface design. However, existing VAs primarily perform simple 'query-answer' tasks, limiting their ability to sustain drivers' long-term attention. In this study, we investigate the effectiveness of an in-car Robot Assistant (RA) that offers functionalities beyond voice interaction. We aim to answer the question: How does the presence of a social robot impact user experience in real driving scenarios? Our study begins with a user survey to understand perspectives on in-car VAs and their influence on driving experiences. We then conduct non-driving and on-road experiments with selected participants to assess user experiences with an RA. Additionally, we conduct subjective ratings to evaluate user perceptions of the RA's personality, which is crucial for robot design. We also explore potential concerns regarding ethical risks. Finally, we provide a comprehensive discussion and recommendations for the future development of in-car RAs.
Submitted 20 February, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Joint Data and Semantics Lossy Compression: Nonasymptotic Converse Bounds and Second-Order Asymptotics
Authors:
Huiyuan Yang,
Yuxuan Shi,
Shuo Shao,
Xiaojun Yuan
Abstract:
This paper studies the joint data and semantics lossy compression problem, i.e., an extension of the hidden lossy source coding problem that entails recovering both the hidden and observable sources. We aim to study the nonasymptotic and second-order properties of this problem, especially the converse aspect. Specifically, we begin by deriving general nonasymptotic converse bounds valid for general sources and distortion measures, utilizing properties of distortion-tilted information. Subsequently, a second-order converse bound is derived under the standard block coding setting through asymptotic analysis of the nonasymptotic bounds. This bound is tight since it coincides with a known second-order achievability bound. We then examine the case of erased fair coin flips (EFCF), providing its specific nonasymptotic achievability and converse bounds. Numerical results under the EFCF case demonstrate that our second-order asymptotic approximation effectively approximates the optimum rate at given blocklengths.
Submitted 4 February, 2024;
originally announced February 2024.
-
Untangle charge-order dependent bulk states from surface effects in a topological kagome metal ScV$_6$Sn$_6$
Authors:
Zi-Jia Cheng,
Sen Shao,
Byunghoon Kim,
Tyler A. Cochran,
Xian P. Yang,
Changjiang Yi,
Yu-Xiao Jiang,
Junyi Zhang,
Md Shafayat Hossain,
Subhajit Roychowdhury,
Turgut Yilmaz,
Elio Vescovo,
Alexei Fedorov,
Shekhar Chandra,
Claudia Felser,
Guoqing Chang,
M. Zahid Hasan
Abstract:
Kagome metals with charge density wave (CDW) order exhibit a broad spectrum of intriguing quantum phenomena. The recent discovery of the novel kagome CDW compound ScV$_6$Sn$_6$ has spurred significant interest. However, understanding the interplay between CDW and the bulk electronic structure has been obscured by a profusion of surface states and terminations in this quantum material. Here, we employ photoemission spectroscopy and potassium dosing to elucidate the complete bulk band structure of ScV$_6$Sn$_6$, revealing multiple van Hove singularities near the Fermi level. We surprisingly discover a robust spin-polarized topological Dirac surface resonance state at the M point within the two-fold van Hove singularities. Assisted by first-principles calculations, the temperature dependence of the $k_z$-resolved ARPES spectrum provides unequivocal evidence for the proposed $\sqrt{3}\times\sqrt{3}\times3$ charge order over other candidates. Our work not only enhances the understanding of the CDW-dependent bulk and surface states in ScV$_6$Sn$_6$ but also establishes an essential foundation for potential manipulation of the CDW order in kagome materials.
Submitted 3 February, 2024;
originally announced February 2024.
-
Your Diffusion Model is Secretly a Certifiably Robust Classifier
Authors:
Huanran Chen,
Yinpeng Dong,
Shitong Shao,
Zhongkai Hao,
Xiao Yang,
Hang Su,
Jun Zhu
Abstract:
Generative learning, recognized for its effective modeling of data distributions, offers inherent advantages in handling out-of-distribution instances, especially for enhancing robustness to adversarial attacks. Among these, diffusion classifiers, utilizing powerful diffusion models, have demonstrated superior empirical robustness. However, a comprehensive theoretical understanding of their robustness is still lacking, raising concerns about their vulnerability to stronger future attacks. In this study, we prove that diffusion classifiers possess $O(1)$ Lipschitzness, and establish their certified robustness, demonstrating their inherent resilience. To achieve non-constant Lipschitzness, thereby obtaining much tighter certified robustness, we generalize diffusion classifiers to classify Gaussian-corrupted data. This involves deriving the evidence lower bounds (ELBOs) for these distributions, approximating the likelihood using the ELBO, and calculating classification probabilities via Bayes' theorem. Experimental results show the superior certified robustness of these Noised Diffusion Classifiers (NDCs). Notably, we achieve over 80% and 70% certified robustness on CIFAR-10 under adversarial perturbations with $\ell_2$ norms less than 0.25 and 0.5, respectively, using a single off-the-shelf diffusion model without any additional data.
Submitted 22 February, 2025; v1 submitted 3 February, 2024;
originally announced February 2024.
-
Precise Knowledge Transfer via Flow Matching
Authors:
Shitong Shao,
Zhiqiang Shen,
Linrui Gong,
Huanran Chen,
Xu Dai
Abstract:
In this paper, we propose a novel knowledge transfer framework that introduces continuous normalizing flows for progressive knowledge transformation and leverages multi-step sampling strategies to achieve precise knowledge transfer. We name this framework Knowledge Transfer with Flow Matching (FM-KT), which can be integrated with a metric-based distillation method of any form (e.g., vanilla KD, DKD, PKD and DIST) and a meta-encoder with any available architecture (e.g., CNN, MLP and Transformer). By introducing stochastic interpolants, FM-KT is readily amenable to arbitrary noise schedules (e.g., VP-ODE, VE-ODE, Rectified flow) for normalized flow path estimation. We theoretically demonstrate that the training objective of FM-KT is equivalent to minimizing the upper bound of the teacher feature map or logit negative log-likelihood. Besides, FM-KT can be viewed as a unique implicit ensemble method that leads to performance gains. By slightly modifying the FM-KT framework, FM-KT can also be transformed into an online distillation framework OFM-KT with desirable performance gains. Through extensive experiments on CIFAR-100, ImageNet-1k, and MS-COCO datasets, we empirically validate the scalability and state-of-the-art performance of our proposed methods among relevant comparison approaches.
Submitted 2 February, 2024;
originally announced February 2024.
-
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
Authors:
Coleman Hooper,
Sehoon Kim,
Hiva Mohammadzadeh,
Michael W. Mahoney,
Yakun Sophia Shao,
Kurt Keutzer,
Amir Gholami
Abstract:
LLMs are seeing growing use for applications which require large context windows, and with these large context windows KV cache activations surface as the dominant contributor to memory consumption during inference. Quantization is a promising approach for compressing KV cache activations; however, existing solutions fail to represent activations accurately in sub-4-bit precision. Our work, KVQuant, facilitates low precision KV cache quantization by incorporating several novel methods: (i) Per-Channel Key Quantization, where we adjust the dimension along which we quantize the Key activations to better match the distribution; (ii) Pre-RoPE Key Quantization, where we quantize Key activations before the rotary positional embedding to mitigate its impact on quantization; (iii) Non-Uniform KV Cache Quantization, where we derive per-layer sensitivity-weighted non-uniform datatypes that better represent the distributions; and (iv) Per-Vector Dense-and-Sparse Quantization, where we isolate outliers separately for each vector to minimize skews in quantization ranges. By applying our method to the LLaMA, Llama-2, Llama-3, and Mistral models, we achieve < 0.1 perplexity degradation with 3-bit quantization on both Wikitext-2 and C4, outperforming existing approaches. Our method enables serving LLaMA-7B with a context length of up to 1 million on a single A100-80GB GPU and up to 10 million on an 8-GPU system. We develop custom CUDA kernels for KVQuant, showing that we can achieve up to ~1.7x speedups, compared to baseline fp16 matrix-vector multiplications, for the LLaMA-7B model.
Submitted 28 May, 2025; v1 submitted 31 January, 2024;
originally announced January 2024.
-
Indirect Lossy Source Coding with Observed Source Reconstruction: Nonasymptotic Bounds and Second-Order Asymptotics
Authors:
Huiyuan Yang,
Yuxuan Shi,
Shuo Shao,
Xiaojun Yuan
Abstract:
This paper considers the joint compression of a pair of correlated sources, where the encoder is allowed to access only one of the sources. The objective is to recover both sources under separate distortion constraints for each source while minimizing the rate. This problem generalizes the indirect lossy source coding problem by also requiring the recovery of the observed source. In this paper, we aim to study the nonasymptotic and second-order asymptotic properties of this problem. Specifically, we begin by deriving nonasymptotic achievability and converse bounds valid for general sources and distortion measures. The source dispersion (Gaussian approximation) is then determined through asymptotic analysis of the nonasymptotic bounds. We further examine the case of erased fair coin flips (EFCF) and provide its specific nonasymptotic achievability and converse bounds. Numerical results under the EFCF case demonstrate that our second-order asymptotic approximation closely approximates the optimum rate at appropriately large blocklengths.
Submitted 5 November, 2024; v1 submitted 26 January, 2024;
originally announced January 2024.
-
A Superposition Code-Based Semantic Communication Approach with Quantifiable and Controllable Security
Authors:
Weixuan Chen,
Shuo Shao,
Qianqian Yang,
Zhaoyang Zhang,
Ping Zhang
Abstract:
This paper addresses the challenge of achieving security in semantic communication (SemCom) over a wiretap channel, where a legitimate receiver coexists with an eavesdropper experiencing a poorer channel condition. Despite previous efforts to secure SemCom against eavesdroppers, a guarantee of approximately zero information leakage remains an open issue. In this work, we propose a secure SemCom approach based on superposition codes, aiming to provide quantifiable and controllable security for digital SemCom systems. The proposed method employs a double-layered constellation map, where semantic information is associated with satellite constellation points and cloud center constellation points are randomly selected. By carefully allocating power between these two layers of constellation, we ensure that the symbol error probability (SEP) of the eavesdropper decoding satellite constellation points is nearly equivalent to random guessing, while maintaining a low SEP for the legitimate receiver to successfully decode the semantic information. Simulation results demonstrate that the peak signal-to-noise ratio (PSNR) and mean squared error (MSE) of the eavesdropper's reconstructed data, under the proposed method, can range from decoding Gaussian-distributed random noise to approaching the variance of the data. This validates the effectiveness of our method in nearly achieving the experimental upper bound of security for digital SemCom systems when both eavesdroppers and legitimate users utilize identical decoding schemes. Furthermore, the proposed method consistently outperforms benchmark techniques, showcasing superior data security and robustness against eavesdropping.
Submitted 29 May, 2025; v1 submitted 25 January, 2024;
originally announced January 2024.
-
Non-invertible symmetries and LSM-type constraints on a tensor product Hilbert space
Authors:
Nathan Seiberg,
Sahand Seifnashri,
Shu-Heng Shao
Abstract:
We discuss the exact non-invertible Kramers-Wannier symmetry of 1+1d lattice models on a tensor product Hilbert space of qubits. This symmetry is associated with a topological defect and a conserved operator, and the latter can be presented as a matrix product operator. Importantly, unlike its continuum counterpart, the symmetry algebra involves lattice translations. Consequently, it is not described by a fusion category. In the presence of this defect, the symmetry algebra involving parity/time-reversal is realized projectively, which is reminiscent of an anomaly. Different Hamiltonians with the same lattice non-invertible symmetry can flow in their continuum limits to infinitely many different fusion categories (with different Frobenius-Schur indicators), including, as a special case, the Ising CFT. The non-invertible symmetry leads to a constraint similar to that of Lieb-Schultz-Mattis, implying that the system cannot have a unique gapped ground state. It is either in a gapless phase or in a gapped phase with three (or a multiple of three) ground states, associated with the spontaneous breaking of the lattice non-invertible symmetry.
Submitted 17 May, 2024; v1 submitted 22 January, 2024;
originally announced January 2024.
-
Rethinking Centered Kernel Alignment in Knowledge Distillation
Authors:
Zikai Zhou,
Yunhang Shen,
Shitong Shao,
Linrui Gong,
Shaohui Lin
Abstract:
Knowledge distillation has emerged as a highly effective method for bridging the representation discrepancy between large-scale models and lightweight models. Prevalent approaches involve leveraging appropriate metrics to minimize the divergence or distance between the knowledge extracted from the teacher model and the knowledge learned by the student model. Centered Kernel Alignment (CKA) is widely used to measure representation similarity and has been applied in several knowledge distillation methods. However, these methods are complex and fail to uncover the essence of CKA, thus not answering the question of how to use CKA to achieve simple and effective distillation properly. This paper first provides a theoretical perspective to illustrate the effectiveness of CKA, which decouples CKA into the upper bound of Maximum Mean Discrepancy (MMD) and a constant term. Drawing from this, we propose a novel Relation-Centered Kernel Alignment (RCKA) framework, which practically establishes a connection between CKA and MMD. Furthermore, we dynamically customize the application of CKA based on the characteristics of each task, using fewer computational resources while achieving performance comparable to previous methods. Extensive experiments on CIFAR-100, ImageNet-1k, and MS-COCO demonstrate that our method achieves state-of-the-art performance on almost all teacher-student pairs for image classification and object detection, validating the effectiveness of our approaches. Our code is available at https://github.com/Klayand/PCKA
Submitted 30 April, 2024; v1 submitted 22 January, 2024;
originally announced January 2024.
-
A younger Universe implied by satellite pair correlations from SDSS observations of massive galaxy groups
Authors:
Qing Gu,
Qi Guo,
Marius Cautun,
Shi Shao,
Wenxiang Pei,
Wenting Wang,
Liang Gao,
Jie Wang
Abstract:
Many of the satellites of galactic-mass systems such as the Milky Way, Andromeda and Centaurus A show evidence of coherent motions to a larger extent than most of the systems predicted by the standard cosmological model. It is an open question if correlations in satellite orbits are present in systems of different masses. Here, we report an analysis of the kinematics of satellite galaxies around massive galaxy groups. Unlike what is seen in Milky Way analogues, we find an excess of diametrically opposed pairs of satellites that have line-of-sight velocity offsets from the central galaxy of the same sign. This corresponds to a $6.0σ$ ($p$-value $= 9.9\times10^{-10}$) detection of non-random satellite motions. Such excess is predicted by up-to-date cosmological simulations but the magnitude of the effect is considerably lower than in observations. The observational data is discrepant at the $4.1σ$ and $3.6σ$ level with the expectations of the Millennium and the Illustris TNG300 cosmological simulations, potentially indicating that massive galaxy groups assembled later in the real Universe. The detection of velocity correlations of satellite galaxies and tension with theoretical predictions is robust against changes in sample selection. Using the largest sample to date, our findings demonstrate that the motions of satellite galaxies represent a challenge to the current cosmological model.
Submitted 18 January, 2024;
originally announced January 2024.
-
From Zero-Freeness to Strong Spatial Mixing via a Christoffel-Darboux Type Identity
Authors:
Shuai Shao,
Xiaowei Ye
Abstract:
We present a unifying proof to derive the strong spatial mixing (SSM) property for the general 2-spin system from zero-free regions of its partition function. Our proof works for the multivariate partition function over all three complex parameters $(β, γ, λ)$, and we allow the zero-free regions of $β, γ$ or $λ$ to be of arbitrary shapes. Our main technical contribution is to establish a Christoffel-Darboux type identity for the 2-spin system on trees so that we are able to handle zero-free regions of the three different parameters $β, γ$ or $λ$ in a unified way. We use the Riemann mapping theorem to deal with zero-free regions of arbitrary shapes.
Our result comprehensively turns all existing zero-free regions (to the best of our knowledge) of the partition function of the 2-spin system where pinned vertices are allowed into the SSM property. As a consequence, we obtain novel SSM properties for the 2-spin system beyond the direct argument for SSM based on tree recurrence. Moreover, we extend our result to handle the 2-spin system with non-uniform external fields. As an application, we obtain a new SSM property and two new forms of spatial mixing property, namely plus and minus spatial mixing, for the non-uniform ferromagnetic Ising model from the celebrated Lee-Yang circle theorem.
Submitted 8 February, 2025; v1 submitted 17 January, 2024;
originally announced January 2024.
-
Dynamic Capital Requirements for Markov Decision Processes
Authors:
William B. Haskell,
Abhishek Gupta,
Shiping Shao
Abstract:
We build on the theory of capital requirements (CRs) to create a new framework for modeling dynamic risk preferences. The key question is how to evaluate the risk of a payoff stream sequentially as new information is revealed. In our model, we associate each payoff stream with a disbursement strategy and a premium schedule to form a triple of stochastic processes. We characterize risk preferences in terms of a single set that we call the risk frontier which characterizes acceptable triples. We then propose the generalized capital requirement (GCR) which evaluates the risk of a payoff stream by minimizing the premium schedule over acceptable triples. We apply this model to a risk-aware decision maker (DM) who controls a Markov decision process (MDP) and wants to find a policy to minimize the GCR of its payoff stream. The resulting GCR-MDP recovers many well-known risk-aware MDPs as special cases. To make this approach computationally viable, we obtain the temporal decomposition of the GCR in terms of the risk frontier. Then, we connect the temporal decomposition with the notion of an information state to compactly capture the dependence of DM's risk preferences on the problem history, where augmented dynamic programming can be used to compute an optimal policy. We report numerical experiments for the GCR-minimizing newsvendor.
Submitted 11 January, 2024;
originally announced January 2024.
-
Deep Learning Based Superposition Coded Modulation for Hierarchical Semantic Communications over Broadcast Channels
Authors:
Yufei Bo,
Shuo Shao,
Meixia Tao
Abstract:
We consider multi-user semantic communications over broadcast channels. While most existing works consider that each receiver requires either the same or independent semantic information, this paper explores the scenario where the semantic information desired by different receivers is different but correlated. In particular, we investigate semantic communications over Gaussian broadcast channels where the transmitter has a common observable source but the receivers wish to recover hierarchical semantic information in adaptation to their channel conditions. Inspired by the capacity achieving property of superposition codes, we propose a deep learning based superposition coded modulation (DeepSCM) scheme. Specifically, the hierarchical semantic information is first extracted and encoded into basic and enhanced feature vectors. A linear minimum mean square error (LMMSE) decorrelator is then developed to obtain a refinement from the enhanced features that is uncorrelated with the basic features. Finally, the basic features and their refinement are superposed for broadcasting after probabilistic modulation. Experiments are conducted for two-receiver image semantic broadcasting with coarse and fine classification as hierarchical semantic tasks. DeepSCM outperforms the benchmarking coded-modulation scheme without a superposition structure, especially with large channel disparity and high order modulation. It also approaches the performance upperbound as if there were only one receiver.
Submitted 12 June, 2024; v1 submitted 3 January, 2024;
originally announced January 2024.
-
DCFL: Non-IID awareness Data Condensation aided Federated Learning
Authors:
Shaohan Sha,
YaFeng Sun
Abstract:
Federated learning is a decentralized learning paradigm wherein a central server trains a global model iteratively by utilizing clients who possess a certain amount of private datasets. The challenge lies in the fact that the client-side private data may not be identically and independently distributed, significantly impacting the accuracy of the global model. Existing methods commonly address the Non-IID challenge by focusing on optimization, client selection and data complement. However, most approaches tend to overlook the perspective of the private data itself due to privacy constraints. Intuitively, statistical distinctions among private data on the client side can help mitigate the Non-IID degree. Besides, the recent advancements in dataset condensation technology have inspired us to investigate its potential applicability in addressing Non-IID issues while maintaining privacy. Motivated by this, we propose DCFL, which divides clients into groups by using the Centered Kernel Alignment (CKA) method, then uses dataset condensation methods with non-IID awareness to complete clients. The private data from clients within the same group is complementary and their condensed data is accessible to all clients in the group. Additionally, a CKA-guided client selection strategy, filtering mechanisms, and data enhancement techniques are incorporated to efficiently and precisely utilize the condensed data, enhance model performance, and minimize communication time. Experimental results demonstrate that DCFL achieves competitive performance on popular federated learning benchmarks including MNIST, FashionMNIST, SVHN, and CIFAR-10 with existing FL protocols.
Submitted 21 December, 2023;
originally announced December 2023.