-
On whether quantum theory needs complex numbers: the foil theories perspective
Authors:
Yìlè Yīng,
Maria Ciudad Alañón,
Daniel Centeno,
Jacopo Surace,
Marina Maciel Ansanelli,
Ruizhi Liu,
David Schmid,
Robert W. Spekkens
Abstract:
Recent work by Renou et al. (2021) has led to some controversy concerning the question of whether quantum theory requires complex numbers for its formulation. We promote the view that the main result of that work is best understood not as a claim about the relative merits of different representations of quantum theory, but rather as a claim about the possibility of experimentally adjudicating between standard quantum theory and an alternative theory -- a foil theory -- known as real-amplitude quantum theory (RQT). In particular, the claim is that this adjudication can be achieved given only an assumption about the causal structure of the experiment. Here, we aim to shed some light on why this is possible, by reconceptualizing the comparison of the two theories as an instance of a broader class of such theory comparisons. By recasting RQT as the subtheory of quantum theory that arises by symmetrizing with respect to the collective action of a time-reversal symmetry, we can compare it to other subtheories that arise by symmetrization, but for different symmetries. If the symmetry has a unitary representation, the resulting foil theory is termed a twirled quantum world, and if it does not (as is the case in RQT), the resulting foil theory is termed a swirled quantum world. We show that, in contrast to RQT, there is no possibility of distinguishing any twirled quantum world from quantum theory given only an assumption about causal structure. We also define analogues of twirling and swirling for an arbitrary generalized probabilistic theory and identify certain necessary conditions on a causal structure for it to be able to support a causal compatibility gap between the theory and its symmetrized version. We draw out the implications of these analyses for the question of how a lack of a shared reference frame state features into the possibility of such a gap.
Submitted 9 June, 2025;
originally announced June 2025.
-
Copenhagenish interpretations of quantum mechanics
Authors:
David Schmid,
Yìlè Yīng,
Matthew Leifer
Abstract:
We define a class of Copenhagenish interpretations encompassing modern interpretations that follow the Copenhagen spirit. These interpretations are characterized by four postulates: Observers Observe, Universality, Anti-$ψ$-ontology, and Completeness. We explain why such interpretations are not equivalent to the textbook (or orthodox) interpretation, nor to the view that one should shut up and calculate, nor to strict operationalism. We then discuss what lessons are implied for Copenhagenish interpretations by the measurement problem, the Wigner's friend thought experiment, and the simple variants of the Wigner's friend thought experiment that we term Wigner's enemy, stalkee, and penpal. In particular, we discuss how Copenhagenish interpretations give multiple distinct descriptions of each experiment, where these descriptions are each individually true, yet cannot be combined into any single description. To make such interpretations consistent, then, one requires epistemological constraints forbidding certain perspectives from being combined. We discuss these constraints, their motivations, and some of the challenges they introduce.
Submitted 30 May, 2025;
originally announced June 2025.
-
Adaptive Location Hierarchy Learning for Long-Tailed Mobility Prediction
Authors:
Yu Wang,
Junshu Dai,
Yuchen Ying,
Yuxuan Liang,
Tongya Zheng,
Mingli Song
Abstract:
Human mobility prediction, which aims to forecast users' next location visits based on historical trajectories, is crucial for applications ranging from location-based recommendations to urban planning. Despite the severe long-tailed distribution of locations, the problem of long-tailed mobility prediction remains largely underexplored. Existing long-tailed learning methods primarily focus on rebalancing the skewed distribution at the data, model, or class level, neglecting to exploit the spatiotemporal semantics of locations. To address this gap, we propose the first plug-and-play framework for long-tailed mobility prediction in an exploitation and exploration manner, named Adaptive LOcation HierArchy learning (ALOHA). First, we construct a city-tailored location hierarchy based on Large Language Models (LLMs) by exploiting Maslow's theory of human motivation to design Chain-of-Thought (CoT) prompts that capture spatiotemporal semantics. Second, we optimize the location hierarchy predictions by Gumbel disturbance and node-wise adaptive weights within the hierarchical tree structure. Experiments on state-of-the-art models across six datasets demonstrate the framework's consistent effectiveness and generalizability, striking a good balance between head and tail locations. Weight analysis and ablation studies reveal the optimization differences of each component for head and tail locations. Furthermore, in-depth analyses of hierarchical distance and a case study demonstrate the effective semantic guidance from the location hierarchy. Our code will be made publicly available.
Submitted 26 May, 2025;
originally announced May 2025.
-
Quantifiers and witnesses for the nonclassicality of measurements and of states
Authors:
Yujie Zhang,
Yìlè Yīng,
David Schmid
Abstract:
In a recent work, arXiv:2503.05884, we proposed a unified notion of nonclassicality that applies to arbitrary processes in quantum theory, including individual quantum states, measurements, channels, sets of these, etc. This notion is derived from the principle of generalized noncontextuality, but in a novel manner that applies to individual processes rather than full experiments or theories. Here, we provide novel certificates and measures for characterizing and quantifying the nonclassicality inherent in states, measurements, and sets thereof, using semidefinite programming techniques. These are theory-dependent, complementing theory-independent methods based on noncontextuality inequalities. We provide explicit applications of these ideas to many illustrative examples.
Submitted 3 April, 2025;
originally announced April 2025.
-
Reassessing the boundary between classical and nonclassical for individual quantum processes
Authors:
Yujie Zhang,
David Schmid,
Yìlè Yīng,
Robert W. Spekkens
Abstract:
There is a received wisdom about where to draw the boundary between classical and nonclassical for various types of quantum processes. For instance, for multipartite states, it is the divide between separable and entangled, for channels, the divide between entanglement-breaking and not, for sets of measurements, the divide between compatible and incompatible, and for assemblages, the divide between steerable and unsteerable. However, no unified justification of these placements of the classical-nonclassical divide has been proposed. That is, although each might be motivated by some notion of what it means to be classically explainable, it is not the same notion for all of them. One well-motivated notion of classical explainability is the one based on generalized noncontextuality: a set of circuits is classically explainable if the statistics they generate can be realized by a generalized-noncontextual ontological model. In this work, we show that this notion can be leveraged to define a classical-nonclassical divide for individual quantum processes of arbitrary type. A set of measurements is judged to be classical if and only if a particular set of circuits -- the one obtained by contracting these measurements with every possible quantum state -- is classically explainable in the sense just articulated. We begin the task of characterizing where the classical-nonclassical divide lies according to this proposal for a variety of different types of processes. In particular, we show that all of the following are judged to be nonclassical: every entangled state, every set of incompatible measurements, every non-entanglement-breaking channel, every steerable assemblage. However, it also judges certain subsets of the complementary classes to be nonclassical, i.e., certain separable states, compatible sets of measurements, entanglement-breaking channels, and unsteerable assemblages.
Submitted 7 March, 2025;
originally announced March 2025.
-
A Navigation System for ROV's inspection on Fish Net Cage
Authors:
Zhikang Ge,
Fang Yang,
Wenwu Lu,
Peng Wei,
Yibin Ying,
Chen Peng
Abstract:
Autonomous Remotely Operated Vehicles (ROVs) offer a promising solution for automating fishnet inspection, reducing labor dependency, and improving operational efficiency. In this paper, we modify an off-the-shelf ROV, the BlueROV2, into a ROS-based framework and develop a localization module, a path planning system, and a control framework. For real-time, local localization, we employ the open-source TagSLAM library. Additionally, we propose a control strategy based on a Nominal Feedback Controller (NFC) to achieve precise trajectory tracking. The proposed system has been implemented and validated through experiments in a controlled laboratory environment, demonstrating its effectiveness for real-world applications.
Submitted 1 March, 2025;
originally announced March 2025.
-
Hysteretic responses of nanomechanical resonators based on crumpled few-layer graphene
Authors:
Heng Lu,
Chen Yang,
Ce Zhang,
YuBin Zhang,
FengNan Chen,
Yue Ying,
Zhuo-Zhi Zhang,
Xiang-Xiang Song,
Guang-Wei Deng,
Ying Yan,
Joel Moser
Abstract:
Manipulating two-dimensional materials occasionally results in crumpled membranes. Their complicated morphologies feature an abundance of folds, creases and wrinkles that make each crumpled membrane unique. Here, we prepare four nanomechanical resonators based on crumpled membranes of few-layer graphene and measure their static response and the spectrum of their dynamic response. We tune both responses with a dc voltage applied between the membrane and an underlying gate electrode. Surprisingly, we find that all four resonators exhibit hysteretic responses as the gate voltage is increased and then decreased. Concomitant discontinuities in the static response and in the vibrational resonant frequencies indicate a sudden change in the shape and in the tensile strain of the membranes. We also find that the hystereses can be removed and regular responses can be restored by annealing the resonators. We hypothesize that the hysteretic nature of the responses may originate from an interplay between the rugged morphology of the membranes and adsorbates trapped within the confines of the folds.
Submitted 27 February, 2025;
originally announced February 2025.
-
Learn from Foundation Model: Fruit Detection Model without Manual Annotation
Authors:
Yanan Wang,
Zhenghao Fei,
Ruichen Li,
Yibin Ying
Abstract:
Recent breakthroughs in large foundation models have enabled the possibility of transferring knowledge pre-trained on vast datasets to domains with limited data availability. Agriculture is one of the domains that lacks sufficient data. This study proposes a framework to train effective, domain-specific, small models from foundation models without manual annotation. Our approach begins with SDM (Segmentation-Description-Matching), a stage that leverages two foundation models: SAM2 (Segment Anything in Images and Videos) for segmentation and OpenCLIP (Open Contrastive Language-Image Pretraining) for zero-shot open-vocabulary classification. In the second stage, a novel knowledge distillation mechanism is utilized to distill compact, edge-deployable models from SDM, enhancing both inference speed and perception accuracy. The complete method, termed SDM-D (Segmentation-Description-Matching-Distilling), demonstrates strong performance across various fruit detection tasks (object detection, semantic segmentation, and instance segmentation) without manual annotation. It nearly matches the performance of models trained with abundant labels. Notably, SDM-D outperforms open-set detection methods such as Grounding SAM and YOLO-World on all tested fruit detection datasets. Additionally, we introduce MegaFruits, a comprehensive fruit segmentation dataset encompassing over 25,000 images, and all code and datasets are made publicly available at https://github.com/AgRoboticsResearch/SDM-D.git.
Submitted 25 November, 2024;
originally announced November 2024.
-
DuoLift-GAN: Reconstructing CT from Single-view and Biplanar X-Rays with Generative Adversarial Networks
Authors:
Zhaoxi Zhang,
Yueliang Ying
Abstract:
Computed tomography (CT) provides highly detailed three-dimensional (3D) medical images but is costly, time-consuming, and often inaccessible in intraoperative settings (Organization et al. 2011). Recent advancements have explored reconstructing 3D chest volumes from sparse 2D X-rays, such as single-view or orthogonal double-view images. However, current models tend to process 2D images in a planar manner, prioritizing visual realism over structural accuracy. In this work, we introduce DuoLift Generative Adversarial Networks (DuoLift-GAN), a novel architecture with dual branches that independently elevate 2D images and their features into 3D representations. These 3D outputs are merged into a unified 3D feature map and decoded into a complete 3D chest volume, enabling richer 3D information capture. We also present a masked loss function that directs reconstruction towards critical anatomical regions, improving structural accuracy and visual quality. This paper demonstrates that DuoLift-GAN significantly enhances reconstruction accuracy while achieving superior visual realism compared to existing methods.
Submitted 11 December, 2024; v1 submitted 12 November, 2024;
originally announced November 2024.
-
First-in-human spinal cord tumor imaging with fast adaptive focus tracking robotic-OCT
Authors:
Bin He,
Yuzhe Ying,
Yejiong Shi,
Zhe Meng,
Zichen Yin,
Zhengyu Chen,
Zhangwei Hu,
Ruizhi Xue,
Linkai Jing,
Yang Lu,
Zhenxing Sun,
Weitao Man,
Youtu Wu,
Dan Lei,
Ning Zhang,
Guihuai Wang,
Ping Xue
Abstract:
Current surgical procedures for spinal cord tumors lack in vivo high-resolution, high-speed multifunctional imaging systems, posing challenges for precise tumor resection and intraoperative decision-making. This study introduces the Fast Adaptive Focus Tracking Robotic Optical Coherence Tomography (FACT-ROCT) system, designed to overcome these obstacles by providing real-time, artifact-free multifunctional imaging of spinal cord tumors during surgery. By integrating cross-scanning, adaptive focus tracking and robotics, the system addresses motion artifacts and resolution degradation from tissue movement, achieving wide-area, high-resolution imaging. We conducted intraoperative imaging on 21 patients, including 13 with spinal gliomas and 8 with other tumors. This study marks the first demonstration of OCT in situ imaging of human spinal cord tumors, providing micrometer-scale in vivo structural images and demonstrating FACT-ROCT's potential to differentiate various tumor types in real time. Analysis of the attenuation coefficients of spinal gliomas revealed increased heterogeneity with higher malignancy grades. We therefore propose the standard deviation of the attenuation coefficient as a physical marker, achieving over 90% accuracy in distinguishing high- from low-grade gliomas intraoperatively at a threshold. FACT-ROCT also enabled extensive in vivo microvascular imaging of spinal cord tumors, covering 70 mm × 13 mm × 10 mm within 2 minutes. Quantitative vascular tortuosity comparisons confirmed greater tortuosity in higher-grade tumors. The ability to perform extensive vascular imaging and real-time tumor grading during surgery provides critical information for surgical strategy, such as minimizing intraoperative bleeding and optimizing tumor resection while preserving functional tissue.
Submitted 29 October, 2024; v1 submitted 29 October, 2024;
originally announced October 2024.
-
Think Thrice Before You Act: Progressive Thought Refinement in Large Language Models
Authors:
Chengyu Du,
Jinyi Han,
Yizhou Ying,
Aili Chen,
Qianyu He,
Haokun Zhao,
Sirui Xia,
Haoran Guo,
Jiaqing Liang,
Zulong Chen,
Liangyue Li,
Yanghua Xiao
Abstract:
Recent advancements in large language models (LLMs) have demonstrated that progressive refinement, rather than providing a single answer, results in more accurate and thoughtful outputs. However, existing methods often rely heavily on supervision signals to evaluate previous responses, making it difficult to assess output quality in more open-ended scenarios effectively. Additionally, these methods are typically designed for specific tasks, which limits their generalization to new domains. To address these limitations, we propose Progressive Thought Refinement (PTR), a framework that enables LLMs to refine their responses progressively. PTR operates in two phases: (1) Thought data construction stage: We propose a weak and strong model collaborative selection strategy to build a high-quality progressive refinement dataset to ensure logical consistency from thought to answers, and the answers are gradually refined in each round. (2) Thought-Mask Fine-Tuning Phase: We design a training structure to mask the "thought" and adjust loss weights to encourage LLMs to refine prior thought, teaching them to implicitly understand "how to improve" rather than "what is correct." Experimental results show that PTR significantly enhances LLM performance across ten diverse tasks (avg. from 49.6% to 53.5%) without task-specific fine-tuning. Notably, in more open-ended tasks, LLMs also demonstrate substantial improvements in the quality of responses beyond mere accuracy, suggesting that PTR truly teaches LLMs to self-improve over time.
Submitted 17 October, 2024;
originally announced October 2024.
-
On Discriminative Probabilistic Modeling for Self-Supervised Representation Learning
Authors:
Bokun Wang,
Yunwen Lei,
Yiming Ying,
Tianbao Yang
Abstract:
We study the discriminative probabilistic modeling on a continuous domain for the data prediction task of (multimodal) self-supervised representation learning. To address the challenge of computing the integral in the partition function for each anchor data, we leverage the multiple importance sampling (MIS) technique for robust Monte Carlo integration, which can recover InfoNCE-based contrastive loss as a special case. Within this probabilistic modeling framework, we conduct generalization error analysis to reveal the limitation of current InfoNCE-based contrastive loss for self-supervised representation learning and derive insights for developing better approaches by reducing the error of Monte Carlo integration. To this end, we propose a novel non-parametric method for approximating the sum of conditional probability densities required by MIS through convex optimization, yielding a new contrastive objective for self-supervised representation learning. Moreover, we design an efficient algorithm for solving the proposed objective. We empirically compare our algorithm to representative baselines on the contrastive image-language pretraining task. Experimental results on the CC3M and CC12M datasets demonstrate the superior overall performance of our algorithm. Our code is available at https://github.com/bokun-wang/NUCLR.
Submitted 5 March, 2025; v1 submitted 11 October, 2024;
originally announced October 2024.
-
MHRC: Closed-loop Decentralized Multi-Heterogeneous Robot Collaboration with Large Language Models
Authors:
Wenhao Yu,
Jie Peng,
Yueliang Ying,
Sai Li,
Jianmin Ji,
Yanyong Zhang
Abstract:
The integration of large language models (LLMs) with robotics has significantly advanced robots' abilities in perception, cognition, and task planning. The use of natural language interfaces offers a unified approach for expressing the capability differences of heterogeneous robots, facilitating communication between them, and enabling seamless task allocation and collaboration. Currently, the utilization of LLMs to achieve decentralized multi-heterogeneous robot collaborative tasks remains an under-explored area of research. In this paper, we introduce a novel framework that utilizes LLMs to achieve decentralized collaboration among multiple heterogeneous robots. Our framework supports three robot categories, mobile robots, manipulation robots, and mobile manipulation robots, working together to complete tasks such as exploration, transportation, and organization. We developed a rich set of textual feedback mechanisms and chain-of-thought (CoT) prompts to enhance task planning efficiency and overall system performance. The mobile manipulation robot can adjust its base position flexibly, ensuring optimal conditions for grasping tasks. The manipulation robot can comprehend task requirements, seek assistance when necessary, and handle objects appropriately. Meanwhile, the mobile robot can explore the environment extensively, map object locations, and communicate this information to the mobile manipulation robot, thus improving task execution efficiency. We evaluated the framework using PyBullet, creating scenarios with three different room layouts and three distinct operational tasks. We tested various LLM models and conducted ablation studies to assess the contributions of different modules. The experimental results confirm the effectiveness and necessity of our proposed framework.
Submitted 25 September, 2024; v1 submitted 24 September, 2024;
originally announced September 2024.
-
Connecting extended Wigner's friend arguments and noncontextuality
Authors:
Laurens Walleghem,
Yìlè Yīng,
Rafael Wagner,
David Schmid
Abstract:
The Local Friendliness argument is an extended Wigner's friend no-go theorem that provides strong constraints on the nature of reality -- stronger even than those imposed by Bell's theorem or by noncontextuality arguments. In this work, we prove a variety of connections between Local Friendliness scenarios and Kochen-Specker noncontextuality. Specifically, we first show how one can derive new Local Friendliness inequalities using known tools and results from the literature on Kochen-Specker noncontextuality. In doing so, we provide a new derivation for some of the facets of the Local Friendliness polytope, and we prove that this polytope is equal to the Bell polytope in a wide range of extended Wigner's friend scenarios with multipartite agents and sequential measurements. We then show how any possibilistic Kochen-Specker argument can be mathematically translated into a related proof of the Local Friendliness no-go theorem. In particular, we construct a novel kind of Local Friendliness scenario where a friend implements several compatible measurements (or joint measurements of these) in between the superobserver's operations on them. We illustrate this with the well-known 5-cycle and Peres-Mermin contextuality arguments.
Submitted 11 September, 2024;
originally announced September 2024.
-
Exceptional point and hysteresis trajectories in cold Rydberg atomic gases
Authors:
Jun Zhang,
En-Ze Li,
Ya-Jun Wang,
Bang Liu,
Li-Hua Zhang,
Zheng-Yuan Zhang,
Shi-Yao Shao,
Qing Li,
Han-Chao Chen,
Yu Ma,
Tian-Yu Han,
Qi-Feng Wang,
Jia-Dou Nan,
Yi-Ming Ying,
Dong-Yang Zhu,
Bao-Sen Shi,
Dong-Sheng Ding
Abstract:
The interplay between strong long-range interactions and coherent driving contributes to the formation of complex patterns, symmetry, and novel phases of matter in many-body systems. However, long-range interactions may induce an additional dissipation channel, resulting in non-Hermitian many-body dynamics and the emergence of exceptional points in the spectrum. Here, we report the experimental observation of interaction-induced exceptional points in cold Rydberg atomic gases, revealing the breaking of charge-conjugation parity symmetry. By measuring the transmission spectrum under increasing and decreasing probe intensity, the interaction-induced hysteresis trajectories are observed, which give rise to non-Hermitian dynamics. We record the area enclosed by hysteresis loops and investigate the dynamics of hysteresis loops. The reported exceptional points and hysteresis trajectories in cold Rydberg atomic gases provide valuable insights into the underlying non-Hermitian physics in many-body systems, allowing us to study the interplay between long-range interactions and non-Hermiticity.
Submitted 6 August, 2024;
originally announced August 2024.
-
Twirled worlds: symmetry-induced failures of tomographic locality
Authors:
Daniel Centeno,
Marco Erba,
David Schmid,
John H. Selby,
Robert W. Spekkens,
Sina Soltani,
Jacopo Surace,
Alex Wilce,
Yìlè Yīng
Abstract:
Tomographic locality is a principle commonly used in the program of finding axioms that pick out quantum theory within the landscape of possible theories. The principle asserts the sufficiency of local measurements for achieving a tomographic characterization of any bipartite state. In this work, we explore the meaning of the principle of tomographic locality by developing a simple scheme for generating a wide variety of theories that violate the principle. In this scheme, one starts with a tomographically local theory -- which can be classical, quantum or post-quantum -- and a physical symmetry, and one restricts the processes in the theory to all and only those that are covariant with respect to the collective action of that symmetry. We refer to the resulting theories as twirled worlds. We show that failures of tomographic locality are ubiquitous in twirled worlds. From the possibility of such failures in classical twirled worlds, we argue that the failure of tomographic locality (i.e., tomographic nonlocality) does not imply ontological holism. Our results also demonstrate the need for researchers seeking to axiomatize quantum theory to take a stand on the question of whether there are superselection rules that have a fundamental status.
Submitted 4 October, 2024; v1 submitted 31 July, 2024;
originally announced July 2024.
-
Conceptual and formal groundwork for the study of resource dependence relations
Authors:
Yìlè Yīng,
Tomáš Gonda,
Robert Spekkens
Abstract:
A resource theory imposes a preorder over states, with one state being above another if the first can be converted to the second by a free operation, and where the set of free operations defines the notion of resourcefulness under study. In general, the location of a state in the preorder of one resource theory can constrain its location in the preorder of a different resource theory. It follows that there can be nontrivial dependence relations between different notions of resourcefulness. In this article, we lay out the conceptual and formal groundwork for the study of resource dependence relations. In particular, we note that the relations holding among a set of monotones that includes a complete set for each resource theory provides a full characterization of resource dependence relations. As an example, we consider three resource theories concerning the about-face asymmetry properties of a qubit along three mutually orthogonal axes on the Bloch ball, where about-face symmetry refers to a representation of $\mathbb{Z}_2$, consisting of the identity map and a $π$ rotation about the given axis. This example is sufficiently simple that we are able to derive a complete set of monotones for each resource theory and to determine all of the relations that hold among these monotones, thereby completely solving the problem of determining resource dependence relations. Nonetheless, we show that even in this simplest of examples, these relations are already quite nuanced.
Submitted 12 July, 2024; v1 submitted 28 June, 2024;
originally announced July 2024.
-
Optical biomarker of metabolism for breast tumor diagnosis: Insights from subcellular dynamics
Authors:
Zichen Yin,
Shuwei Zhang,
Bin He,
Houpu Yang,
Zhengyu Chen,
Zhangwei Hu,
Yejiong Shi,
Ruizhi Xue,
Panqi Yang,
Yuzhe Ying,
Chengming Wang,
Shu Wang,
Ping Xue
Abstract:
Label-free metabolic dynamics contrast is highly appealing but difficult to achieve in biomedical imaging. Interference offers a highly sensitive mechanism for capturing the metabolic dynamics of the subcellular scatterers. However, traditional interference detection methods fail to isolate pure metabolic dynamics, as the dynamic signals are coupled with scatterer reflectivity and other uncontrollable imaging factors. Here, we demonstrate active phase modulation-assisted dynamic full-field optical coherence tomography (APMD-FFOCT) that decouples and quantifies the metabolic dynamics by adding a reference movement for all interferential scatterers. This novel technique enables imaging and dynamic analysis of subcellular structures along with their changes during the apoptotic process in tumor tissues. Furthermore, the nucleus-to-cytoplasm dynamic intensity ratio could serve as an optical biomarker for breast tumor grading, enhancing intraoperative diagnosis.
Submitted 6 June, 2024;
originally announced June 2024.
-
Kirkwood-Dirac representations beyond quantum states (and their relation to noncontextuality)
Authors:
David Schmid,
Roberto D. Baldijão,
Yìlè Yīng,
Rafael Wagner,
John H. Selby
Abstract:
Kirkwood-Dirac representations of quantum states are increasingly finding use in many areas within quantum theory. Usually, representations of this sort are only applied to provide a representation of quantum states (as complex functions over some set). We show how standard Kirkwood-Dirac representations can be extended to a fully compositional representation of all of quantum theory (including channels, measurements and so on), and prove that this extension satisfies the essential features of functoriality (namely, that the representation commutes with composition of channels), linearity, and quasistochasticity. Interestingly, the representation of a POVM element is uniquely picked out to be the collection of weak values for it relative to the bases defining the representation. We then prove that if one can find any Kirkwood-Dirac representation that is everywhere real and nonnegative for a given experimental scenario or fragment of quantum theory, then the scenario or fragment is consistent with the principle of generalized noncontextuality, a key notion of classicality in quantum foundations. We also show that the converse does not hold: even if one verifies that all Kirkwood-Dirac representations (as defined herein) of an experiment require negativity or imaginarity, one cannot generally conclude that the experiment witnesses contextuality.
Submitted 7 May, 2024;
originally announced May 2024.
-
Fast and label-free 3D virtual H&E histology via active modulation-assisted dynamic full-field OCT
Authors:
Zichen Yin,
Bin He,
Yuzhe Ying,
Shuwei Zhang,
Panqi Yang,
Zhengyu Chen,
Zhangwei Hu,
Yejiong Shi,
Ruizhi Xue,
Chengming Wang,
Shu Wang,
Guihuai Wang,
Ping Xue
Abstract:
Pathological features are the gold standard for tumor diagnosis, guiding treatment and prognosis. However, the standard histopathological process is labor-intensive and time-consuming, while frozen sections have lower accuracy. Dynamic full-field optical coherence tomography (D-FFOCT) offers rapid histologic information by measuring the subcellular dynamics of fresh, unprocessed tissues. However, D-FFOCT images suffer from abrupt shifts in hue and brightness, which are confusing for pathologists and diminish the images' interpretability and reliability. Here, we present active phase modulation-assisted D-FFOCT (APMD-FFOCT) to improve the imaging stability and enhance the contrast of static tissues. This enables us to further employ unsupervised deep learning to convert APMD-FFOCT images into virtual hematoxylin and eosin (H&E) stained images for the first time. Three-dimensional (3D) virtual H&E-stained images have been obtained at a scanning rate of 1 frame per second, as demonstrated in cancer diagnosis for human central nervous system and breast. The results prove that this new method will play a unique and important role in intraoperative histology.
Submitted 26 April, 2024;
originally announced April 2024.
-
Multi-View Active Sensing for Human-Robot Interaction via Hierarchically Connected Tree
Authors:
Yuanjiong Ying,
Xian Huang,
Wei Dong
Abstract:
Comprehensive perception of human beings is the prerequisite to ensure the safety of human-robot interaction. Currently, the prevailing visual sensing approach typically involves a single static camera, resulting in a restricted and occluded field of view. In our work, we develop an active vision system using multiple cameras to dynamically capture multi-source RGB-D data. An integrated human sensing strategy based on a hierarchically connected tree structure is proposed to fuse localized visual information. Constituting the tree model are the nodes representing keypoints and the edges representing keyparts, which are consistently interconnected to preserve the structural constraints during multi-source fusion. Utilizing RGB-D data and HRNet, the 3D positions of keypoints are analytically estimated, and their presence is inferred through a sliding window of confidence scores. Subsequently, the point clouds of reliable keyparts are extracted by drawing occlusion-resistant masks, enabling fine registration between data clouds and cylindrical model following the hierarchical order. Experimental results demonstrate that our method enhances keypart recognition recall from 69.20% to 90.10%, compared to employing a single static camera. Furthermore, in overcoming challenges related to localized and occluded perception, the robotic arm's obstacle avoidance capabilities are effectively improved.
Submitted 19 March, 2024;
originally announced March 2024.
-
CEASE: Collision-Evaluation-based Active Sense System for Collaborative Robotic Arms
Authors:
Xian Huang,
Yuanjiong Ying,
Wei Dong
Abstract:
Collision detection via visual fences can significantly enhance the safety of collaborative robotic arms. Existing work typically performs such detection based on pre-deployed stationary cameras outside the robotic arm's workspace. These stationary cameras can only provide a restricted detection range and constrain the mobility of the robotic system. To cope with this issue, we propose an active sense method enabling a wide range of collision risk evaluation in dynamic scenarios. First, an active vision mechanism is implemented by equipping cameras with additional degrees of rotation. Considering the uncertainty in the active sense, we design a state confidence envelope to uniformly characterize both known and potential dynamic obstacles. Subsequently, using the observation-based uncertainty evolution, collision risk is evaluated by the prediction of obstacle envelopes. On this basis, a Markov decision process is employed to search for an optimal observation sequence of the active sense system, which enlarges the field of observation and reduces uncertainties in the state estimation of surrounding obstacles. Simulation and real-world experiments consistently demonstrate a 168% increase in the observation time coverage of typical dynamic humanoid obstacles compared to the method using stationary cameras, which underscores our system's effectiveness in collision risk tracking and enhancing the safety of robotic arms.
Submitted 8 March, 2024;
originally announced March 2024.
-
Towards Accurate Post-training Quantization for Reparameterized Models
Authors:
Luoming Zhang,
Yefei He,
Wen Fei,
Zhenyu Lou,
Weijia Wu,
YangWei Ying,
Hong Zhou
Abstract:
Model reparameterization is a widely accepted technique for improving inference speed without compromising performance. However, current Post-training Quantization (PTQ) methods often lead to significant accuracy degradation when applied to reparameterized models. This is primarily caused by channel-specific and sample-specific outliers, which appear only at specific samples and channels and affect the selection of quantization parameters. To address this issue, we propose RepAPQ, a novel framework that preserves the accuracy of quantized reparameterization models. Different from previous frameworks using Mean Squared Error (MSE) as a measurement, we utilize Mean Absolute Error (MAE) to mitigate the influence of outliers on quantization parameters. Our framework comprises two main components: Quantization Protecting Reparameterization and Across-block Calibration. For effective calibration, Quantization Protecting Reparameterization combines multiple branches into a single convolution with an affine layer. During training, the affine layer accelerates convergence and amplifies the output of the convolution to better accommodate samples with outliers. Additionally, Across-block Calibration leverages the measurement of stage output as supervision to address the gradient problem introduced by MAE and enhance the interlayer correlation with quantization parameters. Comprehensive experiments demonstrate the effectiveness of RepAPQ across various models and tasks. Our framework outperforms previous methods by approximately 1% for 8-bit PTQ and 2% for 6-bit PTQ, showcasing its superior performance. The code is available at https://github.com/ilur98/DLMC-QUANT.
Submitted 25 February, 2024;
originally announced February 2024.
-
RENOVI: A Benchmark Towards Remediating Norm Violations in Socio-Cultural Conversations
Authors:
Haolan Zhan,
Zhuang Li,
Xiaoxi Kang,
Tao Feng,
Yuncheng Hua,
Lizhen Qu,
Yi Ying,
Mei Rianto Chandra,
Kelly Rosalin,
Jureynolds Jureynolds,
Suraj Sharma,
Shilin Qu,
Linhao Luo,
Lay-Ki Soon,
Zhaleh Semnani Azad,
Ingrid Zukerman,
Gholamreza Haffari
Abstract:
Norm violations occur when individuals fail to conform to culturally accepted behaviors, which may lead to potential conflicts. Remediating norm violations requires social awareness and cultural sensitivity of the nuances at play. To equip interactive AI systems with a remediation ability, we offer ReNoVi, a large-scale corpus of 9,258 multi-turn dialogues annotated with social norms, and define a sequence of tasks to help understand and remediate norm violations step by step. ReNoVi consists of two parts: 512 human-authored dialogues (real data), and 8,746 synthetic conversations generated by ChatGPT through prompt learning. While collecting sufficient human-authored data is costly, synthetic conversations provide suitable amounts of data to help mitigate the scarcity of training data, as well as the chance to assess the alignment between LLMs and humans in the awareness of social norms. We thus harness the power of ChatGPT to generate synthetic training data for our task. To ensure the quality of both human-authored and synthetic data, we follow a quality control protocol during data collection. Our experimental results demonstrate the importance of remediating norm violations in socio-cultural conversations, as well as the improvement in performance obtained from synthetic data.
Submitted 16 February, 2024;
originally announced February 2024.
-
Differentially Private Non-convex Learning for Multi-layer Neural Networks
Authors:
Hanpu Shen,
Cheng-Long Wang,
Zihang Xiang,
Yiming Ying,
Di Wang
Abstract:
This paper focuses on the problem of Differentially Private Stochastic Optimization for (multi-layer) fully connected neural networks with a single output node. In the first part, we examine cases with no hidden nodes, specifically focusing on Generalized Linear Models (GLMs). We investigate the well-specified model where the random noise possesses a zero mean, and the link function is both bounded and Lipschitz continuous. We propose several algorithms and our analysis demonstrates the feasibility of achieving an excess population risk that remains invariant to the data dimension. We also delve into the scenario involving the ReLU link function, and our findings mirror those of the bounded link function. We conclude this section by contrasting well-specified and misspecified models, using ReLU regression as a representative example.
In the second part of the paper, we extend our ideas to two-layer neural networks with sigmoid or ReLU activation functions in the well-specified model. In the third part, we study the theoretical guarantees of DP-SGD in Abadi et al. (2016) for fully connected multi-layer neural networks. By utilizing recent advances in Neural Tangent Kernel theory, we provide the first excess population risk when both the sample size and the width of the network are sufficiently large. Additionally, we discuss the role of some parameters in DP-SGD regarding their utility, both theoretically and empirically.
Submitted 12 October, 2023;
originally announced October 2023.
-
Extended Wigner's friend paradoxes do not require nonlocal correlations
Authors:
Laurens Walleghem,
Rafael Wagner,
Yìlè Yīng,
David Schmid
Abstract:
Extended Wigner's friend no-go theorems provide a modern lens for investigating the measurement problem, by making precise the challenges that arise when one attempts to model agents as dynamical quantum systems. Most such no-go theorems studied to date, such as the Frauchiger-Renner argument and the Local Friendliness argument, are explicitly constructed using quantum correlations that violate Bell inequalities. In this work, we show that such correlations are not necessary for having extended Wigner's friend paradoxes, by constructing a no-go theorem utilizing a proof of the failure of noncontextuality. The argument hinges on a novel metaphysical assumption (which we term Commutation Irrelevance) that is a natural extension of a key assumption going into Frauchiger and Renner's no-go theorem.
Submitted 24 January, 2024; v1 submitted 10 October, 2023;
originally announced October 2023.
-
Relating Wigner's Friend Scenarios to Nonclassical Causal Compatibility, Monogamy Relations, and Fine Tuning
Authors:
Yìlè Yīng,
Marina Maciel Ansanelli,
Andrea Di Biagio,
Elie Wolfe,
David Schmid,
Eric Gama Cavalcanti
Abstract:
Nonclassical causal modeling was developed in order to explain violations of Bell inequalities while adhering to relativistic causal structure and faithfulness -- that is, avoiding fine-tuned causal explanations. Recently, a no-go theorem that can be viewed as being stronger than Bell's theorem has been derived, based on extensions of the Wigner's friend thought experiment: the Local Friendliness (LF) no-go theorem. Here we show that the LF no-go theorem poses formidable challenges for the field of causal modeling, even when nonclassical and/or cyclic causal explanations are considered. We first recast the LF inequalities, one of the key elements of the LF no-go theorem, as special cases of monogamy relations stemming from a statistical marginal problem. We then further recast LF inequalities as causal compatibility inequalities stemming from a nonclassical causal marginal problem, for a causal structure implied by well-motivated causal-metaphysical assumptions. We find that the LF inequalities emerge from this causal structure even when one allows the latent causes of observed events to admit post-quantum descriptions, such as in a generalized probabilistic theory or in an even more exotic theory. We further prove that no nonclassical causal model can explain violations of LF inequalities without violating the No Fine-Tuning principle. Finally, we note that these obstacles cannot be overcome even if one appeals to cyclic causal models, and we discuss potential directions for further extensions of the causal modeling framework.
Submitted 25 September, 2024; v1 submitted 22 September, 2023;
originally announced September 2023.
-
Outlier Robust Adversarial Training
Authors:
Shu Hu,
Zhenhuan Yang,
Xin Wang,
Yiming Ying,
Siwei Lyu
Abstract:
Supervised learning models are challenged by the intrinsic complexities of training data such as outliers and minority subpopulations and intentional attacks at inference time with adversarial samples. While traditional robust learning methods and the recent adversarial training approaches are designed to handle each of the two challenges, to date, no work has been done to develop models that are robust with regard to the low-quality training data and the potential adversarial attack at inference time simultaneously. It is for this reason that we introduce Outlier Robust Adversarial Training (ORAT) in this work. ORAT is based on a bi-level optimization formulation of adversarial training with a robust rank-based loss function. Theoretically, we show that the learning objective of ORAT satisfies the $\mathcal{H}$-consistency in binary classification, which establishes it as a proper surrogate to adversarial 0/1 loss. Furthermore, we analyze its generalization ability and provide uniform convergence rates in high probability. ORAT can be optimized with a simple algorithm. Experimental evaluations on three benchmark datasets demonstrate the effectiveness and robustness of ORAT in handling outliers and adversarial attacks. Our code is available at https://github.com/discovershu/ORAT.
Submitted 10 September, 2023;
originally announced September 2023.
-
A review and analysis of six extended Wigner's friend arguments
Authors:
David Schmid,
Yìlè Yīng,
Matthew Leifer
Abstract:
The Wigner's friend thought experiment was intended to illustrate the difficulty one has in describing an agent as a quantum system when that agent performs a measurement. While it does pose a challenge to the orthodox interpretation of quantum theory, most modern interpretations have no trouble in resolving the difficulty. Recently, a number of extensions of Wigner's ideas have been proposed. We provide a gentle introduction to six such arguments, modifying the specifics of many of them so that they are as simple and unified as possible. In particular, we show that all of the arguments hinge on assumptions about correlations between measurement outcomes that are not accessible to any observer, even in principle. We then provide a critical analysis of each argument, focusing especially on how well one can motivate the required assumptions regarding these inaccessible correlations. Although we argue that some of these assumptions are not entirely well-motivated, all of the arguments do shed light on the nature of quantum theory, especially when concerning the description of agents and their measurements.
Submitted 10 September, 2024; v1 submitted 30 August, 2023;
originally announced August 2023.
-
Elastic scattering and total reaction cross sections of $^{6}$Li studied with a microscopic continuum discretized coupled channels model
Authors:
Wendi Chen,
D. Y. Pang,
Hairui Guo,
Ye Tao,
Weili Sun,
Yangjun Ying
Abstract:
We present a systematic study of $^{6}$Li elastic scattering and total reaction cross sections at incident energies around the Coulomb barrier within the continuum discretized coupled-channels (CDCC) framework, where $^{6}$Li is treated in an $α$+$d$ two-body model. Collisions with $^{27}$Al, $^{64}$Zn, $^{138}$Ba and $^{208}$Pa are analyzed. The microscopic optical potentials (MOP) based on Skyrme nucleon-nucleon interaction for $α$ and $d$ are adopted in CDCC calculations and satisfactory agreement with the experimental data is obtained without any adjustment on MOPs. For comparison, the $α$ and $d$ global phenomenological optical potentials (GOP) are also used in CDCC analysis and a reduction no less than 50$\%$ on the surface imaginary part of deuteron GOP is required for describing the data. In all cases, the $^6$Li breakup effect is significant and provides repulsive correction to the folding model potential. The reduction on the surface imaginary part of GOP of deuteron reveals a strong suppression of the reaction probability of deuteron as a component of $^{6}$Li as compared with that of a free deuteron. A further investigation is made by taking the $d$ breakup process into account equivalently within the dynamic polarization potential approach and it shows that $d$ behaves like a tightly bound nucleus in $^{6}$Li induced reactions.
Submitted 18 October, 2023; v1 submitted 16 August, 2023;
originally announced August 2023.
-
Stability and Generalization of Stochastic Compositional Gradient Descent Algorithms
Authors:
Ming Yang,
Xiyuan Wei,
Tianbao Yang,
Yiming Ying
Abstract:
Many machine learning tasks can be formulated as a stochastic compositional optimization (SCO) problem such as reinforcement learning, AUC maximization, and meta-learning, where the objective function involves a nested composition associated with an expectation. While a significant number of studies have been devoted to the convergence behavior of SCO algorithms, there is little work on understanding their generalization, i.e., how these learning algorithms built from training examples would behave on future test examples. In this paper, we provide the stability and generalization analysis of stochastic compositional gradient descent algorithms through the lens of algorithmic stability in the framework of statistical learning theory. Firstly, we introduce a stability concept called compositional uniform stability and establish its quantitative relation with generalization for SCO problems. Then, we establish the compositional uniform stability results for two popular stochastic compositional gradient descent algorithms, namely SCGD and SCSC. Finally, we derive dimension-independent excess risk bounds for SCGD and SCSC by trading off their stability results and optimization errors. To the best of our knowledge, these are the first-ever-known results on stability and generalization analysis of stochastic compositional gradient descent algorithms.
Submitted 21 November, 2023; v1 submitted 6 July, 2023;
originally announced July 2023.
-
Continuum-discretized coupled-channel calculations for $^{6}$Li fusion reactions with closed channels
Authors:
Wendi Chen,
D. Y. Pang,
Hairui Guo,
Ye Tao,
Weili Sun,
Yangjun Ying
Abstract:
Fusion reactions induced by the weakly bound nucleus $^{6}$Li with targets $^{28}$Si, $^{64}$Ni, $^{144}$Sm and $^{209}$Bi at energies around the Coulomb barrier are investigated within a three-body model where $^{6}$Li is described with an $α+ d$ cluster model. The total fusion (TF) cross sections are calculated with the continuum-discretized coupled-channel (CDCC) method and the complete fusion (CF) cross sections are extracted through the sum-rule model. The calculations demonstrate that (i) for the TF cross section calculations, the continuum states up to 40 MeV are found to be necessary, which corresponds to the inclusion of closed channels for light and medium mass targets, such as $^{28}$Si, $^{59}$Co and $^{144}$Sm, (ii) the converged CDCC results for TF cross section at energies above the Coulomb barrier are almost the same as single channel results in which the continuum coupling effect is neglected, and (iii) the continuum coupling strongly influences partial wave fusion cross sections and the closed channels play a significant role in the improvement of the description of the CF cross sections at energies below the Coulomb barrier for the $^6$Li+$^{28}$Si, $^{59}$Co and $^{144}$Sm systems.
Submitted 11 June, 2023;
originally announced June 2023.
-
Three-Way Trade-Off in Multi-Objective Learning: Optimization, Generalization and Conflict-Avoidance
Authors:
Lisha Chen,
Heshan Fernando,
Yiming Ying,
Tianyi Chen
Abstract:
Multi-objective learning (MOL) problems often arise in emerging machine learning problems when there are multiple learning criteria, data modalities, or learning tasks. Unlike single-objective learning, one of the critical challenges in MOL is the potential conflict among different objectives during the iterative optimization process. Recent works have developed various dynamic weighting algorithms for MOL such as MGDA and its variants, where the central idea is to find an update direction that avoids conflicts among objectives. Despite its appealing intuition, empirical studies show that dynamic weighting methods may not always outperform static ones. To understand this theory-practice gap, we focus on a new stochastic variant of MGDA, the Multi-objective gradient with Double sampling (MoDo) algorithm, and study the generalization performance of the dynamic weighting-based MoDo and its interplay with optimization through the lens of algorithm stability. Perhaps surprisingly, we find that the key rationale behind MGDA, updating along a conflict-avoidant direction, may hinder dynamic weighting algorithms from achieving the optimal ${\cal O}(1/\sqrt{n})$ population risk, where $n$ is the number of training samples. We further demonstrate the impact of the variability of dynamic weights on the three-way trade-off among optimization, generalization, and conflict avoidance that is unique to MOL. We showcase the generality of our theoretical framework by analyzing other existing stochastic MOL algorithms under the framework. Experiments on various multi-task learning benchmarks are performed to demonstrate the practical applicability. Code is available at https://github.com/heshandevaka/Trade-Off-MOL.
Submitted 5 October, 2023; v1 submitted 31 May, 2023;
originally announced May 2023.
-
Generalization Guarantees of Gradient Descent for Multi-Layer Neural Networks
Authors:
Puyu Wang,
Yunwen Lei,
Di Wang,
Yiming Ying,
Ding-Xuan Zhou
Abstract:
Recently, significant progress has been made in understanding the generalization of neural networks (NNs) trained by gradient descent (GD) using the algorithmic stability approach. However, most of the existing research has focused on one-hidden-layer NNs and has not addressed the impact of different network scaling parameters. In this paper, we greatly extend the previous work \cite{lei2022stability,richards2021stability} by conducting a comprehensive stability and generalization analysis of GD for multi-layer NNs. For two-layer NNs, our results are established under general network scaling parameters, relaxing previous conditions. In the case of three-layer NNs, our technical contribution lies in demonstrating its nearly co-coercive property by utilizing a novel induction strategy that thoroughly explores the effects of over-parameterization. As a direct application of our general findings, we derive the excess risk rate of $O(1/\sqrt{n})$ for GD algorithms in both two-layer and three-layer NNs. This sheds light on sufficient or necessary conditions for under-parameterized and over-parameterized NNs trained by GD to attain the desired risk rate of $O(1/\sqrt{n})$. Moreover, we demonstrate that as the scaling parameter increases or the network complexity decreases, less over-parameterization is required for GD to achieve the desired error rates. Additionally, under a low-noise condition, we obtain a fast risk rate of $O(1/n)$ for GD in both two-layer and three-layer NNs.
Submitted 29 September, 2023; v1 submitted 26 May, 2023;
originally announced May 2023.
-
Fairness-aware Differentially Private Collaborative Filtering
Authors:
Zhenhuan Yang,
Yingqiang Ge,
Congzhe Su,
Dingxian Wang,
Xiaoting Zhao,
Yiming Ying
Abstract:
Recently, there has been an increasing adoption of differential privacy guided algorithms for privacy-preserving machine learning tasks. However, the use of such algorithms comes with trade-offs in terms of algorithmic fairness, which has been widely acknowledged. Specifically, we have empirically observed that the classical collaborative filtering method, trained by differentially private stochastic gradient descent (DP-SGD), results in a disparate impact on user groups with respect to different user engagement levels. This, in turn, causes the original unfair model to become even more biased against inactive users. To address the above issues, we propose DP-Fair, a two-stage framework for collaborative filtering based algorithms. Specifically, it combines differential privacy mechanisms with fairness constraints to protect user privacy while ensuring fair recommendations. The experimental results, based on Amazon datasets, and user history logs collected from Etsy, one of the largest e-commerce platforms, demonstrate that our proposed method exhibits superior performance in terms of both overall accuracy and user group fairness on both shallow and deep recommendation models compared to vanilla DP-SGD.
Submitted 16 March, 2023;
originally announced March 2023.
-
Generalization Analysis for Contrastive Representation Learning
Authors:
Yunwen Lei,
Tianbao Yang,
Yiming Ying,
Ding-Xuan Zhou
Abstract:
Recently, contrastive learning has found impressive success in advancing the state of the art in solving various machine learning tasks. However, the existing generalization analysis is very limited or even not meaningful. In particular, the existing generalization error bounds depend linearly on the number $k$ of negative examples while it was widely shown in practice that choosing a large $k$ is necessary to guarantee good generalization of contrastive learning in downstream tasks. In this paper, we establish novel generalization bounds for contrastive learning which do not depend on $k$, up to logarithmic terms. Our analysis uses structural results on empirical covering numbers and Rademacher complexities to exploit the Lipschitz continuity of loss functions. For self-bounding Lipschitz loss functions, we further improve our results by developing optimistic bounds which imply fast rates in a low noise condition. We apply our results to learning with both linear representation and nonlinear representation by deep neural networks, for both of which we derive Rademacher complexity bounds to get improved generalization bounds.
Submitted 27 February, 2023; v1 submitted 23 February, 2023;
originally announced February 2023.
-
Data-driven extraction of the substructure of quark and gluon jets in proton-proton and heavy-ion collisions
Authors:
Yueyang Ying
Abstract:
The modification of quark- and gluon-initiated jets in the quark-gluon plasma produced in heavy-ion collisions is a long-standing question that has not yet received a definitive answer from experiments. In particular, the size of the modifications in the quark-gluon plasma differs between theoretical models. Therefore a fully data-driven technique is crucial for an unbiased extraction of the quark and gluon jet spectra and substructure. Corroborating past results, I demonstrate the capability of a fully data-driven technique called topic modeling in separating quark and gluon contributions to jet observables. The data-driven topic separation results can further be used to extract jet substructures, such as jet shapes and jet fragmentation function, and their respective QGP modifications. In addition, I propose the use of machine learning constructed observables and demonstrate the potential to increase separability for the input observable. This proof-of-concept study is based on proton-proton and heavy-ion collision events from the PYQUEN generator with statistics accessible in Run 4 of the Large Hadron Collider. These results suggest the potential for an experimental determination of quark- and gluon-jet spectra, their substructures, and their modification in the QGP.
Submitted 28 November, 2022;
originally announced November 2022.
-
SRIBO: An Efficient and Resilient Single-Range and Inertia Based Odometry for Flying Robots
Authors:
Wei Dong,
Zheyuan Mei,
Yuanjiong Ying,
Sijia Chen,
Yichen ie,
Xiangyang Zhu
Abstract:
Positioning with one inertial measurement unit and one ranging sensor is commonly thought to be feasible only when trajectories follow certain patterns that ensure observability. To obtain such observable patterns, it is necessary either to excite the trajectory or to search for key nodes over a long interval, a process that is commonly highly nonlinear and may also lack resilience. Therefore, such a positioning approach is still not widely accepted in real-world applications. To address this issue, this work first investigates the dissipative nature of flying robots considering aerial drag effects and re-formulates the corresponding positioning problem, which guarantees observability almost surely. On this basis, a dimension-reduced wriggling estimator is proposed. This estimator slides the estimation horizon in a stepping manner, and output matrices can be approximately evaluated based on the historical estimation sequence. The computational complexity is then further reduced via a dimension-reduction approach using polynomial fittings. In this way, the states of robots can be estimated via linear programming in a sufficiently long interval, and the degree of observability is thereby further enhanced because an adequate redundancy of measurements is available for each estimation. Subsequently, the estimator's convergence and numerical stability are proven theoretically. Finally, both indoor and outdoor experiments verify that the proposed estimator can achieve decimeter-level precision at hundreds of hertz, and that it is resilient to sensor failures. Hopefully, this study can provide a new practical approach for self-localization as well as relative positioning of cooperative agents with low-cost and lightweight sensors.
Submitted 6 November, 2022;
originally announced November 2022.
-
Sliding nanomechanical resonators
Authors:
Yue Ying,
Zhuo-Zhi Zhang,
Joel Moser,
Zi-Jia Su,
Xiang-Xiang Song,
Guo-Ping Guo
Abstract:
The motion of a vibrating object is determined by the way it is held. This simple observation has long inspired string instrument makers to create new sounds by devising elegant string clamping mechanisms, whereby the distance between the clamping points is modulated as the string vibrates. At the nanoscale, the simplest way to emulate this principle would be to controllably make nanoresonators slide across their clamping points, which would effectively modulate their vibrating length. Here, we report measurements of flexural vibrations in nanomechanical resonators that reveal such a sliding motion. Surprisingly, the resonant frequency of vibrations draws a loop as a tuning gate voltage is cycled. This behavior indicates that sliding is accompanied by a delayed frequency response of the resonators, making their dynamics richer than that of resonators with fixed clamping points. Our work elucidates the dynamics of nanomechanical resonators with unconventional boundary conditions, and offers opportunities for studying friction at the nanoscale from resonant frequency measurements.
Submitted 27 October, 2022;
originally announced October 2022.
-
Stability and Generalization Analysis of Gradient Methods for Shallow Neural Networks
Authors:
Yunwen Lei,
Rong Jin,
Yiming Ying
Abstract:
While significant theoretical progress has been achieved, unveiling the generalization mystery of overparameterized neural networks still remains largely elusive. In this paper, we study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability. We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of which we develop consistent excess risk bounds by balancing the optimization and generalization via early-stopping. As compared to existing analysis on GD, our new analysis requires a relaxed overparameterization assumption and also applies to SGD. The key for the improvement is a better estimation of the smallest eigenvalues of the Hessian matrices of the empirical risks and the loss function along the trajectories of GD and SGD by providing a refined estimation of their iterates.
Submitted 19 September, 2022;
originally announced September 2022.
-
Stability and Generalization for Markov Chain Stochastic Gradient Methods
Authors:
Puyu Wang,
Yunwen Lei,
Yiming Ying,
Ding-Xuan Zhou
Abstract:
Recently, a large amount of work has been devoted to the study of Markov chain stochastic gradient methods (MC-SGMs), mainly focusing on their convergence analysis for solving minimization problems. In this paper, we provide a comprehensive generalization analysis of MC-SGMs for both minimization and minimax problems through the lens of algorithmic stability in the framework of statistical learning theory. For empirical risk minimization (ERM) problems, we establish the optimal excess population risk bounds for both smooth and non-smooth cases by introducing on-average argument stability. For minimax problems, we develop a quantitative connection between on-average argument stability and generalization error which extends the existing results for uniform stability \cite{lei2021stability}. We further develop the first nearly optimal convergence rates for convex-concave problems both in expectation and with high probability, which, combined with our stability results, show that the optimal generalization bounds can be attained for both smooth and non-smooth cases. To the best of our knowledge, this is the first generalization analysis of SGMs when the gradients are sampled from a Markov process.
Submitted 16 September, 2022;
originally announced September 2022.
-
Differentially Private Stochastic Gradient Descent with Low-Noise
Authors:
Puyu Wang,
Yunwen Lei,
Yiming Ying,
Ding-Xuan Zhou
Abstract:
Modern machine learning algorithms aim to extract fine-grained information from data to provide accurate predictions, which often conflicts with the goal of privacy protection. This paper addresses the practical and theoretical importance of developing privacy-preserving machine learning algorithms that ensure good performance while preserving privacy. In this paper, we focus on the privacy and utility (measured by excess risk bounds) performances of differentially private stochastic gradient descent (SGD) algorithms in the setting of stochastic convex optimization. Specifically, we examine the pointwise problem in the low-noise setting for which we derive sharper excess risk bounds for the differentially private SGD algorithm. In the pairwise learning setting, we propose a simple differentially private SGD algorithm based on gradient perturbation. Furthermore, we develop novel utility bounds for the proposed algorithm, proving that it achieves optimal excess risk rates even for non-smooth losses. Notably, we establish fast learning rates for privacy-preserving pairwise learning under the low-noise condition, which is the first of its kind.
Submitted 14 July, 2023; v1 submitted 9 September, 2022;
originally announced September 2022.
-
Minimax AUC Fairness: Efficient Algorithm with Provable Convergence
Authors:
Zhenhuan Yang,
Yan Lok Ko,
Kush R. Varshney,
Yiming Ying
Abstract:
The use of machine learning models in consequential decision making often exacerbates societal inequity, in particular yielding disparate impact on members of marginalized groups defined by race and gender. The area under the ROC curve (AUC) is widely used to evaluate the performance of a scoring function in machine learning, but is studied in algorithmic fairness less than other performance metrics. Due to the pairwise nature of the AUC, defining an AUC-based group fairness metric is pairwise-dependent and may involve both intra-group and inter-group AUCs. Importantly, considering only one category of AUCs is not sufficient to mitigate unfairness in AUC optimization. In this paper, we propose a minimax learning and bias mitigation framework that incorporates both intra-group and inter-group AUCs while maintaining utility. Based on this Rawlsian framework, we design an efficient stochastic optimization algorithm and prove its convergence to the minimum group-level AUC. We conduct numerical experiments on both synthetic and real-world datasets to validate the effectiveness of the minimax framework and the proposed optimization algorithm.
Submitted 28 November, 2022; v1 submitted 22 August, 2022;
originally announced August 2022.
-
Data-driven extraction of the substructure of quark and gluon jets in proton-proton and heavy-ion collisions
Authors:
Yueyang Ying,
Jasmine Brewer,
Yi Chen,
Yen-Jie Lee
Abstract:
The difference in how quark- and gluon-initiated jets are modified in the quark-gluon plasma (QGP) produced in heavy-ion collisions is a long-standing question that has not yet received a definitive answer from experiments. In particular, the relative sizes of the modification of quark and gluon jets differ between theoretical models. Therefore, a fully data-driven technique is crucial for an unbiased extraction of the quark and gluon jet spectra and substructure. We perform a proof-of-concept study based on proton-proton and heavy-ion collision events from the PYQUEN generator with statistics accessible in Run 4 of the Large Hadron Collider. We use a statistical technique called topic modeling to separate quark and gluon contributions to jet observables. We demonstrate that jet substructure observables, such as the jet shape and jet fragmentation function, can be extracted using this data-driven method. These values can then be used to obtain the modification of quark and gluon jet substructures in the QGP. We also perform the topic separation on smeared input data to demonstrate that the approach is robust to fluctuations arising from a QGP background. These results suggest the potential for an experimental determination of quark and gluon jet spectra and their substructure.
Submitted 31 January, 2023; v1 submitted 1 April, 2022;
originally announced April 2022.
-
AUC Maximization in the Era of Big Data and AI: A Survey
Authors:
Tianbao Yang,
Yiming Ying
Abstract:
Area under the ROC curve, a.k.a. AUC, is a measure of choice for assessing the performance of a classifier for imbalanced data. AUC maximization refers to a learning paradigm that learns a predictive model by directly maximizing its AUC score. It has been studied for more than two decades, dating back to the late 90s, and a huge amount of work has been devoted to AUC maximization since then. Recently, stochastic AUC maximization for big data and deep AUC maximization for deep learning have received increasing attention and yielded dramatic impact for solving real-world problems. However, to the best of our knowledge, there is no comprehensive survey of related works for AUC maximization. This paper aims to address the gap by reviewing the literature in the past two decades. We not only give a holistic view of the literature but also present detailed explanations and comparisons of different papers from formulations to algorithms and theoretical guarantees. We also identify and discuss remaining and emerging issues for deep AUC maximization, and provide suggestions on topics for future work.
Submitted 3 August, 2022; v1 submitted 28 March, 2022;
originally announced March 2022.
-
Differentially Private SGDA for Minimax Problems
Authors:
Zhenhuan Yang,
Shu Hu,
Yunwen Lei,
Kush R. Varshney,
Siwei Lyu,
Yiming Ying
Abstract:
Stochastic gradient descent ascent (SGDA) and its variants have been the workhorse for solving minimax problems. However, in contrast to the well-studied stochastic gradient descent (SGD) with differential privacy (DP) constraints, there is little work on understanding the generalization (utility) of SGDA with DP constraints. In this paper, we use the algorithmic stability approach to establish the generalization (utility) of DP-SGDA in different settings. In particular, for the convex-concave setting, we prove that DP-SGDA can achieve an optimal utility rate in terms of the weak primal-dual population risk in both smooth and non-smooth cases. To the best of our knowledge, this is the first known result for DP-SGDA in the non-smooth case. We further provide its utility analysis in the nonconvex-strongly-concave setting, which is the first known result in terms of the primal population risk. The convergence and generalization results for this nonconvex setting are new even in the non-private setting. Finally, numerical experiments are conducted to demonstrate the effectiveness of DP-SGDA for both convex and nonconvex cases.
Submitted 29 July, 2022; v1 submitted 22 January, 2022;
originally announced January 2022.
-
Magnetoelectricity in two-dimensional materials
Authors:
Yìlè Yīng,
Ulrich Zülicke
Abstract:
Since the initial isolation of few-layer graphene, a plethora of two-dimensional atomic crystals has become available, covering almost all known materials types including metals, semiconductors, superconductors, ferro- and antiferromagnets. These advances have augmented the already existing variety of two-dimensional materials that are routinely realized by quantum confinement in bulk-semiconductor heterostructures. This review focuses on the type of material for which two-dimensional realizations are still being actively sought: magnetoelectrics. We present an overview of current theoretical expectation and experimental progress towards fabricating low-dimensional versions of such materials that can be magnetized by electric charges and polarized electrically by an applied magnetic field - unusual electromagnetic properties that could be the basis for various useful applications. The interplay between spatial confinement and magnetoelectricity is illustrated using the paradigmatic example of magnetic-monopole fields generated by electric charges in or near magnetoelectric media. For the purpose of this discussion, the image-charge method familiar from electrostatics is extended to solve the boundary-value problem for a magnetoelectric medium in the finite-width slab geometry using image dyons, i.e., point objects having both electric and magnetic charges. We discuss salient features of the magnetoelectrically induced fields arising in the thin-width limit.
Submitted 29 June, 2022; v1 submitted 13 January, 2022;
originally announced January 2022.
-
Label Distributionally Robust Losses for Multi-class Classification: Consistency, Robustness and Adaptivity
Authors:
Dixian Zhu,
Yiming Ying,
Tianbao Yang
Abstract:
We study a family of loss functions named label-distributionally robust (LDR) losses for multi-class classification that are formulated from a distributionally robust optimization (DRO) perspective, where the uncertainty in the given label information is modeled and captured by taking the worst case of distributional weights. The benefits of this perspective are several-fold: (i) it provides a unified framework to explain the classical cross-entropy (CE) loss and SVM loss and their variants; (ii) it includes a special family corresponding to the temperature-scaled CE loss, which is widely adopted but poorly understood; (iii) it allows us to achieve adaptivity to the uncertainty degree of label information at an instance level. Our contributions include: (1) we study both consistency and robustness by establishing top-$k$ ($\forall k\geq 1$) consistency of LDR losses for multi-class classification, and a negative result that a top-$1$ consistent and symmetric robust loss cannot achieve top-$k$ consistency simultaneously for all $k\geq 2$; (2) we propose a new adaptive LDR loss that automatically adapts the individualized temperature parameter to the noise degree of the class label of each instance; (3) we demonstrate stable and competitive performance for the proposed adaptive LDR loss on 7 benchmark datasets under 6 noisy label and 1 clean settings against 13 loss functions, and on one real-world noisy dataset. The code is open-sourced at https://github.com/Optimization-AI/ICML2023_LDR.
Submitted 28 June, 2023; v1 submitted 29 December, 2021;
originally announced December 2021.
-
PlantStereo: A Stereo Matching Benchmark for Plant Surface Dense Reconstruction
Authors:
Qingyu Wang,
Baojian Ma,
Wei Liu,
Mingzhao Lou,
Mingchuan Zhou,
Huanyu Jiang,
Yibin Ying
Abstract:
Stereo matching is an important task in computer vision which has drawn tremendous research attention for decades. However, in terms of disparity accuracy, density and data size, public stereo datasets struggle to meet the requirements of models. In this paper, we aim to address this gap between datasets and models and propose a large-scale stereo dataset with high-accuracy disparity ground truth named PlantStereo. We used a semi-automatic approach to construct the dataset: after camera calibration and image registration, high-accuracy disparity images can be obtained from the depth images. In total, PlantStereo contains 812 image pairs covering a diverse set of plants: spinach, tomato, pepper and pumpkin. We first evaluated our PlantStereo dataset on four different stereo matching methods. Extensive experiments on different models and plants show that, compared with ground truth in integer accuracy, the high-accuracy disparity images provided by PlantStereo can remarkably improve the training effect of deep learning models. This paper provides a feasible and reliable method to realize plant surface dense reconstruction. The PlantStereo dataset and related code are available at: https://www.github.com/wangqingyu985/PlantStereo
Submitted 30 November, 2021;
originally announced November 2021.
-
Simple Stochastic and Online Gradient Descent Algorithms for Pairwise Learning
Authors:
Zhenhuan Yang,
Yunwen Lei,
Puyu Wang,
Tianbao Yang,
Yiming Ying
Abstract:
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances. It covers many important machine learning tasks such as bipartite ranking and metric learning. A popular approach to handling streaming data in pairwise learning is an online gradient descent (OGD) algorithm, where one needs to pair the current instance with a sufficiently large buffering set of previous instances, and which therefore suffers from a scalability issue. In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning. A notable difference from existing studies is that we only pair the current instance with the previous one in building a gradient direction, which is efficient in both storage and computational complexity. We develop novel stability results, optimization bounds, and generalization error bounds for both convex and nonconvex as well as both smooth and nonsmooth problems. We introduce novel techniques to decouple the dependency between models and the previous instance in both the optimization and generalization analysis. Our study resolves an open question on developing meaningful generalization bounds for OGD using a buffering set with a very small fixed size. We also extend our algorithms and stability analysis to develop differentially private SGD algorithms for pairwise learning, which significantly improve on the existing results.
Submitted 23 November, 2021;
originally announced November 2021.
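The pairing rule described in the abstract above (each gradient step uses only the current instance and the one immediately before it) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `pairwise_sgd`, the linear scoring model, the pairwise hinge loss, the shuffled visiting order, and the step size are all assumptions chosen for illustration.

```python
import numpy as np


def pairwise_sgd(X, y, lr=0.1, epochs=1, seed=0):
    """Illustrative pairwise SGD for bipartite ranking with a linear scorer.

    Assumptions (not from the paper): pairwise hinge loss
    max(0, 1 - s * w.(x_cur - x_prev)) with s = sign(y_cur - y_prev),
    a constant step size, and one random shuffle per epoch.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        order = rng.permutation(len(X))
        prev = order[0]
        for cur in order[1:]:
            # Only the immediately preceding instance is kept in memory,
            # so storage stays constant regardless of the stream length.
            if y[cur] != y[prev]:
                s = 1.0 if y[cur] > y[prev] else -1.0
                diff = X[cur] - X[prev]
                if s * w.dot(diff) < 1.0:
                    # Subgradient step on the hinge loss for this single pair.
                    w += lr * s * diff
            prev = cur
    return w
```

Pairs with equal labels carry no ranking signal under this loss and are skipped; a buffer-based OGD variant would instead loop over a stored set of earlier instances at each step.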