-
Fast Luminous Extragalactic Transients in the VLA Sky Survey: Implications for the rates of Accretion-Induced Collapse Events, Fast Blue Optical Transients and Gamma Ray Burst Afterglows
Authors:
Kritti Sharma,
Vikram Ravi,
Dillon Z. Dong,
Gregg Hallinan,
Casey Law,
Delina Levine,
Jean J. Somalwar,
Jessie Miller,
Nikita Kosogorov,
Steven T. Myers
Abstract:
Radio wavelengths offer a unique window into high-energy astrophysical phenomena that may be obscured or too rapidly evolving to be captured at other wavelengths. Leveraging data from the Very Large Array Sky Survey, we perform a systematic search for fast, luminous transients with characteristic timescales $\lesssim 3$ years in the nearby universe ($z \leq 0.3$). We report the discovery of five such transients, and classify them based on their synchrotron emission energetics and host galaxy properties. From this sample, we derive observational constraints on the volumetric rates of certain corresponding transient classes. We limit the rates of accretion-induced collapse of white dwarfs with dense circumstellar medium interaction (and those producing pulsar wind nebulae) at $\lesssim 1.10_{-0.90}^{+2.60}$% ($\lesssim 0.20_{-0.10}^{+5.80}$%) of the local Type Ia supernova rate, respectively, broadly consistent with theoretical predictions. For AT2018cow-like radio-bright luminous fast blue optical transients, we estimate a rare occurrence rate of $\lesssim 0.02_{-0.01}^{+0.32}$% of the local core-collapse supernova rate. We constrain the local volumetric rates of long- and short-duration gamma-ray bursts (GRBs) to be $\lesssim 11.46_{-9.48}^{+26.28}$~Gpc$^{-3}$~yr$^{-1}$ and $\lesssim 80.88_{-66.90}^{+185.87}$~Gpc$^{-3}$~yr$^{-1}$, respectively. These estimates incorporate beaming corrections, with median detectable viewing angles derived from afterglow simulations of $\sim 0.4$ and $\sim 0.3$ radians for long- and short-duration GRBs. Our findings highlight the potential of radio surveys to uncover rare, energetic transients. We emphasize the critical role of coordinated multi-wavelength follow-up in fully characterizing these enigmatic events.
Submitted 4 June, 2025;
originally announced June 2025.
-
Even-degeneracy of a random graph
Authors:
Ting-Wei Chao,
Dingding Dong,
Zixuan Xu
Abstract:
A graph is even-degenerate if one can iteratively remove a vertex of even degree at each step until at most one edge remains. Recently, Janzer and Yip showed that the Erdős--Rényi random graph $G(n,1/2)$ is even-degenerate with high probability, and asked whether an analogous result holds for general $G(n,p)$. In this paper, we answer this question in the affirmative for any constant $p\in (0,1)$ by proving that $G(n,p)$ is even-degenerate with high probability.
Submitted 1 June, 2025;
originally announced June 2025.
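The definition of even-degeneracy in the abstract above can be checked directly on small graphs. The following is a minimal brute-force sketch, not the authors' method: the function name `is_even_degenerate` and the bitmask representation are illustrative assumptions, and the search is exponential in the number of vertices, so it is only suitable for toy examples.

```python
from functools import lru_cache


def is_even_degenerate(n, edges):
    """Brute-force test of the definition on a graph with vertices 0..n-1.

    Illustrative helper (hypothetical name): tries every order of removing
    even-degree vertices until at most one edge remains.
    """
    # adj[v] is a bitmask of the neighbours of v.
    adj = [0] * n
    for u, v in edges:
        adj[u] |= 1 << v
        adj[v] |= 1 << u

    def degree(v, mask):
        # Degree of v inside the subgraph induced by the vertices in mask.
        return bin(adj[v] & mask).count("1")

    @lru_cache(maxsize=None)
    def solve(mask):
        present = [v for v in range(n) if (mask >> v) & 1]
        edge_count = sum(degree(v, mask) for v in present) // 2
        if edge_count <= 1:
            return True
        # Try removing each remaining vertex of even degree.
        for v in present:
            if degree(v, mask) % 2 == 0 and solve(mask & ~(1 << v)):
                return True
        return False

    return solve((1 << n) - 1)


triangle = [(0, 1), (1, 2), (0, 2)]
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(is_even_degenerate(3, triangle))  # every vertex has degree 2
print(is_even_degenerate(4, k4))        # every vertex has degree 3
```

In the triangle, removing any vertex leaves a single edge, so the definition is satisfied; in $K_4$ every vertex has odd degree and six edges remain, so no removal is possible.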
-
You Prefer This One, I Prefer Yours: Using Reference Words is Harder Than Vocabulary Words for Humans and Multimodal Language Models
Authors:
Dota Tianai Dong,
Yifan Luo,
Po-Ya Angela Wang,
Asli Ozyurek,
Paula Rubio-Fernandez
Abstract:
Multimodal language models (MLMs) increasingly communicate in human-like ways, yet their ability to use reference words remains largely overlooked despite their ubiquity in everyday communication. Our study addresses this gap by comparing human and MLM use of three word classes with increasing cognitive demands: vocabulary words, possessive pronouns (`mine' vs `yours'), and demonstrative pronouns (`this one' vs `that one'). Evaluating seven state-of-the-art MLMs against human participants, we observe a clear difficulty hierarchy: while MLMs approach human-level performance on the vocabulary task, they show substantial deficits with possessives and demonstratives. Our analysis reveals these difficulties stem from limitations in perspective-taking and spatial reasoning. Although prompt engineering improved model performance on possessive use, demonstrative use remained well below human-level competence. These findings provide theoretical and empirical evidence that producing grammatical forms requiring pragmatics and social cognition remains a clear challenge in current NLP systems.
Submitted 29 May, 2025;
originally announced June 2025.
-
Finite-time stabilization of ladder multi-level quantum systems
Authors:
Zeping Su,
Sen Kuang,
Daoyi Dong
Abstract:
In this paper, a novel continuous non-smooth control strategy is proposed to achieve finite-time stabilization of ladder quantum systems. We first design a universal fractional-order control law for a ladder n-level quantum system using a distance-based Lyapunov function, and then apply the Filippov solution in the sense of differential inclusions and LaSalle's invariance principle to prove the existence and uniqueness of the solution of the ladder system under the continuous non-smooth control law. Both asymptotic stability and finite-time stability of the ladder system are rigorously established by applying Lyapunov stability theory and finite-time stability criteria. We also derive an upper bound on the time required for convergence to an eigenstate of the intrinsic Hamiltonian. Numerical simulations on a rubidium ladder three-level atomic system validate the effectiveness of the proposed method.
Submitted 18 May, 2025;
originally announced May 2025.
-
Learning Diverse Natural Behaviors for Enhancing the Agility of Quadrupedal Robots
Authors:
Huiqiao Fu,
Haoyu Dong,
Wentao Xu,
Zhehao Zhou,
Guizhou Deng,
Kaiqiang Tang,
Daoyi Dong,
Chunlin Chen
Abstract:
Achieving animal-like agility is a longstanding goal in quadrupedal robotics. While recent studies have successfully demonstrated imitation of specific behaviors, enabling robots to replicate a broader range of natural behaviors in real-world environments remains an open challenge. Here we propose an integrated controller comprising a Basic Behavior Controller (BBC) and a Task-Specific Controller (TSC) which can effectively learn diverse natural quadrupedal behaviors in an enhanced simulator and efficiently transfer them to the real world. Specifically, the BBC is trained using a novel semi-supervised generative adversarial imitation learning algorithm to extract diverse behavioral styles from raw motion capture data of real dogs, enabling smooth behavior transitions by adjusting discrete and continuous latent variable inputs. The TSC, trained via privileged learning with depth images as input, coordinates the BBC to efficiently perform various tasks. Additionally, we employ evolutionary adversarial simulator identification to optimize the simulator, aligning it closely with reality. After training, the robot exhibits diverse natural behaviors, successfully completing the quadrupedal agility challenge at an average speed of 1.1 m/s and achieving a peak speed of 3.2 m/s during hurdling. This work represents a substantial step toward animal-like agility in quadrupedal robots, opening avenues for their deployment in increasingly complex real-world environments.
Submitted 15 May, 2025;
originally announced May 2025.
-
DMind Benchmark: Toward a Holistic Assessment of LLM Capabilities across the Web3 Domain
Authors:
Enhao Huang,
Pengyu Sun,
Zixin Lin,
Alex Chen,
Joey Ouyang,
Hobert Wang,
Dong Dong,
Gang Zhao,
James Yi,
Frank Li,
Ziang Ling,
Lowes Yang
Abstract:
Large Language Models (LLMs) have achieved impressive performance in diverse natural language processing tasks, but specialized domains such as Web3 present new challenges and require more tailored evaluation. Despite the significant user base and capital flows in Web3, encompassing smart contracts, decentralized finance (DeFi), non-fungible tokens (NFTs), decentralized autonomous organizations (DAOs), on-chain governance, and novel token-economics, no comprehensive benchmark has systematically assessed LLM performance in this domain. To address this gap, we introduce the DMind Benchmark, a holistic Web3-oriented evaluation suite covering nine critical subfields: fundamental blockchain concepts, blockchain infrastructure, smart contract, DeFi mechanisms, DAOs, NFTs, token economics, meme concept, and security vulnerabilities. Beyond multiple-choice questions, DMind Benchmark features domain-specific tasks such as contract debugging and on-chain numeric reasoning, mirroring real-world scenarios. We evaluated 26 models, including ChatGPT, Claude, DeepSeek, Gemini, Grok, and Qwen, uncovering notable performance gaps in specialized areas like token economics and security-critical contract analysis. While some models excel in blockchain infrastructure tasks, advanced subfields remain challenging. Our benchmark dataset and evaluation pipeline are open-sourced on https://huggingface.co/datasets/DMindAI/DMind_Benchmark, reaching number one in Hugging Face's trending dataset charts within a week of release.
Submitted 16 May, 2025; v1 submitted 18 April, 2025;
originally announced April 2025.
-
Text-to-Decision Agent: Offline Meta-Reinforcement Learning from Natural Language Supervision
Authors:
Shilin Zhang,
Zican Hu,
Wenhao Wu,
Xinyi Xie,
Jianxiang Tang,
Chunlin Chen,
Daoyi Dong,
Yu Cheng,
Zhenhong Sun,
Zhi Wang
Abstract:
Offline meta-RL usually tackles generalization by inferring task beliefs from high-quality samples or warmup explorations. This restricted form limits generality and usability, since such supervision signals are expensive or even infeasible to acquire in advance for unseen tasks. Learning directly from raw text about decision tasks is a promising alternative that leverages a much broader source of supervision. In this paper, we propose Text-to-Decision Agent (T2DA), a simple and scalable framework that supervises offline meta-RL with natural language. We first introduce a generalized world model to encode multi-task decision data into a dynamics-aware embedding space. Then, inspired by CLIP, we predict which textual description goes with which decision embedding, effectively bridging their semantic gap via contrastive language-decision pre-training and aligning the text embeddings to comprehend the environment dynamics. After training the text-conditioned generalist policy, the agent can directly realize zero-shot text-to-decision generation in response to language instructions. Comprehensive experiments on MuJoCo and Meta-World benchmarks show that T2DA facilitates high-capacity zero-shot generalization and outperforms various types of baselines. Our code is available at https://github.com/NJU-RL/T2DA.
Submitted 5 June, 2025; v1 submitted 21 April, 2025;
originally announced April 2025.
-
Hierarchical and Step-Layer-Wise Tuning of Attention Specialty for Multi-Instance Synthesis in Diffusion Transformers
Authors:
Chunyang Zhang,
Zhenhong Sun,
Zhicheng Zhang,
Junyan Wang,
Yu Zhang,
Dong Gong,
Huadong Mo,
Daoyi Dong
Abstract:
Text-to-image (T2I) generation models often struggle with multi-instance synthesis (MIS), where they must accurately depict multiple distinct instances in a single image based on complex prompts detailing individual features. Traditional MIS control methods for UNet architectures like SD v1.5/SDXL fail to adapt to DiT-based models like FLUX and SD v3.5, which rely on integrated attention between image and text tokens rather than text-image cross-attention. To enhance MIS in DiT, we first analyze the mixed attention mechanism in DiT. Our token-wise and layer-wise analysis of attention maps reveals a hierarchical response structure: instance tokens dominate early layers, background tokens in middle layers, and attribute tokens in later layers. Building on this observation, we propose a training-free approach for enhancing MIS in DiT-based models with hierarchical and step-layer-wise attention specialty tuning (AST). AST amplifies key regions while suppressing irrelevant areas in distinct attention maps across layers and steps, guided by the hierarchical structure. This optimizes multimodal interactions by hierarchically decoupling the complex prompts with instance-based sketches. We evaluate our approach using upgraded sketch-based layouts for the T2I-CompBench and customized complex scenes. Both quantitative and qualitative results confirm our method enhances complex layout generation, ensuring precise instance placement and attribute representation in MIS.
Submitted 20 April, 2025; v1 submitted 14 April, 2025;
originally announced April 2025.
-
An improved quantum algorithm for linear autonomous differential equations via Padé approximation
Authors:
Dekuan Dong,
Yingzhou Li,
Jungong Xue
Abstract:
We propose a novel quantum algorithm for solving linear autonomous ordinary differential equations (ODEs) using the Padé approximation. For linear autonomous ODEs, the discretized solution can be represented by a product of matrix exponentials. The proposed algorithm approximates the matrix exponential by the diagonal Padé approximation, which is then encoded into a large, block-sparse linear system and solved via quantum linear system algorithms (QLSA). The detailed quantum circuit is given based on quantum oracle access to the matrix, the inhomogeneous term, and the initial state. The complexity of the proposed algorithm is analyzed. Compared to the method based on Taylor approximation, which approximates the matrix exponential using a $k$-th order Taylor series, the proposed algorithm improves the dependence on the approximation order $k$ from two perspectives: 1) the explicit complexity dependency on $k$ is improved, and 2) a smaller $k$ suffices for the same precision. Numerical experiments demonstrate the advantages of the proposed algorithm compared with other related algorithms.
Submitted 21 April, 2025; v1 submitted 9 April, 2025;
originally announced April 2025.
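The claim in the abstract above that a smaller order suffices for a Padé approximant than for a Taylor series can be illustrated classically. The sketch below is only a scalar, non-quantum illustration and not the paper's algorithm: it uses the standard diagonal Padé coefficients for $e^x$, and the function names `taylor_exp` and `pade_exp` are assumptions made for this example.

```python
import math


def taylor_exp(x, k):
    """k-th order Taylor polynomial of exp(x) around 0."""
    return sum(x**j / math.factorial(j) for j in range(k + 1))


def pade_exp(x, p):
    """Diagonal [p/p] Pade approximant of exp(x), i.e. N(x) / N(-x).

    Uses the standard coefficients
    c_j = (2p - j)! p! / ((2p)! j! (p - j)!).
    """
    coeffs = [
        math.factorial(2 * p - j) * math.factorial(p)
        / (math.factorial(2 * p) * math.factorial(j) * math.factorial(p - j))
        for j in range(p + 1)
    ]
    numerator = sum(c * x**j for j, c in enumerate(coeffs))
    denominator = sum(c * (-x) ** j for j, c in enumerate(coeffs))
    return numerator / denominator


x = 1.0
for order in (1, 2, 3):
    taylor_err = abs(taylor_exp(x, order) - math.exp(x))
    pade_err = abs(pade_exp(x, order) - math.exp(x))
    print(f"order {order}: Taylor error {taylor_err:.2e}, Pade error {pade_err:.2e}")
```

For a matrix argument the same rational form becomes $D(A)^{-1} N(A)$, which requires a linear solve; the abstract describes encoding that step into a larger linear system, which this scalar sketch does not attempt to reproduce.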
-
In vivo mapping organellar metabolism by optical-boxcar enhanced fluorescence-detected mid-infrared photothermal microscopy
Authors:
Jianpeng Ao,
Jiaze Yin,
Haonan Lin,
Guangrui Ding,
Youchen Guan,
Bethany Weinberg,
Dashan Dong,
Qing Xia,
Zhongyue Guo,
Marzia Savini,
Biwen Gao,
Ji-Xin Cheng,
Meng C. Wang
Abstract:
Metabolism unfolds within specific organelles in eukaryotic cells. Lysosomes are highly metabolically active organelles, and their metabolic states dynamically influence signal transduction, cellular homeostasis, and organismal physiopathology. Despite the significance of lysosomal metabolism, a method for its in vivo measurement is currently lacking. Here, we report optical boxcar-enhanced, fluorescence-detected mid-infrared photothermal microscopy, together with AI-assisted data denoising and spectral deconvolution, to map metabolic activity and composition of individual lysosomes in living cells and organisms. Using this method, we uncovered lipolysis and proteolysis heterogeneity across lysosomes within the same cell, as well as early-onset lysosomal dysfunction during organismal aging. Additionally, we discovered organelle-level metabolic changes associated with diverse lysosomal storage diseases. This method holds the broad potential to profile metabolic fingerprints of individual organelles within their native context and quantitatively assess their dynamic changes under different physiological and pathological conditions, providing a high-resolution chemical cellular atlas.
Submitted 5 April, 2025;
originally announced April 2025.
-
A review on modelling, evaluation, and optimization of cyber-physical system reliability
Authors:
Moslem Uddin,
Huadong Mo,
Daoyi Dong
Abstract:
The aim of this study is to present an overview of current research on modelling, evaluation, and optimization methods for improving the reliability of Cyber-Physical Systems (CPS). Three major modelling approaches, namely analytical, simulation, and hybrid models, are discussed. Various evaluation techniques, including fault tree analysis, Markov models, and availability measures, are reviewed and compared. Optimization strategies for CPS reliability, including fault tolerance, dynamic reconfiguration, and resource allocation, are also reviewed and briefly discussed. In addition, emerging trends and research opportunities in this field are highlighted and explained. Finally, possible challenges are outlined and future research directions for CPS are suggested. This study can provide a systematic and in-depth introduction to CPS for researchers, practitioners, and policymakers.
Submitted 13 March, 2025;
originally announced March 2025.
-
Integrated Energy Management for Operational Cost Optimization in Community Microgrids
Authors:
Moslem Uddin,
Huadong Mo,
Daoyi Dong
Abstract:
This study presents an integrated energy management strategy for cost optimization in multi-energy community microgrids (MGs). The proposed approach combines storage-based peak shaving, economic dispatch of diesel generators, and efficient utilization of renewable energy sources to enhance energy management in community MGs. The efficacy of the energy management system (EMS) was validated through a simulation case study for a rural Australian community. The results demonstrate that the proposed EMS effectively reduces the peak energy demand by up to 43%, lowers operational costs by 84.63% (from $189,939/year to $29,188/year), and achieves a renewable energy utilization of 92.3%, up from 47.8% in the base system. Furthermore, the levelized cost of energy was reduced by 14.21% to $0.163/kWh. The strategy ensures an uninterrupted power supply during grid outages by utilizing DGs and battery energy storage systems. The environmental benefits included a 196.4% reduction in CO2 emissions and 100% reductions in CO, unburned hydrocarbons, and particulate matter. These findings validate the feasibility of the proposed EMS in achieving cost-effective, reliable, and sustainable energy management in community MGs. These findings contribute to the field by introducing a novel approach and demonstrating the practical feasibility of multi-energy MGs.
Submitted 10 March, 2025;
originally announced March 2025.
-
Cost-Effective Design of Grid-tied Community Microgrid
Authors:
Moslem Uddin,
Huadong Mo,
Daoyi Dong
Abstract:
This study aims to develop a cost-effective microgrid design that optimally balances the economic feasibility, reliability, efficiency, and environmental impact in a grid-tied community microgrid. A multi-objective optimization framework is employed, integrating HOMER Pro for system sizing with deep reinforcement learning (DRL). Sensitivity analyses are conducted to evaluate the system performance under varying load demand and renewable energy fluctuations, while an economic sensitivity assessment examines the impact of electricity prices and capital costs on the Levelized Cost of Energy (LCOE). The proposed microgrid configuration achieves high reliability, satisfying 100% of the load, even under adverse weather conditions. The proposed framework attains an efficiency of 91.99% while maintaining a carbon footprint of 302,747 kg/year, which is approximately 95% lower than that of the grid system. The economic analysis indicates a net present cost (NPC) of $4.83M with a competitive LCOE of $0.208/kWh. In addition, the operation cost is $201,473 per year with a capital investment of $1.42M, rendering it a financially viable alternative to conventional grid-dependent systems. This work can be valuable in identifying effective solutions for supplying reliable and cost-effective power to regional and remote areas.
Submitted 13 March, 2025; v1 submitted 10 March, 2025;
originally announced March 2025.
-
Ergodic Exploration over Meshable Surfaces
Authors:
Dayi Dong,
Albert Xu,
Geordan Gutow,
Howie Choset,
Ian Abraham
Abstract:
Robotic search and rescue, exploration, and inspection require trajectory planning across a variety of domains. A popular approach to trajectory planning for these types of missions is ergodic search, which biases a trajectory to spend time in parts of the exploration domain that are believed to contain more information. Most prior work on ergodic search has been limited to searching simple surfaces, like a 2D Euclidean plane or a sphere, as they rely on projecting functions defined on the exploration domain onto analytically obtained Fourier basis functions. In this paper, we extend ergodic search to any surface that can be approximated by a triangle mesh. The basis functions are approximated through finite element methods on a triangle mesh of the domain. We formally prove that this approximation converges to the continuous case as the mesh approximation converges to the true domain. We demonstrate that on domains where analytical basis functions are available (plane, sphere), the proposed method obtains equivalent results, while on other domains (torus, bunny, wind turbine), the approach is versatile enough to still search effectively. Lastly, we compare with an existing ergodic search technique that can handle complex domains and show that our method results in higher-quality exploration.
Submitted 6 March, 2025;
originally announced March 2025.
-
Machine Learning for Estimation and Control of Quantum Systems
Authors:
Hailan Ma,
Bo Qi,
Ian R. Petersen,
Re-Bing Wu,
Herschel Rabitz,
Daoyi Dong
Abstract:
The development of quantum technologies relies on creating and manipulating quantum systems of increasing complexity, with key applications in computation, simulation, and sensing. This poses severe challenges in efficient control, calibration, and validation of quantum states and their dynamics. Machine learning methods have emerged as powerful tools owing to their remarkable capability to learn from data, and thus have been extensively utilized for different quantum tasks. This paper reviews several significant topics related to machine learning-aided quantum estimation and control. In particular, we discuss neural networks-based learning for quantum state estimation, gradient-based learning for optimal control of quantum systems, evolutionary computation for learning control of quantum systems, machine learning for quantum robust control, and reinforcement learning for quantum control. This review provides a brief background of key concepts recurring across many of these approaches with special emphasis on neural networks, evolutionary computation, and reinforcement learning.
Submitted 4 March, 2025;
originally announced March 2025.
-
OnlineAnySeg: Online Zero-Shot 3D Segmentation by Visual Foundation Model Guided 2D Mask Merging
Authors:
Yijie Tang,
Jiazhao Zhang,
Yuqing Lan,
Yulan Guo,
Dezun Dong,
Chenyang Zhu,
Kai Xu
Abstract:
Online zero-shot 3D instance segmentation of a progressively reconstructed scene is both a critical and challenging task for embodied applications. With the success of visual foundation models (VFMs) in the image domain, leveraging 2D priors to address 3D online segmentation has become a prominent research focus. Since segmentation results provided by 2D priors often require spatial consistency to be lifted into final 3D segmentation, an efficient method for identifying spatial overlap among 2D masks is essential, yet existing methods rarely achieve this in real time, which largely limits their use to offline settings. To address this, we propose an efficient method that lifts 2D masks generated by VFMs into a unified 3D instance using a hashing technique. By employing voxel hashing for efficient 3D scene querying, our approach reduces the time complexity of costly spatial overlap queries from $O(n^2)$ to $O(n)$. Accurate spatial associations further enable 3D merging of 2D masks through simple similarity-based filtering in a zero-shot manner, making our approach more robust to incomplete and noisy data. Evaluated on the ScanNet and SceneNN benchmarks, our approach achieves state-of-the-art performance in online, zero-shot 3D instance segmentation with leading efficiency.
Submitted 30 March, 2025; v1 submitted 3 March, 2025;
originally announced March 2025.
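The idea of replacing pairwise spatial comparisons with hash lookups, as described in the abstract above, can be sketched in a few lines. This is a generic illustration of voxel hashing under stated assumptions, not the OnlineAnySeg implementation: the names `voxel_key` and `mask_overlaps`, the input layout (a dictionary from mask id to back-projected 3D points), and the default voxel size are all hypothetical.

```python
from collections import defaultdict


def voxel_key(point, voxel_size):
    """Map a 3D point to the integer index of the voxel containing it."""
    x, y, z = point
    return (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))


def mask_overlaps(masks, voxel_size=1.0):
    """Count shared voxels between masks via a hash table.

    masks: dict mapping a mask id to a list of (x, y, z) points
    (assumed to be 2D mask pixels already lifted into 3D).
    Returns a dict mapping (id_a, id_b) to the number of shared voxels.
    """
    # One pass over all points: each voxel remembers which masks touch it.
    voxel_to_masks = defaultdict(set)
    for mask_id, points in masks.items():
        for point in points:
            voxel_to_masks[voxel_key(point, voxel_size)].add(mask_id)

    # Only masks that share a voxel are ever compared with each other.
    overlap = defaultdict(int)
    for ids in voxel_to_masks.values():
        ordered = sorted(ids)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                overlap[(ordered[i], ordered[j])] += 1
    return dict(overlap)


example = {
    "a": [(0.1, 0.1, 0.1), (1.5, 0.2, 0.2)],
    "b": [(0.9, 0.9, 0.9), (5.0, 5.0, 5.0)],
    "c": [(9.0, 9.0, 9.0)],
}
print(mask_overlaps(example))
```

Inserting points costs one hash operation each, and masks that never share a voxel are never compared, which is the intuition behind avoiding all-pairs overlap tests.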
-
Physics-Aware Inverse Design for Nanowire Single-Photon Avalanche Detectors via Deep Learning
Authors:
Boyang Zhang,
Zhe Li,
Zhongju Wang,
Yang Yu,
Hark Hoe Tan,
Chennupati Jagadish,
Daoyi Dong,
Lan Fu
Abstract:
Single-photon avalanche detectors (SPADs) have enabled various applications in emerging photonic quantum information technologies in recent years. However, despite many efforts to improve SPAD's performance, the design of SPADs remained largely an iterative and time-consuming process where a designer makes educated guesses of a device structure based on empirical reasoning and solves the semiconductor drift-diffusion model for it. In contrast, the inverse problem, i.e., directly inferring a structure needed to achieve desired performance, which is of ultimate interest to designers, remains an unsolved problem. We propose a novel physics-aware inverse design workflow for SPADs using a deep learning model and demonstrate it with an example of finding the key parameters of semiconductor nanowires constituting the unit cell of an SPAD, given target photon detection efficiency. Our inverse design workflow is not restricted to the case demonstrated and can be applied to design conventional planar structure-based SPADs, photodetectors, and solar cells.
Submitted 26 February, 2025;
originally announced February 2025.
-
A Computational Framework for Simulations of Dissipative Non-Adiabatic Dynamics on Hybrid Oscillator-Qubit Quantum Devices
Authors:
Nam P. Vu,
Daniel Dong,
Xiaohan Dan,
Ningyi Lyu,
Victor Batista,
Yuan Liu
Abstract:
We introduce a computational framework for simulating non-adiabatic vibronic dynamics on circuit quantum electrodynamics (cQED) platforms. Our approach leverages hybrid oscillator-qubit quantum hardware with mid-circuit measurements and resets, enabling the incorporation of environmental effects such as dissipation and dephasing. To demonstrate its capabilities, we simulate energy transfer dynamics in a triad model of photosynthetic chromophores inspired by natural antenna systems. We specifically investigate the role of dissipation during the relaxation dynamics following photoexcitation, where electronic transitions are coupled to the evolution of quantum vibrational modes. Our results indicate that hybrid oscillator-qubit devices, operating with noise levels below the intrinsic dissipation rates of typical molecular antenna systems, can achieve the simulation fidelity required for practical computations on near-term and early fault-tolerant quantum computing platforms.
Submitted 24 February, 2025;
originally announced February 2025.
-
Leveraging Large Language Models for Effective and Explainable Multi-Agent Credit Assignment
Authors:
Kartik Nagpal,
Dayi Dong,
Jean-Baptiste Bouvier,
Negar Mehr
Abstract:
Recent work, spanning from autonomous vehicle coordination to in-space assembly, has shown the importance of learning collaborative behavior for enabling robots to achieve shared goals. A common approach for learning this cooperative behavior is to utilize the centralized-training decentralized-execution paradigm. However, this approach also introduces a new challenge: how do we evaluate the contributions of each agent's actions to the overall success or failure of the team? This credit assignment problem has remained open, and has been extensively studied in the Multi-Agent Reinforcement Learning literature. In fact, humans manually inspecting agent behavior often generate better credit evaluations than existing methods. We combine this observation with recent work showing that Large Language Models demonstrate human-level performance at many pattern recognition tasks. Our key idea is to reformulate credit assignment as the two pattern recognition problems of sequence improvement and attribution, which motivates our novel LLM-MCA method. Our approach utilizes a centralized LLM reward-critic which numerically decomposes the environment reward based on the individualized contribution of each agent in the scenario. We then update the agents' policy networks based on this feedback. We also propose an extension LLM-TACA where our LLM critic performs explicit task assignment by passing an intermediary goal directly to each agent policy in the scenario. Both our methods far outperform the state-of-the-art on a variety of benchmarks, including Level-Based Foraging, Robotic Warehouse, and our new Spaceworld benchmark which incorporates collision-related safety constraints. As an artifact of our methods, we generate large trajectory datasets with each timestep annotated with per-agent reward information, as sampled from our LLM critics.
Submitted 24 February, 2025;
originally announced February 2025.
-
Simultaneous estimations of quantum state and detector through multiple quantum processes
Authors:
Shuixin Xiao,
Weichao Liang,
Yuanlong Wang,
Daoyi Dong,
Ian R. Petersen,
Valery Ugrinovskii
Abstract:
The estimation of all the parameters in an unknown quantum state or measurement device, commonly known as quantum state tomography (QST) and quantum detector tomography (QDT), is crucial for comprehensively characterizing and controlling quantum systems. In this paper, we introduce a framework, in two different bases, that utilizes multiple quantum processes to simultaneously identify a quantum state and a detector. We develop a closed-form algorithm for this purpose and prove that the mean squared error (MSE) scales as $O(1/N) $ for both QST and QDT, where $N $ denotes the total number of state copies. This scaling aligns with established patterns observed in previous works that addressed QST and QDT as independent tasks. Furthermore, we formulate the problem as a sum of squares (SOS) optimization problem with semialgebraic constraints, where the physical constraints of the state and detector are characterized by polynomial equalities and inequalities. The effectiveness of our proposed methods is validated through numerical examples.
Submitted 17 February, 2025;
originally announced February 2025.
-
Precise Quantum Control of Molecular Rotation Toward a Desired Orientation
Authors:
Qian-Qian Hong,
Daoyi Dong,
Niels E. Henriksen,
Franco Nori,
Jun He,
Chuan-Cun Shu
Abstract:
The lack of a direct map between control fields and desired control objectives poses a significant challenge in applying quantum control theory to quantum technologies. Here, we propose an analytical framework to precisely control a limited set of quantum states and construct desired coherent superpositions using a well-designed laser pulse sequence with optimal amplitudes, phases, and delays. This theoretical framework that corresponds to a multi-level pulse-area theorem establishes a straightforward mapping between the control parameters of the pulse sequence and the amplitudes and phases of rotational states within a specific subspace. As an example, we utilize this approach to generate 15 distinct and desired rotational superpositions of ultracold polar molecules, leading to 15 desired field-free molecular orientations. By optimizing the superposition of the lowest 16 rotational states, we demonstrate that this approach can achieve a maximum orientation value of $|\langle\cosθ\rangle|_{\rm{max}}$ above 0.99, which is very close to the global optimal value of 1 that could be achieved in an infinite-dimensional state space. This work marks a significant advancement in achieving precise control over multi-level subsystems within molecules. It holds potential applications in molecular alignment and orientation, as well as in various interdisciplinary fields related to the precise quantum control of ultracold polar molecules, opening up considerable opportunities in molecular-based quantum techniques.
Submitted 14 February, 2025;
originally announced February 2025.
-
Learning-Based Design of LQG Controllers in Quantum Coherent Feedback
Authors:
Chunxiang Song,
Yanan Liu,
Guofeng Zhang,
Huadong Mo,
Daoyi Dong
Abstract:
In this paper, we propose a differential evolution (DE) algorithm specifically tailored for the design of Linear-Quadratic-Gaussian (LQG) controllers in quantum systems. Building upon the foundational DE framework, the algorithm incorporates specialized modules, including relaxed feasibility rules, a scheduled penalty function, adaptive search range adjustment, and the "bet-and-run" initialization strategy. These enhancements improve the algorithm's exploration and exploitation capabilities while addressing the unique physical realizability requirements of quantum systems. The proposed method is applied to a quantum optical system, where three distinct controllers with varying configurations relative to the plant are designed. The resulting controllers demonstrate superior performance, achieving lower LQG performance indices compared to existing approaches. Additionally, the algorithm ensures that the designs comply with physical realizability constraints, guaranteeing compatibility with practical quantum platforms. The proposed approach holds significant potential for application to other linear quantum systems in performance optimization tasks subject to physically feasible constraints.
Submitted 23 February, 2025; v1 submitted 12 February, 2025;
originally announced February 2025.
-
Mitigating Sensitive Information Leakage in LLMs4Code through Machine Unlearning
Authors:
Ruotong Geng,
Mingyang Geng,
Shangwen Wang,
Haotian Wang,
Zhipeng Lin,
Dezun Dong
Abstract:
Large Language Models for Code (LLMs4Code) excel at code generation tasks, promising to relieve developers of heavy software development burdens. Nonetheless, these models have been shown to suffer from significant privacy risks due to the potential leakage of sensitive information embedded during training, known as the memorization problem. Addressing this issue is crucial for ensuring privacy compliance and upholding user trust, but so far there is a dearth of dedicated studies in the literature that focus on this specific direction. Recently, machine unlearning has emerged as a promising solution by enabling models to "forget" sensitive information without full retraining, offering an efficient and scalable approach compared to traditional data cleaning methods. In this paper, we empirically evaluate the effectiveness of unlearning techniques for addressing privacy concerns in LLMs4Code. Specifically, we investigate three state-of-the-art unlearning algorithms and three well-known open-sourced LLMs4Code, on a benchmark that takes into consideration both the privacy data to be forgotten and the code generation capabilities of these models. Results show that it is feasible to mitigate the privacy concerns of LLMs4Code through machine unlearning while maintaining their code generation capabilities at the same time. We also dissect the forms of privacy protection/leakage after unlearning and observe that there is a shift from direct leakage to indirect leakage, which underscores the need for future studies addressing this risk.
Submitted 8 February, 2025;
originally announced February 2025.
-
Polyp-Gen: Realistic and Diverse Polyp Image Generation for Endoscopic Dataset Expansion
Authors:
Shengyuan Liu,
Zhen Chen,
Qiushi Yang,
Weihao Yu,
Di Dong,
Jiancong Hu,
Yixuan Yuan
Abstract:
Automated diagnostic systems (ADS) have shown significant potential in the early detection of polyps during endoscopic examinations, thereby reducing the incidence of colorectal cancer. However, due to high annotation costs and strict privacy concerns, acquiring high-quality endoscopic images poses a considerable challenge in the development of ADS. Despite recent advancements in generating synthetic images for dataset expansion, existing endoscopic image generation algorithms fail to accurately generate the details of polyp boundary regions and typically require medical priors to specify plausible locations and shapes of polyps, which limits the realism and diversity of the generated images. To address these limitations, we present Polyp-Gen, the first fully automatic diffusion-based endoscopic image generation framework. Specifically, we devise a spatial-aware diffusion training scheme with a lesion-guided loss to enhance the structural context of polyp boundary regions. Moreover, to capture medical priors for the localization of potential polyp areas, we introduce a hierarchical retrieval-based sampling strategy to match similar fine-grained spatial features. In this way, our Polyp-Gen can generate realistic and diverse endoscopic images for building reliable ADS. Extensive experiments demonstrate the state-of-the-art generation quality, and the synthetic images can improve the downstream polyp detection task. Additionally, our Polyp-Gen has shown remarkable zero-shot generalizability on other datasets. The source code is available at https://github.com/CUHK-AIM-Group/Polyp-Gen.
Submitted 29 January, 2025; v1 submitted 27 January, 2025;
originally announced January 2025.
-
Tree-of-Code: A Tree-Structured Exploring Framework for End-to-End Code Generation and Execution in Complex Task Handling
Authors:
Ziyi Ni,
Yifan Li,
Ning Yang,
Dou Shen,
Pin Lv,
Daxiang Dong
Abstract:
Solving complex reasoning tasks is a key real-world application of agents. Thanks to the pretraining of Large Language Models (LLMs) on code data, recent approaches like CodeAct successfully use code as LLM agents' actions, achieving good results. However, CodeAct greedily generates the next action's code block by relying on fragmented thoughts, resulting in inconsistency and instability. Moreover, CodeAct lacks action-related ground-truth (GT), making its supervision signals and termination conditions questionable in multi-turn interactions. To address these issues, we first introduce a simple yet effective end-to-end code generation paradigm, CodeProgram, which leverages code's systematic logic to align with global reasoning and enable cohesive problem-solving. Then, we propose Tree-of-Code (ToC), which self-grows CodeProgram nodes based on the executable nature of the code and enables self-supervision in a GT-free scenario. Experimental results on two datasets using ten popular zero-shot LLMs show that ToC boosts accuracy by nearly 20% over CodeAct with fewer than 1/4 of the turns. Several LLMs even perform better on one-turn CodeProgram than on multi-turn CodeAct. To further investigate the trade-off between efficacy and efficiency, we test different ToC tree sizes and exploration mechanisms. We also highlight the potential of ToC's end-to-end data generation for supervised and reinforced fine-tuning.
Submitted 19 December, 2024;
originally announced December 2024.
-
Tree-of-Code: A Hybrid Approach for Robust Complex Task Planning and Execution
Authors:
Ziyi Ni,
Yifan Li,
Daxiang Dong
Abstract:
The exceptional capabilities of large language models (LLMs) have substantially accelerated the rapid rise and widespread adoption of agents. Recent studies have demonstrated that generating Python code to consolidate LLM-based agents' actions into a unified action space (CodeAct) is a promising approach for developing real-world LLM agents. However, this step-by-step code generation approach often lacks consistency and robustness, leading to instability in agent applications, particularly for complex reasoning and out-of-domain tasks. In this paper, we propose a novel approach called Tree-of-Code (ToC) to tackle the challenges of complex problem planning and execution with an end-to-end mechanism. By integrating key ideas from both Tree-of-Thought and CodeAct, ToC combines their strengths to enhance solution exploration. In our framework, each final code execution result is treated as a node in the decision tree, with a breadth-first search strategy employed to explore potential solutions. The final outcome is determined through a voting mechanism based on the outputs of the nodes.
Submitted 18 December, 2024;
originally announced December 2024.
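The breadth-first exploration and output voting described in the abstract above can be sketched generically as follows. This is an illustrative assumption, not the ToC implementation: `expand`, `run`, and `max_nodes` are hypothetical names, with `expand` standing in for whatever generates alternative candidate programs and `run` for executing one candidate end to end.

```python
from collections import Counter, deque


def explore_and_vote(root, expand, run, max_nodes=8):
    """Breadth-first exploration of candidates followed by a majority vote.

    root:      initial candidate program (any object).
    expand:    callable(candidate) -> list of child candidates.
    run:       callable(candidate) -> hashable output, or None on failure.
    max_nodes: upper bound on the number of candidates executed.
    """
    queue = deque([root])
    outputs = []
    visited = 0
    while queue and visited < max_nodes:
        candidate = queue.popleft()      # FIFO order gives breadth-first search
        visited += 1
        result = run(candidate)
        if result is not None:           # keep only successful executions
            outputs.append(result)
        queue.extend(expand(candidate))
    if not outputs:
        return None
    # The most frequent output across executed nodes wins the vote.
    return Counter(outputs).most_common(1)[0][0]
```

In this sketch a node votes only if its execution succeeds; how failed nodes are handled in the actual framework is not stated in the abstract.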
-
T$^3$-S2S: Training-free Triplet Tuning for Sketch to Scene Generation
Authors:
Zhenhong Sun,
Yifu Wang,
Yonhon Ng,
Yunfei Duan,
Daoyi Dong,
Hongdong Li,
Pan Ji
Abstract:
Scene generation is crucial to many computer graphics applications. Recent advances in generative AI have streamlined sketch-to-image workflows, easing the workload for artists and designers in creating scene concept art. However, these methods often struggle for complex scenes with multiple detailed objects, sometimes missing small or uncommon instances. In this paper, we propose a Training-free Triplet Tuning for Sketch-to-Scene (T3-S2S) generation after reviewing the entire cross-attention mechanism. This scheme revitalizes the existing ControlNet model, enabling effective handling of multi-instance generations, involving prompt balance, characteristics prominence, and dense tuning. Specifically, this approach enhances keyword representation via the prompt balance module, reducing the risk of missing critical instances. It also includes a characteristics prominence module that highlights TopK indices in each channel, ensuring essential features are better represented based on token sketches. Additionally, it employs dense tuning to refine contour details in the attention map, compensating for instance-related regions. Experiments validate that our triplet tuning approach substantially improves the performance of existing sketch-to-image models. It consistently generates detailed, multi-instance 2D images, closely adhering to the input prompts and enhancing visual quality in complex multi-instance scenes. Code is available at https://github.com/chaos-sun/t3s2s.git.
Submitted 17 December, 2024;
originally announced December 2024.
-
Arbitrary Spectral Edge of Regular Graphs
Authors:
Dingding Dong,
Theo McKenzie
Abstract:
We prove that for each $d\geq 3$ and $k\geq 2$, the set of limit points of the first $k$ eigenvalues of sequences of $d$-regular graphs is
\[
\{(μ_1,\dots,μ_k): d=μ_1\geq \dots\geq μ_{k}\geq2\sqrt{d-1}\}.
\] The result for $k=2$ was obtained by Alon and Wei, and our result confirms a conjecture of theirs. Our proof uses an infinite random graph sampled from a distribution that generalizes the random regular graph distribution. To control the spectral behavior of this infinite object, we show that Huang and Yau's proof of Friedman's theorem bounding the second eigenvalue of a random regular graph generalizes to this model. We also bound the trace of the non-backtracking operator, as was done in Bordenave's separate proof of Friedman's theorem.
Submitted 12 December, 2024;
originally announced December 2024.
-
Perturbed three-channel waveform synthesizer for efficient isolated attosecond pulse generation and characterization
Authors:
Dianhong Dong,
Hushan Wang,
Bing Xue,
Kotaro Imasaka,
Natuski Kanda,
Yuxi Fu,
Yasuo Nabekawa,
Eiji J. Takahashi
Abstract:
The generation of gigawatt-class isolated attosecond pulses (IAPs) is vital for attosecond pump-probe experiments. In such experiments, the temporal duration of IAPs must be determined quickly and accurately. In this study, we developed a perturbed three-channel waveform synthesizer for efficient IAP generation and characterization at low repetition rates (10 Hz). Intense IAPs centered at photon energies of 60 eV (227 as duration) in Ar and 107 eV (128 as duration) in Ne were generated by the driving field from a three-channel waveform synthesizer and characterized using all-optical frequency-resolved optical gating (AO-FROG), which shortened the measurement time to several minutes, providing fast feedback for the tunability of the IAP source. The peak power of the IAPs is higher than that reported in the literature.
Submitted 5 December, 2024;
originally announced December 2024.
-
LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training
Authors:
Xiaoye Qu,
Daize Dong,
Xuyang Hu,
Tong Zhu,
Weigao Sun,
Yu Cheng
Abstract:
Recently, inspired by the concept of sparsity, Mixture-of-Experts (MoE) models have gained increasing popularity for scaling model size while keeping the number of activated parameters constant. In this study, we thoroughly investigate the sparsity of the dense LLaMA model by constructing MoE for both the attention (i.e., Attention MoE) and MLP (i.e., MLP MoE) modules in the transformer blocks. Specifically, we investigate different expert construction methods and granularities under the same activation conditions to analyze the impact of sparsifying the model. Additionally, to comprehensively evaluate the model's capabilities across various domains (e.g., conversation, code, math) after sparsification, we apply sparsity to the instructed large language models (LLMs) and construct instructed MoE models. To counteract the performance degradation resulting from increased sparsity, we design a two-stage post-training strategy to enhance model performance. Experiments on the LLaMA3 model demonstrate the potential effectiveness of this approach for future developments of instructed MoE models. The source codes and models are available at: https://github.com/OpenSparseLLMs/LLaMA-MoE-v2.
Submitted 23 November, 2024;
originally announced November 2024.
-
A Comprehensive Simulation Framework for CXL Disaggregated Memory
Authors:
Yanjing Wang,
Lizhou Wu,
Wentao Hong,
Yang Ou,
Zicong Wang,
Sunfeng Gao,
Jie Zhang,
Sheng Ma,
Dezun Dong,
Xingyun Qi,
Mingche Lai,
Nong Xiao
Abstract:
Compute eXpress Link (CXL) has emerged as a key enabler of memory disaggregation for future heterogeneous computing systems to expand memory on-demand and improve resource utilization. However, CXL is still in its infancy stage and lacks commodity products on the market, thus necessitating a reliable system-level simulation tool for research and development. In this paper, we propose CXL-DMSim, an open-source full-system simulator to simulate CXL disaggregated memory systems with high fidelity at a gem5-comparable simulation speed. CXL-DMSim incorporates a flexible CXL memory expander model along with its associated device driver, and CXL protocol support with CXLio and CXLmem. It can operate in both app-managed mode and kernel-managed mode, with the latter using a dedicated NUMA-compatible mechanism. The simulator has been rigorously verified against a real hardware testbed with both FPGA- and ASIC-based CXL memory devices, which demonstrates the qualification of CXL-DMSim in simulating the characteristics of various CXL memory devices at an average simulation error of 3.4%. The experimental results using LMbench and STREAM benchmarks suggest that the CXL-FPGA memory exhibits a ~2.88x higher latency than local DDR while the CXL-ASIC latency is ~2.18x; CXL-FPGA achieves 45-69% of local DDR memory bandwidth, whereas the number for CXL-ASIC is 82-83%. The study also reveals that CXL memory can significantly enhance the performance of memory-intensive applications, improved by 23x at most with limited local memory for Viper key-value database and approximately 60% in memory-bandwidth-sensitive scenarios such as MERCI. Moreover, the simulator's observability and expandability are showcased with detailed case-studies, highlighting its great potential for research on future CXL-interconnected hybrid memory pool.
Submitted 8 March, 2025; v1 submitted 4 November, 2024;
originally announced November 2024.
-
Mid-infrared Energy Deposition Spectroscopy
Authors:
Jiaze Yin,
Christian Pfluegl,
Chu C. Teng,
Rylie Bolarinho,
Guo Chen,
Xinrui Gong,
Dashan Dong,
Daryoosh Vakhshoori,
Ji-Xin Cheng
Abstract:
Photothermal microscopy is an emerging tool for measuring light-matter interactions with single-molecule sensitivity. It is generally believed that the spectral acquisition speed in photothermal microscopy is limited by the slow thermal diffusion process. Here, we demonstrate mid-infrared energy deposition (MIRED) spectroscopy, which offers both microsecond-scale temporal resolution and sub-micron spatial resolution. In this approach, the photothermal process is optically probed while the infrared pulses from a quantum cascade laser array are rapidly tuned. Based on Newton's law, the energy deposition corresponds to the first derivative of local temperature rise over time and provides the instantaneous infrared absorption. By employing time-resolved measurement of transient energy deposition, the upper limit for spectrum encoding shifts to the vibrational relaxation level, which occurs on the picosecond scale. This method significantly increases the detection bandwidth while maintaining the sensitivity and resolution advantages of photothermal detection.
Submitted 24 October, 2024;
originally announced October 2024.
-
On monochromatic solutions to linear equations over the integers
Authors:
Dingding Dong,
Nitya Mani,
Huy Tuan Pham,
Jonathan Tidor
Abstract:
We study the number of monochromatic solutions to linear equations in a $2$-coloring of $\{1,\ldots,n\}$. We show that any nontrivial linear equation has a constant fraction of solutions that are monochromatic in any $2$-coloring of $\{1,\ldots,n\}$. We further study commonness of four-term equations and disprove a conjecture of Costello and Elvin by showing that, unlike over $\mathbb{F}_p$, the four-term equation $x_1 + 2x_2 - x_3 - 2x_4 = 0$ is uncommon over $\{1,\ldots,n\}$.
Submitted 27 October, 2024; v1 submitted 17 October, 2024;
originally announced October 2024.
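To make the counted quantity in the abstract above concrete, here is a brute-force sketch that counts monochromatic solutions of a linear equation under a given $2$-coloring of $\{1,\ldots,n\}$. The function name and the list encoding of a coloring are assumptions for illustration, and the count includes every integer solution (trivial ones too), which may differ from the normalization used in the paper.

```python
from itertools import product


def monochromatic_count(coloring, coeffs):
    """Count solutions of a_1*x_1 + ... + a_k*x_k = 0 with all x_i in {1..n}.

    coloring: list of 0/1 values; coloring[i] is the color of the integer i + 1.
    coeffs:   integer coefficients (a_1, ..., a_k).
    Returns (monochromatic solutions, total solutions).
    """
    n = len(coloring)
    total = 0
    mono = 0
    # Exhaustive search over all n**k tuples; only practical for small n and k.
    for xs in product(range(1, n + 1), repeat=len(coeffs)):
        if sum(a * x for a, x in zip(coeffs, xs)) == 0:
            total += 1
            # A solution is monochromatic when all its entries share one color.
            if len({coloring[x - 1] for x in xs}) == 1:
                mono += 1
    return mono, total
```

For example, `monochromatic_count(coloring, (1, 2, -1, -2))` evaluates the four-term equation $x_1 + 2x_2 - x_3 - 2x_4 = 0$ from the abstract for a chosen coloring; the exhaustive loop is exponential in the number of variables, so it serves only as a check on small cases.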
-
Human-LLM Collaborative Construction of a Cantonese Emotion Lexicon
Authors:
Yusong Zhang,
Dong Dong,
Chi-tim Hung,
Leonard Heyerdahl,
Tamara Giles-Vernick,
Eng-kiong Yeoh
Abstract:
Large Language Models (LLMs) have demonstrated remarkable capabilities in language understanding and generation. Advanced utilization of the knowledge embedded in LLMs for automated annotation has consistently been explored. This study proposed to develop an emotion lexicon for Cantonese, a low-resource language, through collaborative efforts between LLM and human annotators. By integrating emotion labels provided by LLM and human annotators, the study leveraged existing linguistic resources including lexicons in other languages and local forums to construct a Cantonese emotion lexicon enriched with colloquial expressions. The consistency of the proposed emotion lexicon in emotion extraction was assessed through modification and utilization of three distinct emotion text datasets. This study not only validates the efficacy of the constructed lexicon but also emphasizes that collaborative annotation between human and artificial intelligence can significantly enhance the quality of emotion labels, highlighting the potential of such partnerships in facilitating natural language processing tasks for low-resource languages.
Submitted 15 October, 2024;
originally announced October 2024.
-
Moyun: A Diffusion-Based Model for Style-Specific Chinese Calligraphy Generation
Authors:
Kaiyuan Liu,
Jiahao Mei,
Hengyu Zhang,
Yihuai Zhang,
Xingjiao Wu,
Daoguo Dong,
Liang He
Abstract:
Although Chinese calligraphy generation has achieved style transfer, generating calligraphy by specifying the calligrapher, font, and character style remains challenging. To address this, we propose a new Chinese calligraphy generation model 'Moyun', which replaces the U-Net in the Diffusion model with Vision Mamba and introduces the TripleLabel control mechanism to achieve controllable calligraphy generation. The model was tested on our large-scale dataset 'Mobao' of over 1.9 million images, and the results demonstrate that 'Moyun' can effectively control the generation process and produce calligraphy in the specified style. Even for calligraphy the calligrapher has not written, 'Moyun' can generate calligraphy that matches the style of the calligrapher.
Submitted 10 October, 2024;
originally announced October 2024.
-
PINN-MG: A Multigrid-Inspired Hybrid Framework Combining Iterative Method and Physics-Informed Neural Networks
Authors:
Daiwei Dong,
Wei Suo,
Jiaqing Kou,
Weiwei Zhang
Abstract:
Iterative methods are widely used for solving partial differential equations (PDEs). However, the difficulty in eliminating global low-frequency errors significantly limits their convergence speed. In recent years, neural networks have emerged as a novel approach for solving PDEs, with studies revealing that they exhibit faster convergence for low-frequency components. Building on these complementary frequency convergence characteristics of iterative methods and neural networks, we draw inspiration from multigrid methods and propose a hybrid solving framework that combines iterative methods and neural network-based solvers, termed PINN-MG (PMG). In this framework, the iterative method is responsible for eliminating local high-frequency oscillation errors, while Physics-Informed Neural Networks (PINNs) are employed to correct global low-frequency errors. Throughout the solving process, high- and low-frequency components alternately dominate the error, with each being addressed by the iterative method and PINNs respectively, thereby accelerating the convergence. We tested the proposed PMG framework on the linear Poisson equation and the nonlinear Helmholtz equation, and the results demonstrated significant acceleration of the PMG when built on Gauss-Seidel, pseudo-time, and GMRES methods. Furthermore, detailed analysis of the convergence process further validates the rationality of the framework. The PMG framework is a hybrid solving approach that does not rely on training data, achieving an organic integration of neural network methods with iterative methods.
Submitted 8 October, 2024;
originally announced October 2024.
-
The Symbiotic Recurrent Nova V745 Sco at Radio Wavelengths
Authors:
Isabella Molina,
Laura Chomiuk,
Justin D. Linford,
Elias Aydi,
Amy J. Mioduszewski,
Koji Mukai,
Kirill V. Sokolovsky,
Jay Strader,
Peter Craig,
Dillon Dong,
Chelsea E. Harris,
Miriam M. Nyamai,
Michael P. Rupen,
Jennifer L. Sokoloski,
Frederick M. Walter,
Jennifer H. S. Weston,
Montana N. Williams
Abstract:
V745 Sco is a Galactic symbiotic recurrent nova with nova eruptions in 1937, 1989 and 2014. We study the behavior of V745 Sco at radio wavelengths (0.6-37 GHz), covering both its 1989 and 2014 eruptions and informed by optical, X-ray, and $γ$-ray data. The radio light curves are synchrotron-dominated. Surprisingly, compared to expectations for synchrotron emission from explosive transients such as radio supernovae, the light curves spanning 0.6-37 GHz all peak around the same time ($\sim$18-26 days after eruption) and with similar flux densities (5-9 mJy). We model the synchrotron light curves as interaction of the nova ejecta with the red giant wind, but find that simple spherically symmetric models with wind-like circumstellar material (CSM) cannot explain the radio light curve. Instead, we conclude that the shock suddenly breaks out of a dense CSM absorbing screen around 20 days after eruption, and then expands into a relatively low density wind ($\dot{M}_{out} \approx 10^{-9}-10^{-8}$ M$_{\odot}$ yr$^{-1}$ for $v_w = 10$ km s$^{-1}$) out to $\sim$1 year post-eruption. The dense, close-in CSM may be an equatorial density enhancement or a more spherical red giant wind with $\dot{M}_{in} \approx [5-10] \times 10^{-7}$ M$_{\odot}$ yr$^{-1}$, truncated beyond several $\times 10^{14}$ cm. The outer lower-density CSM would not be visible in typical radio observations of Type Ia supernovae: V745 Sco cannot be ruled out as a Type Ia progenitor based on CSM constraints alone. Complementary constraints from the free-free radio optical depth and the synchrotron luminosity imply the shock is efficient at accelerating relativistic electrons and amplifying magnetic fields, with $ε_e$ and $ε_B \approx 0.01-0.1$.
Submitted 1 October, 2024;
originally announced October 2024.
-
Quantifying genuine tripartite entanglement by reshaping the state
Authors:
Dong-Dong Dong,
Li-Juan Li,
Xue-Ke Song,
Liu Ye,
Dong Wang
Abstract:
Although genuine multipartite entanglement (GME), as one quantum resource, is indispensable in quantum information processing, most of the existing measures cannot detect GME faithfully. In this paper, we present a novel GME measure, namely the minimum pairwise concurrence (MPC), by introducing pairwise entanglement, which characterizes the entanglement between two single-qubit subsystems of a multipartite system without tracing out the remaining qubit. The pairwise entanglement can be obtained by combining the entanglement of the reduced subsystem and the three-tangle. Compared with the existing measures, the MPC measure outperforms the previous ones in many aspects. Due to its fine properties, it is thus believed that the MPC could be a good candidate for achieving potential quantum tasks and could also facilitate the understanding of GME.
Submitted 27 September, 2024;
originally announced September 2024.
-
Preferential Occurrence of Fast Radio Bursts in Massive Star-Forming Galaxies
Authors:
Kritti Sharma,
Vikram Ravi,
Liam Connor,
Casey Law,
Stella Koch Ocker,
Myles Sherman,
Nikita Kosogorov,
Jakob Faber,
Gregg Hallinan,
Charlie Harnach,
Greg Hellbourg,
Rick Hobbs,
David Hodge,
Mark Hodges,
James Lamb,
Paul Rasmussen,
Jean Somalwar,
Sander Weinreb,
David Woody,
Joel Leja,
Shreya Anand,
Kaustav Kashyap Das,
Yu-Jing Qin,
Sam Rose,
Dillon Z. Dong
, et al. (2 additional authors not shown)
Abstract:
Fast Radio Bursts (FRBs) are millisecond-duration events detected from beyond the Milky Way. FRB emission characteristics favor highly magnetized neutron stars, or magnetars, as the sources, as evidenced by FRB-like bursts from a galactic magnetar, and the star-forming nature of FRB host galaxies. However, the processes that produce FRB sources remain unknown. Although galactic magnetars are often linked to core-collapse supernovae (CCSNe), it's uncertain what determines which supernovae result in magnetars. The galactic environments of FRB sources can be harnessed to probe their progenitors. Here, we present the stellar population properties of 30 FRB host galaxies discovered by the Deep Synoptic Array. Our analysis shows a significant deficit of low-mass FRB hosts compared to the occurrence of star-formation in the universe, implying that FRBs are a biased tracer of star-formation, preferentially selecting massive star-forming galaxies. This bias may be driven by galaxy metallicity, which is positively correlated with stellar mass. Metal-rich environments may favor the formation of magnetar progenitors through stellar mergers, as higher metallicity stars are less compact and more likely to fill their Roche lobes, leading to unstable mass transfer. Although massive stars do not have convective interiors to generate strong magnetic fields by dynamo, merger remnants are thought to have the requisite internal magnetic-field strengths to result in magnetars. The preferential occurrence of FRBs in massive star-forming galaxies suggests that CCSNe of merger remnants preferentially form magnetars.
Submitted 25 September, 2024;
originally announced September 2024.
-
Random local access for sampling k-SAT solutions
Authors:
Dingding Dong,
Nitya Mani
Abstract:
We present a sublinear time algorithm that gives random local access to the uniform distribution over satisfying assignments to an arbitrary k-CNF formula $Φ$, at exponential clause density. Our algorithm provides memory-less query access to variable assignments, such that the output variable assignments consistently emulate a single global satisfying assignment whose law is close to the uniform distribution over satisfying assignments to $Φ$.
Such models were formally defined (for the more general task of locally sampling from exponentially sized sample spaces) in 2017 by Biswas, Rubinfeld, and Yodpinyanee, who studied the analogous problem for the uniform distribution over proper q-colorings. This model extends a long line of work over multiple decades that studies sublinear time algorithms for problems in theoretical computer science. Random local access and related models have been studied for a wide variety of natural Gibbs distributions and random graphical processes. Here, we establish feasibility of random local access models for one of the most canonical such sample spaces, the set of satisfying assignments to a k-CNF formula.
Submitted 7 April, 2025; v1 submitted 5 September, 2024;
originally announced September 2024.
-
Detection of Radio Emission from Super-flaring Solar-Type Stars in the VLA Sky Survey
Authors:
Ivey Davis,
Gregg Hallinan,
Carlos Ayala,
Dillon Dong,
Steven Myers
Abstract:
Solar-type stars have been observed to flare at optical wavelengths to energies much higher than observed for the Sun. To date, no counterparts have been observed at longer wavelengths. We have searched the VLA Sky Survey (VLASS) for radio emission associated with a sample of 150 single, solar-type stars previously observed to exhibit superflares in the Transiting Exoplanet Survey Satellite (TESS). Counterparts to six of these stars were present in VLASS as transient or highly variable radio sources. One of the stars is detected in all three epochs, exhibiting an unprecedented level of apparently persistent radio emission. The engine for this radio emission is unclear, but may be related to accretion, a binary companion, or the presence of a large-scale magnetic field. Two stars show radio emission with >50% circular polarization fraction, indicating that a coherent emission process is likely present. We find that the six VLASS-detected stars tend to have higher flare rates and higher flare energies than the rest of our TESS sample. This, in addition to the VLASS-detected stars adhering to the Gudel-Benz relation, suggests that the radio emission may be directly associated with superflares. These results confirm that the superflare phenomenon on solar-type stars extends to radio wavelengths, in this instance tracing particle acceleration. These data provide the first window on the luminosity function of radio superflares for solar-type stars and highlight the need for coordinated, multi-wavelength monitoring of such stars to fully illustrate the stellar flare-particle relation.
Submitted 26 August, 2024;
originally announced August 2024.
-
Fast State Stabilization using Deep Reinforcement Learning for Measurement-based Quantum Feedback Control
Authors:
Chunxiang Song,
Yanan Liu,
Daoyi Dong,
Hidehiro Yonezawa
Abstract:
The stabilization of quantum states is a fundamental problem for realizing various quantum technologies. Measurement-based-feedback strategies have demonstrated powerful performance, and the construction of quantum control signals using measurement information has attracted great interest. However, the interaction between quantum systems and the environment is inevitable, especially when measurements are introduced, which leads to decoherence. To mitigate decoherence, it is desirable to stabilize quantum systems faster, thereby reducing the time of interaction with the environment. In this paper, we utilize information obtained from measurement and apply deep reinforcement learning (DRL) algorithms, without explicitly constructing specific complex measurement-control mappings, to rapidly drive a random initial quantum state to the target state. The proposed DRL algorithm has the ability to speed up the convergence to a target state, which shortens the interaction between quantum systems and their environments to protect coherence. Simulations are performed on two-qubit and three-qubit systems, and the results show that our algorithm can successfully stabilize a random initial quantum system to the target entangled state, with a convergence time shorter than that of traditional methods such as Lyapunov feedback control and several DRL algorithms with different reward functions. Moreover, it exhibits robustness against imperfect measurements and delays in system evolution.
Submitted 20 January, 2025; v1 submitted 21 August, 2024;
originally announced August 2024.
-
Adaptive BESS and Grid Setpoints Optimization: A Model-Free Framework for Efficient Battery Management under Dynamic Tariff Pricing
Authors:
Alaa Selim,
Huadong Mo,
Hemanshu Pota,
Daoyi Dong
Abstract:
This paper introduces an enhanced framework for managing Battery Energy Storage Systems (BESS) in residential communities. The non-convex BESS control problem is first addressed using a gradient-based optimizer, providing a benchmark solution. Subsequently, the problem is tackled using multiple Deep Reinforcement Learning (DRL) agents, with a specific emphasis on the off-policy Soft Actor-Critic (SAC) algorithm. This version of SAC incorporates reward refinement based on this non-convex problem, applying logarithmic scaling to enhance convergence rates. Additionally, a safety mechanism selects only feasible actions from the action space, aimed at improving the learning curve, accelerating convergence, and reducing computation times. Moreover, the state representation of this DRL approach now includes uncertainties quantified in the entropy term, enhancing the model's adaptability across various entropy types. This developed system adheres to strict limits on the battery's State of Charge (SOC), thus preventing breaches of SOC boundaries and extending the battery lifespan. The robustness of the model is validated across several Australian states' districts, each characterized by unique uncertainty distributions. By implementing the refined SAC, the SOC consistently surpasses 50 percent by the end of each day, enabling the BESS control to start smoothly for the next day with some reserve. Finally, this proposed DRL method achieves a mean reduction in optimization time by 50 percent and an average cost saving of 40 percent compared to the gradient-based optimization benchmark.
Submitted 19 August, 2024;
originally announced August 2024.
-
UNR: Unified Notifiable RMA Library for HPC
Authors:
Guangnan Feng,
Jiabin Xie,
Dezun Dong,
Yutong Lu
Abstract:
Remote Memory Access (RMA) enables direct access to remote memory to achieve high performance for HPC applications. However, most modern parallel programming models lack schemes for the remote process to detect the completion of RMA operations. Many previous works have proposed programming models and extensions to notify the communication peer, but they did not solve the multi-NIC aggregation, portability, hardware-software co-design, and usability problems. In this work, we propose a Unified Notifiable RMA (UNR) library for HPC to address these challenges. In addition, we demonstrate the best practice of utilizing UNR within a real-world scientific application, PowerLLEL. We deployed UNR across four HPC systems, each with a different interconnect. The results show that PowerLLEL powered by UNR achieves up to a 36% acceleration on 1728 nodes of the Tianhe-Xingyi supercomputing system.
Submitted 14 August, 2024;
originally announced August 2024.
-
TEAdapter: Supply abundant guidance for controllable text-to-music generation
Authors:
Jialing Zou,
Jiahao Mei,
Xudong Nan,
Jinghua Li,
Daoguo Dong,
Liang He
Abstract:
Although current text-guided music generation technology can cope with simple creative scenarios, achieving fine-grained control over individual text-modality conditions remains challenging as user demands become more intricate. Accordingly, we introduce the TEAcher Adapter (TEAdapter), a compact plugin designed to guide the generation process with diverse control information provided by users. In addition, we explore the controllable generation of extended music by leveraging TEAdapter control groups trained on data of distinct structural functionalities. In general, we consider controls over global, elemental, and structural levels. Experimental results demonstrate that the proposed TEAdapter enables multiple precise controls and ensures high-quality music generation. Our module is also lightweight and transferable to any diffusion model architecture. Code and demos will be available soon at https://github.com/Ashley1101/TEAdapter.
Submitted 9 August, 2024;
originally announced August 2024.
-
DLO: Dynamic Layer Operation for Efficient Vertical Scaling of LLMs
Authors:
Zhen Tan,
Daize Dong,
Xinyu Zhao,
Jie Peng,
Yu Cheng,
Tianlong Chen
Abstract:
In this paper, we introduce Dynamic Layer Operations (DLO), a novel approach for vertically scaling transformer-based Large Language Models (LLMs) by dynamically expanding, activating, or skipping layers using a sophisticated routing policy based on layerwise feature similarity. Unlike traditional Mixture-of-Experts (MoE) methods that focus on extending the model width, our approach targets model depth, addressing the redundancy observed across layer representations for various input samples. Our framework is integrated with the Supervised Fine-Tuning (SFT) stage, eliminating the need for resource-intensive Continual Pre-Training (CPT). Experimental results demonstrate that DLO not only outperforms the original unscaled models but also achieves comparable results to densely expanded models with significantly improved efficiency. Our work offers a promising direction for building efficient yet powerful LLMs. We will release our implementation and model weights upon acceptance.
Submitted 3 July, 2024;
originally announced July 2024.
-
Warming Up Cold-Start CTR Prediction by Learning Item-Specific Feature Interactions
Authors:
Yaqing Wang,
Hongming Piao,
Daxiang Dong,
Quanming Yao,
Jingbo Zhou
Abstract:
In recommendation systems, new items are continuously introduced, initially lacking interaction records but gradually accumulating them over time. Accurately predicting the click-through rate (CTR) for these items is crucial for enhancing both revenue and user experience. While existing methods focus on enhancing item ID embeddings for new items within general CTR models, they tend to adopt a global feature interaction approach, so that new items with sparse data are often overshadowed by those with abundant interactions. Addressing this, our work introduces EmerG, a novel approach that warms up cold-start CTR prediction by learning item-specific feature interaction patterns. EmerG utilizes hypernetworks to generate an item-specific feature graph based on item characteristics, which is then processed by a Graph Neural Network (GNN). This GNN is specially tailored to provably capture feature interactions at any order through a customized message passing mechanism. We further design a meta learning strategy that optimizes parameters of hypernetworks and GNN across various item CTR prediction tasks, while only adjusting a minimal set of item-specific parameters within each task. This strategy effectively reduces the risk of overfitting when dealing with limited data. Extensive experiments on benchmark datasets validate that EmerG consistently performs the best given no, few, or sufficient instances of new items.
Submitted 14 July, 2024;
originally announced July 2024.
-
LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training
Authors:
Tong Zhu,
Xiaoye Qu,
Daize Dong,
Jiacheng Ruan,
Jingqi Tong,
Conghui He,
Yu Cheng
Abstract:
Mixture-of-Experts (MoE) has gained increasing popularity as a promising framework for scaling up large language models (LLMs). However, training MoE from scratch in a large-scale setting still suffers from data-hungry and instability problems. Motivated by this limitation, we investigate building MoE models from existing dense large language models. Specifically, based on the well-known LLaMA-2 7B model, we obtain an MoE model by: (1) Expert Construction, which partitions the parameters of original Feed-Forward Networks (FFNs) into multiple experts; (2) Continual Pre-training, which further trains the transformed MoE model and additional gate networks. In this paper, we comprehensively explore different methods for expert construction and various data sampling strategies for continual pre-training. After these stages, our LLaMA-MoE models could maintain language abilities and route the input tokens to specific experts with part of the parameters activated. Empirically, by training 200B tokens, LLaMA-MoE-3.5B models significantly outperform dense models that contain similar activation parameters. The source code and models are available at https://github.com/pjlab-sys4nlp/llama-moe .
Submitted 24 June, 2024;
originally announced June 2024.
-
Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts
Authors:
Tong Zhu,
Daize Dong,
Xiaoye Qu,
Jiacheng Ruan,
Wenliang Chen,
Yu Cheng
Abstract:
Mixture-of-Experts (MoE) models have shown remarkable capability in instruction tuning, especially when the number of tasks scales. However, previous methods simply merge all training tasks (e.g. creative writing, coding, and mathematics) and apply fixed sampling weights, without considering the importance of different tasks as the model training state changes. In this way, the most helpful data cannot be effectively distinguished, leading to suboptimal model performance. To reduce the potential redundancies of datasets, we make the first attempt and propose a novel dynamic data mixture for MoE instruction tuning. Specifically, inspired by MoE's token routing preference, we build dataset-level representations and then capture the subtle differences among datasets. Finally, we propose to dynamically adjust the sampling weight of datasets by their inter-redundancies, thus maximizing global performance under a limited training budget. The experimental results on two MoE models demonstrate the effectiveness of our approach on both downstream knowledge & reasoning tasks and open-ended queries. Code and models are available at https://github.com/Spico197/MoE-SFT .
Submitted 17 June, 2024;
originally announced June 2024.
-
Optimal control of linear Gaussian quantum systems via quantum learning control
Authors:
Yu-Hong Liu,
Yexiong Zeng,
Qing-Shou Tan,
Daoyi Dong,
Franco Nori,
Jie-Qiao Liao
Abstract:
Efficiently controlling linear Gaussian quantum (LGQ) systems is a significant task in both the study of fundamental quantum theory and the development of modern quantum technology. Here, we propose a general quantum-learning-control method for optimally controlling LGQ systems based on the gradient-descent algorithm. Our approach flexibly designs the loss function for diverse tasks by utilizing first- and second-order moments that completely describe the quantum state of LGQ systems. We demonstrate both deep optomechanical cooling and large optomechanical entanglement using this approach. Our approach enables the fast and deep ground-state cooling of a mechanical resonator within a short time, surpassing the limitations of sideband cooling in the continuous-wave driven strong-coupling regime. Furthermore, optomechanical entanglement could be generated remarkably fast and surpass several times the corresponding steady-state entanglement, even when the thermal phonon occupation reaches one hundred. This work will not only broaden the application of quantum learning control, but also open an avenue for optimal control of LGQ systems.
Submitted 8 June, 2024;
originally announced June 2024.