-
Impact of Static Friction on Sim2Real in Robotic Reinforcement Learning
Authors:
Xiaoyi Hu,
Qiao Sun,
Bailin He,
Haojie Liu,
Xueyi Zhang,
Chunpeng Lu,
Jiangwei Zhong
Abstract:
In robotic reinforcement learning, the Sim2Real gap remains a critical challenge. However, the impact of static friction on Sim2Real has been underexplored. Conventional domain randomization methods typically exclude static friction from their parameter space. In our robotic reinforcement learning task, such conventional domain randomization approaches resulted in significantly underperforming real-world models. To address this Sim2Real challenge, we employed Actuator Net as an alternative to conventional domain randomization. While this method enabled successful transfer to flat-ground locomotion, it failed on complex terrains like stairs. To further investigate physical parameters affecting Sim2Real in robotic joints, we developed a control-theoretic joint model and performed systematic parameter identification. Our analysis revealed unexpectedly high friction-torque ratios in our robotic joints. To mitigate the impact of this friction, we implemented static-friction-aware domain randomization for Sim2Real. Recognizing the increased training difficulty introduced by friction modeling, we proposed a simple and novel solution to reduce learning complexity. To validate this approach, we conducted comprehensive Sim2Sim and Sim2Real experiments comparing three methods: conventional domain randomization (without static friction), Actuator Net, and our static-friction-aware domain randomization. All experiments utilized the Rapid Motor Adaptation (RMA) algorithm. Results demonstrated that our method achieved superior adaptive capabilities and overall performance.
Submitted 3 March, 2025;
originally announced March 2025.
-
Efficient and Universal Neural-Network Decoder for Stabilizer-Based Quantum Error Correction
Authors:
Gengyuan Hu,
Wanli Ouyang,
Chao-Yang Lu,
Chen Lin,
Han-Sen Zhong
Abstract:
Scaling quantum computing to practical applications necessitates reliable quantum error correction. Although numerous correction codes have been proposed, the overall correction efficiency is critically limited by the decoding algorithms. We introduce GraphQEC, a code-agnostic decoder leveraging machine learning on the graph structure of stabilizer codes with linear time complexity. GraphQEC demonstrates unprecedented accuracy and efficiency across all tested code families, including surface codes, color codes, and quantum low-density parity-check (QLDPC) codes. For instance, on a distance-12 QLDPC code, GraphQEC achieves a logical error rate of $9.55 \times 10^{-5}$, an 18-fold improvement over the previous best specialized decoder's $1.74 \times 10^{-3}$ under $p=0.005$ physical error rates, while maintaining $157\,\mu$s/cycle decoding speed. Our approach represents the first universal solution for real-time quantum error correction across arbitrary stabilizer codes.
Submitted 3 June, 2025; v1 submitted 27 February, 2025;
originally announced February 2025.
-
AnyDexGrasp: General Dexterous Grasping for Different Hands with Human-level Learning Efficiency
Authors:
Hao-Shu Fang,
Hengxu Yan,
Zhenyu Tang,
Hongjie Fang,
Chenxi Wang,
Cewu Lu
Abstract:
We introduce an efficient approach for learning dexterous grasping with minimal data, advancing robotic manipulation capabilities across different robotic hands. Unlike traditional methods that require millions of grasp labels for each robotic hand, our method achieves high performance with human-level learning efficiency: only hundreds of grasp attempts on 40 training objects. The approach separates the grasping process into two stages: first, a universal model maps scene geometry to intermediate contact-centric grasp representations, independent of specific robotic hands. Next, a unique grasp decision model is trained for each robotic hand through real-world trial and error, translating these representations into final grasp poses. Our results show a grasp success rate of 75-95\% across three different robotic hands in real-world cluttered environments with over 150 novel objects, improving to 80-98\% with increased training objects. This adaptable method demonstrates promising applications for humanoid robots, prosthetics, and other domains requiring robust, versatile robotic manipulation.
Submitted 22 February, 2025;
originally announced February 2025.
-
Multi-Objective Reinforcement Learning for Critical Scenario Generation of Autonomous Vehicles
Authors:
Jiahui Wu,
Chengjie Lu,
Aitor Arrieta,
Shaukat Ali
Abstract:
Autonomous vehicles (AVs) make driving decisions without human intervention. Therefore, ensuring AVs' dependability is critical. Despite significant research and development on AVs, their dependability assurance remains a major challenge due to the complexity and unpredictability of their operating environments. Scenario-based testing evaluates AVs under various driving scenarios, but the unlimited number of potential scenarios highlights the importance of identifying critical scenarios that can violate safety or functional requirements. Such requirements are inherently interdependent and need to be tested simultaneously. To this end, we propose MOEQT, a novel multi-objective reinforcement learning (MORL)-based approach to generate critical scenarios that simultaneously test interdependent safety and functional requirements. MOEQT uses Envelope Q-learning as the MORL algorithm, which dynamically adapts multi-objective weights to balance the relative importance of multiple objectives. MOEQT generates critical scenarios that violate multiple requirements by dynamically interacting with the AV environment, ensuring comprehensive AV testing. We evaluate MOEQT using an advanced end-to-end AV controller and a high-fidelity simulator and compare MOEQT with two baselines: a random strategy and a single-objective RL with a weighted reward function. Our evaluation results show that MOEQT achieved overall better performance than the baselines in identifying critical scenarios that violate multiple requirements.
Submitted 18 February, 2025;
originally announced February 2025.
-
Optimizing Product Provenance Verification using Data Valuation Methods
Authors:
Raquib Bin Yousuf,
Hoang Anh Just,
Shengzhe Xu,
Brian Mayer,
Victor Deklerck,
Jakub Truszkowski,
John C. Simeone,
Jade Saunders,
Chang-Tien Lu,
Ruoxi Jia,
Naren Ramakrishnan
Abstract:
Determining and verifying product provenance remains a critical challenge in global supply chains, particularly as geopolitical conflicts and shifting borders create new incentives for misrepresentation of commodities, such as hiding the origin of illegally harvested timber or agricultural products grown on illegally cleared land. Stable Isotope Ratio Analysis (SIRA), combined with Gaussian process regression-based isoscapes, has emerged as a powerful tool for geographic origin verification. However, the effectiveness of these models is often constrained by data scarcity and suboptimal dataset selection. In this work, we introduce a novel data valuation framework designed to enhance the selection and utilization of training data for machine learning models applied in SIRA. By prioritizing highly informative samples, our approach improves model robustness and predictive accuracy across diverse datasets and geographies. We validate our methodology with extensive experiments, demonstrating its potential to significantly enhance provenance verification, mitigate fraudulent trade practices, and strengthen regulatory enforcement of global supply chains.
Submitted 16 March, 2025; v1 submitted 20 February, 2025;
originally announced February 2025.
-
A New Framework for AGN Accretion and Jet Feedback in Numerical Simulations
Authors:
Ying-He Celeste Lü,
Paul M. Ricker
Abstract:
Accurate modeling of active galactic nucleus (AGN) feedback, especially due to relativistic jets, is crucial for understanding the cool-core problem in galaxy clusters. We present a new subgrid method to model accretion onto and feedback from AGN in hydrodynamical simulations of galaxy clusters. Instead of applying the traditional Bondi formalism, we use a sink particle algorithm in which the accretion flux is measured directly through a control surface. A weighting kernel is used to reset the gas properties within the accretion radius at the end of each timestep. We implement feedback in the form of bipolar jets whose properties are tied to the accretion rate. The method is tested with a spherically symmetric Bondi gas flow problem and a Bondi-Hoyle-Lyttleton wind problem, with and without jet feedback. We discuss the reliability of this model by comparing our jet simulations with those in the literature, and we examine the dependence of test results on parameters such as the resolution and size of the jet injection region. We find that the sink particle model can account for the $α$ factor in accretion measurement, and the accretion radius must be resolved with at least two zones to produce realistic black hole accretion. We also show how under-resolving the AGN feedback region in simulations can impact the feedback energy deposited and the jet dynamics. The code described here is the framework for a feedback model, described in a companion paper, that will use accretion disk modeling to more self-consistently determine the feedback efficiency.
Submitted 20 February, 2025;
originally announced February 2025.
-
AIdeation: Designing a Human-AI Collaborative Ideation System for Concept Designers
Authors:
Wen-Fan Wang,
Chien-Ting Lu,
Nil Ponsa Campanyà,
Bing-Yu Chen,
Mike Y. Chen
Abstract:
Concept designers in the entertainment industry create highly detailed, often imaginary environments for movies, games, and TV shows. Their early ideation phase requires intensive research, brainstorming, visual exploration, and combination of various design elements to form cohesive designs. However, existing AI tools focus on image generation from user specifications, lacking support for the unique needs and complexity of concept designers' workflows. Through a formative study with 12 professional designers, we captured their workflows and identified key requirements for AI-assisted ideation tools. Leveraging these insights, we developed AIdeation to support early ideation by brainstorming design concepts with flexible searching and recombination of reference images. A user study with 16 professional designers showed that AIdeation significantly enhanced creativity, ideation efficiency, and satisfaction (all p<.01) compared to current tools and workflows. A field study with 4 studios for 1 week provided insights into AIdeation's benefits and limitations in real-world projects. After the completion of the field study, two studios, covering films, television, and games, have continued to use AIdeation in their commercial projects to date, further validating AIdeation's improvement in ideation quality and efficiency.
Submitted 20 February, 2025;
originally announced February 2025.
-
Impact of Pressure and Apical Oxygen Vacancies on Superconductivity in La$_3$Ni$_2$O$_7$
Authors:
Chen Lu,
Ming Zhang,
Zhiming Pan,
Congjun Wu,
Fan Yang
Abstract:
The bilayer nickelate La$_3$Ni$_2$O$_7$ under pressure has recently emerged as a promising system for high-$T_c$ superconductivity (SC). In this work, we investigate the fate of the SC properties in La$_3$Ni$_2$O$_{7}$ under pressure, focusing on the effects of structural deformation and apical oxygen vacancies. Employing a low-energy effective $t$-$J_{\parallel}$-$J_{\perp}$ model for the $3d_{x^2-y^2}$ orbitals within the slave-boson mean-field approach, we demonstrate that the SC pairing strength is significantly enhanced in the high-pressure tetragonal $I4/mmm$ phase compared to the ambient pressure orthorhombic $Amam$ phase. Furthermore, by simulating random configurations of apical oxygen vacancies, we show that oxygen vacancies suppress both pairing strength and superfluid density. These results underscore the critical role of pressure and oxygen stoichiometry in tuning the SC of La$_3$Ni$_2$O$_7$, providing key insights into optimizing its high-$T_c$ behavior.
Submitted 20 February, 2025;
originally announced February 2025.
-
Chasing the Timber Trail: Machine Learning to Reveal Harvest Location Misrepresentation
Authors:
Shailik Sarkar,
Raquib Bin Yousuf,
Linhan Wang,
Brian Mayer,
Thomas Mortier,
Victor Deklerck,
Jakub Truszkowski,
John C. Simeone,
Marigold Norman,
Jade Saunders,
Chang-Tien Lu,
Naren Ramakrishnan
Abstract:
Illegal logging poses a significant threat to global biodiversity and climate stability, and depresses international prices for legal wood harvesting and responsible forest products trade, affecting livelihoods and communities across the globe. Stable isotope ratio analysis (SIRA) is rapidly becoming an important tool for determining the harvest location of traded organic products. The spatial pattern in stable isotope ratio values depends on factors such as atmospheric and environmental conditions and can thus be used for geographic origin identification. We present here the results of a deployed machine learning pipeline where we leverage both isotope values and atmospheric variables to determine timber harvest location. Additionally, the pipeline incorporates uncertainty estimation to facilitate the interpretation of harvest location determination for analysts. We present our experiments on a collection of oak (Quercus spp.) tree samples from its global range. Our pipeline outperforms comparable state-of-the-art models in determining the geographic harvest origin of commercially traded wood products, and has been used by European enforcement agencies to identify harvest location misrepresentation. We also identify opportunities for further advancement of our framework and discuss how it can be generalized to help identify the origin of falsely labeled organic products throughout the supply chain.
Submitted 16 March, 2025; v1 submitted 19 February, 2025;
originally announced February 2025.
-
A Zero-Inflated Poisson Latent Position Cluster Model
Authors:
Chaoyi Lu,
Riccardo Rastelli,
Nial Friel
Abstract:
The latent position network model (LPM) is a popular approach for the statistical analysis of network data. A central aspect of this model is that it assigns nodes to random positions in a latent space, such that the probability of an interaction between each pair of individuals or nodes is determined by their distance in this latent space. A key feature of this model is that it allows one to visualize nuanced structures via the latent space representation. The LPM can be further extended to the Latent Position Cluster Model (LPCM), to accommodate the clustering of nodes by assuming that the latent positions are distributed following a finite mixture distribution. In this paper, we extend the LPCM to accommodate missing network data and apply this to non-negative discrete weighted social networks. By treating missing data as ``unusual'' zero interactions, we propose a combination of the LPCM with the zero-inflated Poisson distribution. Statistical inference is based on a novel partially collapsed Markov chain Monte Carlo algorithm, where a Mixture-of-Finite-Mixtures (MFM) model is adopted to automatically determine the number of clusters and optimal group partitioning. Our algorithm features a truncated absorb-eject move, which is a novel adaptation of an idea commonly used in collapsed samplers, within the context of MFMs. Another aspect of our work is that we illustrate our results on 3-dimensional latent spaces, maintaining clear visualizations while achieving more flexibility than 2-dimensional models. The performance of this approach is illustrated via two carefully designed simulation studies, as well as four different publicly available real networks, where some interesting new perspectives are uncovered.
Submitted 19 February, 2025;
originally announced February 2025.
-
Reflection of Episodes: Learning to Play Game from Expert and Self Experiences
Authors:
Xiaojie Xu,
Zongyuan Li,
Chang Lu,
Runnan Qi,
Yanan Ni,
Lumin Jiang,
Xiangbei Liu,
Xuebo Zhang,
Yongchun Fang,
Kuihua Huang,
Xian Guo,
Zhanghua Wu,
Zhenya Li
Abstract:
StarCraft II is a complex and dynamic real-time strategy (RTS) game environment that is well suited to artificial intelligence and reinforcement learning research. To address the problem of Large Language Model (LLM) learning in complex environments through self-reflection, we propose a Reflection of Episodes (ROE) framework based on expert experience and self-experience. This framework first obtains key information in the game through a keyframe selection method, then makes decisions based on expert experience and self-experience. After a game is completed, it reflects on the previous experience to obtain new self-experience. Finally, in the experiment, our method beat the robot under the Very Hard difficulty in TextStarCraft II. We analyze the LLM's data during the game in detail and verify the method's effectiveness.
Submitted 18 February, 2025;
originally announced February 2025.
-
GVTNet: Graph Vision Transformer For Face Super-Resolution
Authors:
Chao Yang,
Yong Fan,
Cheng Lu,
Minghao Yuan,
Zhijing Yang
Abstract:
Recent advances in face super-resolution research have utilized the Transformer architecture. This method processes the input image into a series of small patches. However, because of the strong correlation between different facial components in facial images, existing algorithms cannot handle the relationships between patches well when super-resolving low-resolution images, resulting in distorted facial components in the super-resolution results. To solve this problem, we propose a transformer architecture based on graph neural networks called the graph vision transformer network. We treat each patch as a graph node and establish an adjacency matrix based on the information between patches. In this way, each patch interacts only with its neighboring patches, further processing the relationship of facial components. Quantitative and visualization experiments have underscored the superiority of our algorithm over state-of-the-art techniques. Through detailed comparisons, we have demonstrated that our algorithm possesses more advanced super-resolution capabilities, particularly in enhancing facial components. The PyTorch code is available at https://github.com/continueyang/GVTNet
Submitted 18 February, 2025;
originally announced February 2025.
-
DeltaDiff: A Residual-Guided Diffusion Model for Enhanced Image Super-Resolution
Authors:
Chao Yang,
Yong Fan,
Cheng Lu,
Zhijing Yang
Abstract:
Recently, the application of diffusion models in super-resolution tasks has become a popular research direction. Existing work is focused on fully migrating diffusion models to SR tasks. The diffusion model is proposed in the field of image generation, so in order to make the generated results diverse, the diffusion model combines random Gaussian noise and distributed sampling to increase the randomness of the model.
However, super-resolution tasks inherently require the model to generate high-resolution (HR) images with fidelity. Adding too many random factors can cause the model to generate details that do not belong to the HR image. To address this issue, we propose a new diffusion model called DeltaDiff, which uses only residuals between images for diffusion, making the entire diffusion process more stable. The experimental results show that our method surpasses state-of-the-art models and generates results with better fidelity. Our code and model are publicly available at https://github.com/continueyang/DeltaDiff
Submitted 18 February, 2025;
originally announced February 2025.
-
GraphThought: Graph Combinatorial Optimization with Thought Generation
Authors:
Zixiao Huang,
Lifeng Guo,
Wenhao Li,
Junjie Sheng,
Chuyun Shen,
Haosheng Chen,
Bo Jin,
Changhong Lu,
Xiangfeng Wang
Abstract:
Graph combinatorial optimization (GCO) problems are central to domains like logistics and bioinformatics. While traditional solvers dominate, large language models (LLMs) offer new possibilities for structured reasoning, yet struggle with complex GCO tasks requiring rigorous combinatorial analysis and multi-step deduction, often producing hallucinated steps. We first formalize the Optimal Thoughts Design (OTD) problem, which provides a structured guidance for producing high-quality intermediate reasoning steps. Building on this formulation, we introduce GraphThought, a novel framework that generates effective reasoning sequences through either heuristic-guided forward search or solver-aligned backward reasoning. By fine-tuning LLMs on these structured thought sequences, we develop Llama-GT, an 8B-parameter model that achieves state-of-the-art performance on the GraphArena benchmark, outperforming significantly larger models like DeepSeek-V3. Our results demonstrate that when scaffolded with structured reasoning priors, principled thought generation can significantly enhance LLM performance on GCO tasks without requiring increased model scale.
Submitted 12 June, 2025; v1 submitted 17 February, 2025;
originally announced February 2025.
-
Hierarchical Expert Prompt for Large-Language-Model: An Approach Defeat Elite AI in TextStarCraft II for the First Time
Authors:
Zongyuan Li,
Chang Lu,
Xiaojie Xu,
Runnan Qi,
Yanan Ni,
Lumin Jiang,
Xiangbei Liu,
Xuebo Zhang,
Yongchun Fang,
Kuihua Huang,
Xian Guo
Abstract:
Since the emergence of Large Language Models (LLMs), they have been widely used in fields such as writing, translating, and searching. However, there is still great potential for LLM-based methods in handling complex tasks such as decision-making in the StarCraft II environment. To address problems such as lack of relevant knowledge and poor control over subtasks of varying importance, we propose a Hierarchical Expert Prompt (HEP) for LLMs. Our method improves the understanding of game situations through expert-level tactical knowledge and improves the processing quality of tasks of varying importance through a hierarchical framework. Our approach defeated the highest level (Elite) standard built-in agent in TextStarCraft II for the first time and consistently outperformed the baseline method at other difficulty levels. Our experiments suggest that the proposed method is a practical solution for tackling complex decision-making challenges. The replay video can be viewed at https://www.bilibili.com/video/BV1uz42187EF and https://youtu.be/dO3PshWLV5M, and our code has been open-sourced at https://github.com/luchang1113/HEP-LLM-play-StarCraftII.
Submitted 16 February, 2025;
originally announced February 2025.
-
Nonreciprocal Control of the Speed of Light Using Cavity Magnonics
Authors:
Jiguang Yao,
Chenyang Lu,
Xiaolong Fan,
Desheng Xue,
Greg E. Bridges,
C. -M. Hu
Abstract:
We demonstrate nonreciprocal control of the speed of light by sending a microwave pulse through a cavity magnonics device. In contrast to the reciprocal group velocity controlled by the conventional electromagnetically induced transparency (EIT) effect, incorporating dissipative magnon-photon coupling establishes a nonreciprocal EIT effect, allowing slow and fast light propagation in opposite directions at the same frequency with comparable amplitude. Remarkably, reversing the magnetic field enables a directional switch between nonreciprocal fast and slow light. This discovery may offer new possibilities for pulse time regulation in microwave signal communications, neuromorphic computing, and quantum signal processing.
Submitted 14 February, 2025;
originally announced February 2025.
-
SkyRover: A Modular Simulator for Cross-Domain Pathfinding
Authors:
Wenhui Ma,
Wenhao Li,
Bo Jin,
Changhong Lu,
Xiangfeng Wang
Abstract:
Unmanned Aerial Vehicles (UAVs) and Automated Guided Vehicles (AGVs) increasingly collaborate in logistics, surveillance, inspection, and similar tasks. However, existing simulators often focus on a single domain, limiting cross-domain study. This paper presents SkyRover, a modular simulator for UAV-AGV multi-agent pathfinding (MAPF). SkyRover supports realistic agent dynamics, configurable 3D environments, and convenient APIs for external solvers and learning methods. By unifying ground and aerial operations, it facilitates cross-domain algorithm design, testing, and benchmarking. Experiments highlight SkyRover's capacity for efficient pathfinding and high-fidelity simulations in UAV-AGV coordination. The project is available at https://sites.google.com/view/mapf3d/home.
Submitted 13 February, 2025;
originally announced February 2025.
-
Automated Capability Discovery via Foundation Model Self-Exploration
Authors:
Cong Lu,
Shengran Hu,
Jeff Clune
Abstract:
Foundation models have become general-purpose assistants, exhibiting diverse capabilities across numerous domains through training on web-scale data. It remains challenging to precisely characterize even a fraction of the full spectrum of these abilities and potential risks in any new model. Existing evaluation approaches often require significant human effort, and it is taking increasing effort to design ever harder challenges for more capable models. We introduce Automated Capability Discovery (ACD), a framework that designates one foundation model as a scientist to systematically propose open-ended tasks probing the abilities of a subject model (potentially itself). By combining frontier models with ideas from the field of open-endedness, ACD automatically and systematically uncovers a diverse spectrum of surprising capabilities and failures in the subject model. We demonstrate ACD across a range of foundation models (including the GPT, Claude, and Llama series), showing that it automatically generates thousands of distinct tasks, which are then clustered to reveal dozens of broader capability areas and failure modes, that would be challenging for any single team to uncover. We further validate our method's automated scoring with extensive human surveys, observing high agreement between model-generated and human evaluations. By leveraging foundation models' ability to both create tasks and self-evaluate, ACD is a significant step toward scalable, automated evaluation of novel AI systems. All code and evaluation logs are open-sourced at https://github.com/conglu1997/ACD.
Submitted 9 June, 2025; v1 submitted 11 February, 2025;
originally announced February 2025.
-
Position reconstruction and surface background model for the PandaX-4T detector
Authors:
Zhicheng Qian,
Linhui Gu,
Chen Cheng,
Zihao Bo,
Wei Chen,
Xun Chen,
Yunhua Chen,
Zhaokan Cheng,
Xiangyi Cui,
Yingjie Fan,
Deqing Fang,
Zhixing Gao,
Lisheng Geng,
Karl Giboni,
Xunan Guo,
Xuyuan Guo,
Zichao Guo,
Chencheng Han,
Ke Han,
Changda He,
Jinrong He,
Di Huang,
Houqi Huang,
Junting Huang,
Ruquan Hou
, et al. (78 additional authors not shown)
Abstract:
We report the position reconstruction methods and surface background model for the PandaX-4T dark matter direct search experiment. This work develops two position reconstruction algorithms: template matching (TM) method and photon acceptance function (PAF) method. Both methods determine the horizontal position of events based on the light pattern of secondary scintillation collected by the light sensors. After a comprehensive evaluation of resolution, uniformity, and robustness, the PAF method was selected for position reconstruction, while the TM method was employed for verification. The PAF method achieves a bulk event resolution of 1.0 mm and a surface event resolution of 4.4 mm for a typical $S2$ signal with a bottom charge of 1500 PE (about 14 keV). The uniformity is around 20\%. Robustness studies reveal average deviations of 5.1 mm and 8.8 mm for the commissioning run (Run0) and the first science run (Run1), respectively, due to the deactivation of certain PMTs. A data-driven surface background model is developed based on the PAF method. The surface background is estimated to be $0.09 \pm 0.06$ events for Run0 (0.54 tonne$\cdot$year) and $0.17 \pm 0.11$ events for Run1 (1.00 tonne$\cdot$year).
Submitted 11 February, 2025;
originally announced February 2025.
-
Emergent Response Planning in LLMs
Authors:
Zhichen Dong,
Zhanhui Zhou,
Zhixuan Liu,
Chao Yang,
Chaochao Lu
Abstract:
In this work, we argue that large language models (LLMs), though trained to predict only the next token, exhibit emergent planning behaviors: $\textbf{their hidden representations encode future outputs beyond the next token}$. Through simple probing, we demonstrate that LLM prompt representations encode global attributes of their entire responses, including $\textit{structure attributes}$ (e.g., response length, reasoning steps), $\textit{content attributes}$ (e.g., character choices in storywriting, multiple-choice answers at the end of response), and $\textit{behavior attributes}$ (e.g., answer confidence, factual consistency). In addition to identifying response planning, we explore how it scales with model size across tasks and how it evolves during generation. The findings that LLMs plan ahead for the future in their hidden representations suggest potential applications for improving transparency and generation control.
Submitted 6 June, 2025; v1 submitted 10 February, 2025;
originally announced February 2025.
-
Can Diffusion Models Learn Hidden Inter-Feature Rules Behind Images?
Authors:
Yujin Han,
Andi Han,
Wei Huang,
Chaochao Lu,
Difan Zou
Abstract:
Despite the remarkable success of diffusion models (DMs) in data generation, they exhibit specific failure cases with unsatisfactory outputs. We focus on one such limitation: the ability of DMs to learn hidden rules between image features. Specifically, for image data with dependent features ($\mathbf{x}$) and ($\mathbf{y}$) (e.g., the height of the sun ($\mathbf{x}$) and the length of the shadow ($\mathbf{y}$)), we investigate whether DMs can accurately capture the inter-feature rule ($p(\mathbf{y}|\mathbf{x})$). Empirical evaluations on mainstream DMs (e.g., Stable Diffusion 3.5) reveal consistent failures, such as inconsistent lighting-shadow relationships and mismatched object-mirror reflections. Inspired by these findings, we design four synthetic tasks with strongly correlated features to assess DMs' rule-learning abilities. Extensive experiments show that while DMs can identify coarse-grained rules, they struggle with fine-grained ones. Our theoretical analysis demonstrates that DMs trained via denoising score matching (DSM) exhibit constant errors in learning hidden rules, as the DSM objective is not compatible with rule conformity. To mitigate this, we introduce a common technique - incorporating additional classifier guidance during sampling, which achieves (limited) improvements. Our analysis reveals that the subtle signals of fine-grained rules are challenging for the classifier to capture, providing insights for future exploration.
Submitted 7 February, 2025;
originally announced February 2025.
-
When Pre-trained Visual Representations Fall Short: Limitations in Visuo-Motor Robot Learning
Authors:
Nikolaos Tsagkas,
Andreas Sochopoulos,
Duolikun Danier,
Sethu Vijayakumar,
Chris Xiaoxuan Lu,
Oisin Mac Aodha
Abstract:
The integration of pre-trained visual representations (PVRs) into visuo-motor robot learning has emerged as a promising alternative to training visual encoders from scratch. However, PVRs face critical challenges in the context of policy learning, including temporal entanglement and an inability to generalise even in the presence of minor scene perturbations. These limitations hinder performance in tasks requiring temporal awareness and robustness to scene changes. This work identifies these shortcomings and proposes solutions to address them. First, we augment PVR features with temporal perception and a sense of task completion, effectively disentangling them in time. Second, we introduce a module that learns to selectively attend to task-relevant local features, enhancing robustness when evaluated on out-of-distribution scenes. Our experiments demonstrate significant performance improvements, particularly in PVRs trained with masking objectives, and validate the effectiveness of our enhancements in addressing PVR-specific limitations.
Submitted 5 May, 2025; v1 submitted 5 February, 2025;
originally announced February 2025.
-
Search for Double Beta Decay of $^{136}$Xe to the $0^+_1$ Excited State of $^{136}$Ba with PandaX-4T
Authors:
PandaX Collaboration,
Lingyin Luo,
Zihao Bo,
Wei Chen,
Xun Chen,
Yunhua Chen,
Zhaokan Cheng,
Xiangyi Cui,
Yingjie Fan,
Deqing Fang,
Zhixing Gao,
Lisheng Geng,
Karl Giboni,
Xunan Guo,
Xuyuan Guo,
Zichao Guo,
Chencheng Han,
Ke Han,
Changda He,
Jinrong He,
Di Huang,
Houqi Huang,
Junting Huang,
Ruquan Hou,
Yu Hou
, et al. (76 additional authors not shown)
Abstract:
We perform a search for the double beta decay of $^{136}$Xe to the excited state, $0^+_1$, of $^{136}$Ba (2$νββ$-$0_1^+$), using the dual-phase xenon detector of PandaX-4T with the first 94.9-day commissioning data. The multi-site events are reconstructed up to the MeV energy scale, which helps to improve the background model significantly. The background contribution from the stainless steel platform outside the PandaX-4T cryostat is evaluated for the first time. No significant evidence for 2$νββ$-$0_1^+$ is observed, resulting in a lower limit on the half-life of $7.5 \times 10^{22}$ yr at the 90% confidence level. This is the first experimental limit on such a rare decay in a natural xenon-based detector.
Submitted 7 March, 2025; v1 submitted 5 February, 2025;
originally announced February 2025.
-
Uniform estimates: from Yau to Kolodziej
Authors:
Vincent Guedj,
Chinh H. Lu
Abstract:
In this note we provide a new and efficient approach to uniform estimates for solutions to complex Monge-Ampère equations, as well as for solutions to geometric PDEs that satisfy a determinantal majorization.
Submitted 18 February, 2025; v1 submitted 4 February, 2025;
originally announced February 2025.
-
Safety Alignment Depth in Large Language Models: A Markov Chain Perspective
Authors:
Ching-Chia Kao,
Chia-Mu Yu,
Chun-Shien Lu,
Chu-Song Chen
Abstract:
Large Language Models (LLMs) are increasingly adopted in high-stakes scenarios, yet their safety mechanisms often remain fragile. Simple jailbreak prompts or even benign fine-tuning can bypass these protocols, underscoring the need to understand where and how they fail. Recent findings suggest that vulnerabilities emerge when alignment is confined to only the initial output tokens. Unfortunately, even with the introduction of deep safety alignment, determining the optimal safety depth remains an unresolved challenge. By leveraging the equivalence between autoregressive language models and Markov chains, this paper offers the first theoretical result on how to identify the ideal depth for safety alignment, and demonstrates how permutation-based data augmentation can tighten these bounds. Crucially, we reveal a fundamental interaction between alignment depth and ensemble width, indicating that broader ensembles can compensate for shallower alignments. These insights provide a theoretical foundation for designing more robust, scalable safety strategies that complement existing alignment approaches, opening new avenues for research into safer, more reliable LLMs.
Submitted 1 February, 2025;
originally announced February 2025.
-
On uniqueness of solutions to complex Monge-Ampère mean field equations
Authors:
Chinh H. Lu,
Trong-Thuc Phung
Abstract:
We establish the uniqueness of solutions to complex Monge-Ampère mean field equations when the temperature parameter is small. In the local setting of bounded hyperconvex domains, our result partially confirms a conjecture by Berman and Berndtsson. Our approach also extends to the global context of compact complex manifolds.
Submitted 30 January, 2025;
originally announced January 2025.
-
Adversarial Masked Autoencoder Purifier with Defense Transferability
Authors:
Yuan-Chih Chen,
Chun-Shien Lu
Abstract:
The study of adversarial defense still struggles to combat advanced adversarial attacks. In contrast to most prior studies, which rely on diffusion models for test-time defense and thereby markedly increase inference time, we propose Masked AutoEncoder Purifier (MAEP), which integrates Masked AutoEncoder (MAE) into an adversarial purifier framework for test-time purification. While MAEP achieves promising adversarial robustness, it particularly features model defense transferability and attack generalization without relying on additional data that differs from the training dataset. To our knowledge, MAEP is the first study of an adversarial purifier based on MAE. Extensive experimental results demonstrate that our method can not only maintain clean accuracy with only a slight drop but also exhibit a close gap between the clean and robust accuracy. Notably, MAEP trained on CIFAR10 achieves state-of-the-art performance even when tested directly on ImageNet, outperforming existing diffusion-based models trained specifically on ImageNet.
Submitted 28 January, 2025;
originally announced January 2025.
-
Reliable Density Functional Theory Predictions of Bandgaps for Materials
Authors:
Chenxi Lu,
Musen Li,
Michael J. Ford,
Rika Kobayashi,
Roger Amos,
Jeffrey R. Reimers
Abstract:
We consider methods for optimizing the bandgap calculation of 3D materials, considering 340 sample materials. Examined are the effects of the choice of the pseudopotential to describe core electrons, the plane-wave basis set cutoff energy, and the Brillouin zone integration. Cost-saving calculations in which the structure is optimized using reduced-quality Brillouin zone integrations and cutoff energies were found to lead to experimentally significant errors exceeding 0.1 eV in 18% of cases using the PBE functional and 21% of cases using PBE0. Such cost-savings approaches are therefore not recommended for general applications. Also, the current practice of using unoptimized grids to perform the Brillouin-zone integrations in bandgap calculations is found to be unreliable for 16% of materials using PBE and for 23% using PBE0. A k-space optimization scheme is introduced that interpolates extensive PBE results to determine a generally useful approach that when used in PBE0 calculations is found to be inadequate for only 1.6% of the materials studied.
Submitted 27 January, 2025;
originally announced January 2025.
-
GreedyPixel: Fine-Grained Black-Box Adversarial Attack Via Greedy Algorithm
Authors:
Hanrui Wang,
Ching-Chun Chang,
Chun-Shien Lu,
Christopher Leckie,
Isao Echizen
Abstract:
A critical requirement for deep learning models is ensuring their robustness against adversarial attacks. These attacks commonly introduce noticeable perturbations, compromising the visual fidelity of adversarial examples. Another key challenge is that while white-box algorithms can generate effective adversarial perturbations, they require access to the model gradients, limiting their practicality in many real-world scenarios. Existing attack mechanisms struggle to achieve similar efficacy without access to these gradients. In this paper, we introduce GreedyPixel, a novel pixel-wise greedy algorithm designed to generate high-quality adversarial examples using only query-based feedback from the target model. GreedyPixel improves computational efficiency in what is typically a brute-force process by perturbing individual pixels in sequence, guided by a pixel-wise priority map. This priority map is constructed by ranking gradients obtained from a surrogate model, providing a structured path for perturbation. Our results demonstrate that GreedyPixel achieves attack success rates comparable to white-box methods without the need for gradient information, and surpasses existing algorithms in black-box settings, offering higher success rates, reduced computational time, and imperceptible perturbations. These findings underscore the advantages of GreedyPixel in terms of attack efficacy, time efficiency, and visual quality.
Submitted 23 January, 2025;
originally announced January 2025.
-
Escaping Barren Plateau: Co-Exploration of Quantum Circuit Parameters and Architectures
Authors:
Yipei Liu,
Yuhong Song,
Jinyang Li,
Qiang Guan,
Cheng-chang Lu,
Youzuo Lin,
Weiwen Jiang
Abstract:
Barren plateaus (BP), characterized by exponentially vanishing gradients that hinder the training of variational quantum circuits (VQC), present a pervasive and critical challenge in applying variational quantum algorithms to real-world applications. It is widely recognized that the BP problem becomes more pronounced with an increase in the number of parameters. This work demonstrates that the BP problem manifests at different scales depending on the specific application, highlighting the absence of a universal VQC ansatz capable of resolving the BP issue across all applications. Consequently, there is an imminent need for an automated tool to design and optimize VQC architectures tailored to specific applications. To close the gap, this paper takes Variational Quantum Eigensolvers (VQEs) as a vehicle, and we propose a novel quantum circuit parameter and architecture co-exploration framework, namely AntiBP. Experimental results demonstrate that AntiBP effectively avoids the BP issue for circuits that are not under-parameterized in noise-free environments. Furthermore, AntiBP significantly outperforms baseline VQEs in noisy environments.
Submitted 22 January, 2025;
originally announced January 2025.
-
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Authors:
DeepSeek-AI,
Daya Guo,
Dejian Yang,
Haowei Zhang,
Junxiao Song,
Ruoyu Zhang,
Runxin Xu,
Qihao Zhu,
Shirong Ma,
Peiyi Wang,
Xiao Bi,
Xiaokang Zhang,
Xingkai Yu,
Yu Wu,
Z. F. Wu,
Zhibin Gou,
Zhihong Shao,
Zhuoshu Li,
Ziyi Gao,
Aixin Liu,
Bing Xue,
Bingxuan Wang,
Bochao Wu,
Bei Feng,
Chengda Lu
, et al. (175 additional authors not shown)
Abstract:
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.
Submitted 22 January, 2025;
originally announced January 2025.
-
Pre-Trained Large Language Model Based Remaining Useful Life Transfer Prediction of Bearing
Authors:
Laifa Tao,
Zhengduo Zhao,
Xuesong Wang,
Bin Li,
Wenchao Zhan,
Xuanyuan Su,
Shangyu Li,
Qixuan Huang,
Haifei Liu,
Chen Lu,
Zhixuan Lian
Abstract:
Accurately predicting the remaining useful life (RUL) of rotating machinery, such as bearings, is essential for ensuring equipment reliability and minimizing unexpected industrial failures. Traditional data-driven deep learning methods face challenges in practical settings due to inconsistent training and testing data distributions and limited generalization for long-term predictions.
Submitted 13 January, 2025;
originally announced January 2025.
-
Topology-aware Microservice Architecture in Edge Networks: Deployment Optimization and Implementation
Authors:
Yuang Chen,
Chang Wu,
Fangyu Zhang,
Chengdi Lu,
Yongsheng Huang,
Hancheng Lu
Abstract:
As a ubiquitous deployment paradigm, integrating microservice architecture (MSA) into edge networks promises to enhance the flexibility and scalability of services. However, it also presents significant challenges stemming from dispersed node locations and intricate network topologies. In this paper, we have proposed a topology-aware MSA characterized by a three-tier network traffic model encompassing the service, microservices, and edge node layers. This model meticulously characterizes the complex dependencies between edge network topologies and microservices, mapping microservice deployment onto link traffic to accurately estimate communication delay. Building upon this model, we have formulated a weighted sum communication delay optimization problem considering different types of services. Then, a novel topology-aware and individual-adaptive microservices deployment (TAIA-MD) scheme is proposed to solve the problem efficiently, which accurately senses the network topology and incorporates an individual-adaptive mechanism in a genetic algorithm to accelerate the convergence and avoid local optima. Extensive simulations show that, compared to the existing deployment schemes, TAIA-MD improves the communication delay performance by approximately 30% to 60% and effectively enhances the overall network performance. Furthermore, we implement the TAIA-MD scheme on a practical microservice physical platform. The experimental results demonstrate that TAIA-MD achieves superior robustness in withstanding link failures and network fluctuations.
Submitted 8 January, 2025;
originally announced January 2025.
-
Targetless Intrinsics and Extrinsic Calibration of Multiple LiDARs and Cameras with IMU using Continuous-Time Estimation
Authors:
Yuezhang Lv,
Yunzhou Zhang,
Chao Lu,
Jiajun Zhu,
Song Wu
Abstract:
Accurate spatiotemporal calibration is a prerequisite for multisensor fusion. However, sensors are typically asynchronous, and there is no overlap between the fields of view of cameras and LiDARs, posing challenges for intrinsic and extrinsic parameter calibration. To address this, we propose a calibration pipeline based on continuous-time and bundle adjustment (BA) capable of simultaneous intrinsic and extrinsic calibration (6 DOF transformation and time offset). We do not require overlapping fields of view or any calibration board. Firstly, we establish data associations between cameras using Structure from Motion (SFM) and perform self-calibration of camera intrinsics. Then, we establish data associations between LiDARs through adaptive voxel map construction, optimizing for extrinsic calibration within the map. Finally, by matching features between the intensity projection of LiDAR maps and camera images, we conduct joint optimization for intrinsic and extrinsic parameters. This pipeline functions in texture-rich structured environments, allowing simultaneous calibration of any number of cameras and LiDARs without the need for intricate sensor synchronization triggers. Experimental results demonstrate our method's ability to fulfill co-visibility and motion constraints between sensors without accumulating errors.
Submitted 6 January, 2025;
originally announced January 2025.
-
The Race to Efficiency: A New Perspective on AI Scaling Laws
Authors:
Chien-Ping Lu
Abstract:
As large-scale AI models expand, training becomes costlier and sustaining progress grows harder. Classical scaling laws (e.g., Kaplan et al. (2020), Hoffmann et al. (2022)) predict training loss from a static compute budget yet neglect time and efficiency, prompting the question: how can we balance ballooning GPU fleets with rapidly improving hardware and algorithms? We introduce the relative-loss equation, a time- and efficiency-aware framework that extends classical AI scaling laws. Our model shows that, without ongoing efficiency gains, advanced performance could demand millennia of training or unrealistically large GPU fleets. However, near-exponential progress remains achievable if the "efficiency-doubling rate" parallels Moore's Law. By formalizing this race to efficiency, we offer a quantitative roadmap for balancing front-loaded GPU investments with incremental improvements across the AI stack. Empirical trends suggest that sustained efficiency gains can push AI scaling well into the coming decade, providing a new perspective on the diminishing returns inherent in classical scaling.
Submitted 8 January, 2025; v1 submitted 3 January, 2025;
originally announced January 2025.
-
Sequencing Silicates in the IRS Debris Disk Catalog I: Methodology for Unsupervised Clustering
Authors:
Cicero X. Lu,
Tushar Mittal,
Christine H. Chen,
Alexis Y. Li,
Kadin Worthen,
B. A. Sargent,
Carey M. Lisse,
G. C. Sloan,
Dean C. Hines,
Dan M. Watson,
Isabel Rebollido,
Bin B. Ren,
Joel D. Green
Abstract:
Debris disks, which consist of dust, planetesimals, planets, and gas, offer a unique window into the mineralogical composition of their parent bodies, especially during the critical phase of terrestrial planet formation spanning 10 to a few hundred million years. Observations from the $\textit{Spitzer}$ Space Telescope have unveiled thousands of debris disks, yet systematic studies remain scarce, let alone those with unsupervised clustering techniques. This study introduces $\texttt{CLUES}$ (CLustering UnsupErvised with Sequencer), a novel, non-parametric, fully-interpretable machine-learning spectral analysis tool designed to analyze and classify the spectral data of debris disks. $\texttt{CLUES}$ combines multiple unsupervised clustering methods with multi-scale distance measures to discern new groupings and trends, offering insights into compositional diversity and geophysical processes within these disks. Our analysis allows us to explore a vast parameter space in debris disk mineralogy and also offers broader applications in fields such as protoplanetary disks and solar system objects. This paper details the methodology, implementation, and initial results of $\texttt{CLUES}$, setting the stage for more detailed follow-up studies focusing on debris disk mineralogy and demographics.
Submitted 2 January, 2025;
originally announced January 2025.
-
DMSA: A Decentralized Microservice Architecture for Edge Networks
Authors:
Yuang Chen,
Chengdi Lu,
Yongsheng Huang,
Chang Wu,
Fengqian Guo,
Hancheng Lu,
Chang Wen Chen
Abstract:
The dispersed node locations and complex topologies of edge networks, combined with intricate dynamic microservice dependencies, render traditional centralized microservice architectures (MSAs) unsuitable. In this paper, we propose a decentralized microservice architecture (DMSA), which delegates scheduling functions from the control plane to edge nodes. DMSA redesigns and implements three core modules of microservice discovery, monitoring, and scheduling for edge networks to achieve precise awareness of instance deployments, low monitoring overhead and measurement errors, and accurate dynamic scheduling, respectively. Particularly, DMSA has customized a microservice scheduling scheme that leverages multi-port listening and zero-copy forwarding to guarantee high data forwarding efficiency. Moreover, a dynamic weighted multi-level load balancing algorithm is proposed to adjust scheduling dynamically with consideration of reliability, priority, and response delay. Finally, we have implemented a physical verification platform for DMSA. Extensive empirical results demonstrate that compared to state-of-the-art and traditional scheduling schemes, DMSA effectively counteracts link failures and network fluctuations, improving the service response delay and execution success rate by approximately $60\% \sim 75\%$ and $10\%\sim15\%$, respectively.
Submitted 1 January, 2025;
originally announced January 2025.
-
Comprehensive Measurement of the Reactor Antineutrino Spectrum and Flux at Daya Bay
Authors:
F. P. An,
W. D. Bai,
A. B. Balantekin,
M. Bishai,
S. Blyth,
G. F. Cao,
J. Cao,
J. F. Chang,
Y. Chang,
H. S. Chen,
H. Y. Chen,
S. M. Chen,
Y. Chen,
Y. X. Chen,
Z. Y. Chen,
J. Cheng,
J. Cheng,
Y. -C. Cheng,
Z. K. Cheng,
J. J. Cherwinka,
M. C. Chu,
J. P. Cummings,
O. Dalager,
F. S. Deng,
X. Y. Ding
, et al. (177 additional authors not shown)
Abstract:
This Letter reports a precise measurement of the reactor antineutrino spectrum and flux based on the full data set of 4.7 million inverse-beta-decay (IBD) candidates collected at the Daya Bay near detectors. Expressed in terms of the IBD yield per fission, the antineutrino spectra from all reactor fissile isotopes and the specific $\mathrm{^{235}U}$ and $\mathrm{^{239}Pu}$ isotopes are measured with 1.3$\%$, 3$\%$ and 8$\%$ uncertainties respectively near the 3 MeV spectrum peak in reconstructed energy, reaching the best precision in the world. The total antineutrino flux and isotopic $\mathrm{^{235}U}$ and $\mathrm{^{239}Pu}$ fluxes are precisely measured to be $5.84\pm0.07$, $6.16\pm0.12$ and $4.16\pm0.21$ in units of $10^{-43} \mathrm{cm^2/fission}$. These measurements are compared with the Huber-Mueller (HM) model, the reevaluated conversion model based on the Kurchatov Institute (KI) measurement and the latest Summation Model (SM2023). The Daya Bay flux shows good consistency with the KI and SM2023 models but disagrees with the HM model. The Daya Bay spectrum, however, disagrees with all model predictions.
Submitted 22 May, 2025; v1 submitted 1 January, 2025;
originally announced January 2025.
-
Final-state rescattering mechanism of bottom-baryon decays
Authors:
Zhu-Ding Duan,
Jian-Peng Wang,
Run-Hui Li,
Cai-Dian Lv,
Fu-Sheng Yu
Abstract:
We perform an analysis on the non-leptonic two-body weak decays of $Λ^{0}_{b}$ within the framework of the final-state rescattering mechanism. The strong phases can be obtained by realizing complete hadronic triangle loop integrations. Then the CP violation and decay asymmetry parameters can be predicted. In this work, we focus on the exclusive decays of $Λ^{0}_{b}\to pπ^{-}/K^{-}/ρ^{-}/K^{*-}$ and $Λφ$ and achieve numerical predictions for many observables, including branching ratios, direct and partial-wave CP asymmetries, and decay asymmetry parameters. The results are very consistent with the current data, showing the validity of the final-state rescattering mechanism for $b$-baryon decays. It is therefore expected to be applied to predict CP asymmetries in many other channels of $b$-baryon decays.
Submitted 29 December, 2024;
originally announced December 2024.
-
Holographic fermions in the Dyonic Gubser-Rocha black hole
Authors:
Cheng-Yuan Lu,
Xian-Hui Ge,
Sang-Jin Sin
Abstract:
We investigate the fermionic properties of a dyonic Gubser-Rocha model in the context of gauge/gravity duality. This model incorporates both a magnetic field and momentum relaxation. We have derived this model's scaling exponent, revealing the influence of the magnetic field and momentum relaxation on low-energy physics. As the magnetic field strength and momentum relaxation increase, the spectral function of the dual field changes significantly. Specifically, we observe variations in the scaling exponent, Fermi momentum, and dispersion relations as the magnetic field increases, highlighting the system's transition from a Fermi liquid to a non-Fermi liquid, and eventually to an insulating state. Our analysis of the magneto-scattering rate reveals that it is nearly zero in the Fermi liquid region, increases significantly in the non-Fermi liquid region, and ultimately arrives at a maximum value in the insulating state.
Submitted 28 December, 2024;
originally announced December 2024.
-
Search for Solar Boosted Dark Matter Particles at the PandaX-4T Experiment
Authors:
Guofang Shen,
Zihao Bo,
Wei Chen,
Xun Chen,
Yunhua Chen,
Zhaokan Cheng,
Xiangyi Cui,
Yingjie Fan,
Deqing Fang,
Zhixing Gao,
Lisheng Geng,
Karl Giboni,
Xunan Guo,
Xuyuan Guo,
Zichao Guo,
Chencheng Han,
Ke Han,
Changda He,
Jinrong He,
Di Huang,
Houqi Huang,
Junting Huang,
Ruquan Hou,
Yu Hou,
Xiangdong Ji
, et al. (78 additional authors not shown)
Abstract:
We present a novel constraint on light dark matter utilizing $1.54$ tonne$\cdot$year of data acquired from the PandaX-4T dual-phase xenon time projection chamber. This constraint is derived through detecting electronic recoil signals resulting from the interaction with solar-enhanced dark matter flux. Low-mass dark matter particles, lighter than a few MeV/$c^2$, can scatter with the thermal electrons in the Sun. Consequently, with higher kinetic energy, the boosted dark matter component becomes detectable via contact scattering with xenon electrons, resulting in a few keV energy deposition that exceeds the threshold of PandaX-4T. We calculate the expected recoil energy in PandaX-4T considering the Sun's acceleration and the detection capabilities of the xenon detector. The first experimental search results using the xenon detector yield the most stringent cross-section of $3.51 \times 10^{-39}~\mathrm{cm}^2$ at $0.08~\mathrm{MeV}$/$c^2$ for a solar boosted dark matter mass ranging from $0.02$ to $10~ \mathrm{MeV}$/$c^2$, achieving a 23-fold improvement compared with earlier experimental studies.
Submitted 12 May, 2025; v1 submitted 27 December, 2024;
originally announced December 2024.
-
Interacted Object Grounding in Spatio-Temporal Human-Object Interactions
Authors:
Xiaoyang Liu,
Boran Wen,
Xinpeng Liu,
Zizheng Zhou,
Hongwei Fan,
Cewu Lu,
Lizhuang Ma,
Yulong Chen,
Yong-Lu Li
Abstract:
Spatio-temporal Human-Object Interaction (ST-HOI) understanding aims at detecting HOIs from videos, which is crucial for activity understanding. However, existing whole-body-object interaction video benchmarks overlook the fact that open-world objects are diverse; that is, they usually provide only limited, predefined object classes. Therefore, we introduce a new open-world benchmark, Grounding Interacted Objects (GIO), comprising 1,098 interacted object classes and 290K interacted object box annotations. Accordingly, an object grounding task is proposed that expects vision systems to discover interacted objects. Even though today's detectors and grounding methods have succeeded greatly, they perform unsatisfactorily in localizing diverse and rare objects in GIO. This profoundly reveals the limitations of current vision systems and poses a great challenge. Thus, we explore leveraging spatio-temporal cues to address object grounding and propose a 4D question-answering framework (4D-QA) to discover interacted objects from diverse videos. Our method demonstrates significant superiority in extensive experiments compared to current baselines. Data and code will be publicly available at https://github.com/DirtyHarryLYL/HAKE-AVA.
Submitted 23 February, 2025; v1 submitted 27 December, 2024;
originally announced December 2024.
-
DeepSeek-V3 Technical Report
Authors:
DeepSeek-AI,
Aixin Liu,
Bei Feng,
Bing Xue,
Bingxuan Wang,
Bochao Wu,
Chengda Lu,
Chenggang Zhao,
Chengqi Deng,
Chenyu Zhang,
Chong Ruan,
Damai Dai,
Daya Guo,
Dejian Yang,
Deli Chen,
Dongjie Ji,
Erhang Li,
Fangyun Lin,
Fucong Dai,
Fuli Luo,
Guangbo Hao,
Guanting Chen,
Guowei Li,
H. Zhang,
Han Bao
, et al. (175 additional authors not shown)
Abstract:
We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable. Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks. The model checkpoints are available at https://github.com/deepseek-ai/DeepSeek-V3.
Submitted 18 February, 2025; v1 submitted 26 December, 2024;
originally announced December 2024.
-
Exploring semi-relativistic $p$-wave dark matter annihilation in minimal Higgs portal near supermassive black hole
Authors:
Chih-Ting Lu,
Xiao-Yi Luo,
Zi-Qing Xia
Abstract:
We conduct a comprehensive analysis of potential annihilation processes of light dark matter (DM) in minimal Higgs portal models near the supermassive black hole (Sgr A$^{\star}$) in the Galactic Center, considering interactions between DM particles mediated by either a light scalar or pseudoscalar with couplings $c_s$ and $c_p$. Accelerated by the supermassive black hole, DM particles can reach velocities up to half the speed of light, significantly enhancing the $p$-wave annihilation cross-section, allowing forbidden annihilation channels within specific mass ranges, and producing unique gamma-ray spectral signals. Utilizing gamma-ray observations from the Fermi Large Area Telescope (Fermi-LAT) in the direction of Sgr A$^{\star}$, we constrain the light DM parameters in the mass range of $0.3-10 \, \text{GeV}$. Our results indicate that the couplings $c_s$ and $c_p$ are constrained to the order of $10^{-5}$, corresponding to a DM annihilation cross-section as low as $10^{-38}~{\rm cm}^3/{\rm s}$. In the future, the Very Large Gamma-ray Space Telescope (VLAST), with a larger detection area and broader detection range from $1$ MeV to $1$ TeV, will enhance our ability to probe sub-GeV DM and offer the opportunity to further study the forbidden annihilation scenario.
Submitted 13 January, 2025; v1 submitted 26 December, 2024;
originally announced December 2024.
-
Leptophilic Axion-like Particles at Forward Detectors
Authors:
Xu-Hui Jiang,
Chih-Ting Lu
Abstract:
Leptophilic axion-like particles (ALPs) exhibit rich phenomenology, focusing exclusively on interactions between an ALP and Standard Model (SM) leptons. Through integration by parts, it is shown that both the three-point interaction, $a\bar\ell\ell$ and the four-point interaction, $a\ell^-νW^+$, play significant roles, making the flavor portal particularly compelling. For ALPs with masses ranging from $\mathcal O(1)$ MeV to $\mathcal O(1)$ GeV, they can contribute to exotic hadron decays. Suppressed couplings naturally extend the ALP lifetime, presenting opportunities for detection at forward detectors. In this study, we explore ALPs with both electrophilic and muonphilic scenarios. We propose an inclusive search for various hadrons that undergo exotic decays at the Large Hadron Collider (LHC). In the electrophilic scenario, long-lived ALPs are searched in the Forward Search Experiment (FASER) and its upgrading phase, FASER II. In the muonphilic scenario, where the ALP lifetime is significantly reduced due to its coupling to muons, we further investigate its detection potential at LHCb and its high-luminosity upgrade. Several benchmarks are analyzed, including electroweak-preserving, electroweak-violating and left-right softly asymmetric models, to demonstrate possible experimental constraints.
Submitted 28 February, 2025; v1 submitted 26 December, 2024;
originally announced December 2024.
-
Study of rare top quark decays into a jet plus a charged pseudo-scalar meson
Authors:
Long-Shun Lu,
Lei-Yi Li,
Cai-Dian Lü
Abstract:
The semi-inclusive decay processes of a top quark into a charged pseudo-scalar meson and a jet are studied within the framework of QCD factorization. The leading power of the decay matrix elements can be factorized into a heavy-to-light quark transition current and a hadron matrix element up to next-to-leading order QCD corrections. We calculate one-loop virtual corrections together with real gluon emission corrections at order $α_s$. The numerical results of the branching ratios are presented for the sum of two-body and three-body decays. We also study the energy cut-off dependence of the gluon jet. These processes may be detected in near-future experiments and can serve as probes for new physics.
Submitted 1 April, 2025; v1 submitted 26 December, 2024;
originally announced December 2024.
-
Boosted fusion gates above the percolation threshold for scalable graph-state generation
Authors:
Yong-Peng Guo,
Geng-Yan Zou,
Xing Ding,
Qi-Hang Zhang,
Mo-Chi Xu,
Run-Ze Liu,
Jun-Yi Zhao,
Zhen-Xuan Ge,
Li-Chao Peng,
Ke-Mi Xu,
Yi-Yang Lou,
Zhen Ning,
Lin-Jun Wang,
Hui Wang,
Yong-Heng Huo,
Yu-Ming He,
Chao-Yang Lu,
Jian-Wei Pan
Abstract:
Fusing small resource states into a larger, fully connected graph-state is essential for scalable photonic quantum computing. Theoretical analysis reveals that this can only be achieved when the success probability of the fusion gate surpasses a specific percolation threshold of 58.98% by using three-photon GHZ states as resource states. However, such an implementation of a fusion gate has never been experimentally realized before. Here, we successfully demonstrate a boosted fusion gate with a theoretical success probability of 75%, using deterministically generated auxiliary states. The success probability is experimentally measured to be 71.0(7)%. We further demonstrate the effectiveness of the boosted fusion gate by fusing two Bell states with a fidelity of 67(2)%. Our work paves a crucial path toward scalable linear optical quantum computing.
Submitted 25 December, 2024;
originally announced December 2024.
-
ChatGarment: Garment Estimation, Generation and Editing via Large Language Models
Authors:
Siyuan Bian,
Chenghao Xu,
Yuliang Xiu,
Artur Grigorev,
Zhen Liu,
Cewu Lu,
Michael J. Black,
Yao Feng
Abstract:
We introduce ChatGarment, a novel approach that leverages large vision-language models (VLMs) to automate the estimation, generation, and editing of 3D garments from images or text descriptions. Unlike previous methods that struggle in real-world scenarios or lack interactive editing capabilities, ChatGarment can estimate sewing patterns from in-the-wild images or sketches, generate them from text descriptions, and edit garments based on user instructions, all within an interactive dialogue. These sewing patterns can then be draped on a 3D body and animated. This is achieved by finetuning a VLM to directly generate a JSON file that includes both textual descriptions of garment types and styles, as well as continuous numerical attributes. This JSON file is then used to create sewing patterns through a programming parametric model. To support this, we refine the existing programming model, GarmentCode, by expanding its garment type coverage and simplifying its structure for efficient VLM fine-tuning. Additionally, we construct a large-scale dataset of image-to-sewing-pattern and text-to-sewing-pattern pairs through an automated data pipeline. Extensive evaluations demonstrate ChatGarment's ability to accurately reconstruct, generate, and edit garments from multimodal inputs, highlighting its potential to simplify workflows in fashion and gaming applications. Code and data are available at https://chatgarment.github.io/ .
Submitted 3 April, 2025; v1 submitted 23 December, 2024;
originally announced December 2024.
-
Automating the Search for Artificial Life with Foundation Models
Authors:
Akarsh Kumar,
Chris Lu,
Louis Kirsch,
Yujin Tang,
Kenneth O. Stanley,
Phillip Isola,
David Ha
Abstract:
With the recent Nobel Prize awarded for radical advances in protein discovery, foundation models (FMs) for exploring large combinatorial spaces promise to revolutionize many scientific fields. Artificial Life (ALife) has not yet integrated FMs, thus presenting a major opportunity for the field to alleviate the historical burden of relying chiefly on manual design and trial-and-error to discover the configurations of lifelike simulations. This paper presents, for the first time, a successful realization of this opportunity using vision-language FMs. The proposed approach, called Automated Search for Artificial Life (ASAL), (1) finds simulations that produce target phenomena, (2) discovers simulations that generate temporally open-ended novelty, and (3) illuminates an entire space of interestingly diverse simulations. Because of the generality of FMs, ASAL works effectively across a diverse range of ALife substrates including Boids, Particle Life, Game of Life, Lenia, and Neural Cellular Automata. A major result highlighting the potential of this technique is the discovery of previously unseen Lenia and Boids lifeforms, as well as cellular automata that are open-ended like Conway's Game of Life. Additionally, the use of FMs allows for the quantification of previously qualitative phenomena in a human-aligned way. This new paradigm promises to accelerate ALife research beyond what is possible through human ingenuity alone.
Submitted 16 May, 2025; v1 submitted 23 December, 2024;
originally announced December 2024.
-
Generalizable Articulated Object Perception with Superpoints
Authors:
Qiaojun Yu,
Ce Hao,
Xibin Yuan,
Li Zhang,
Liu Liu,
Yukang Huo,
Rohit Agarwal,
Cewu Lu
Abstract:
Manipulating articulated objects with robotic arms is challenging due to the complex kinematic structure, which requires precise part segmentation for efficient manipulation. In this work, we introduce a novel superpoint-based perception method designed to improve part segmentation in 3D point clouds of articulated objects. We propose a learnable, part-aware superpoint generation technique that efficiently groups points based on their geometric and semantic similarities, resulting in clearer part boundaries. Furthermore, by leveraging the segmentation capabilities of the 2D foundation model SAM, we identify the centers of pixel regions and select corresponding superpoints as candidate query points. Integrating a query-based transformer decoder further enhances our method's ability to achieve precise part segmentation. Experimental results on the GAPartNet dataset show that our method outperforms existing state-of-the-art approaches in cross-category part segmentation, achieving AP50 scores of 77.9% for seen categories (4.4% improvement) and $39.3\%$ for unseen categories (11.6% improvement), with superior results in 5 out of 9 part categories for seen objects and outperforming all previous methods across all part categories for unseen objects.
Submitted 21 December, 2024;
originally announced December 2024.