Search | arXiv e-print repository

Privacy-preserving Decision-focused Learning for Multi-energy Systems

Authors: Yangze Zhou, Ruiyang Yao, Dalin Qin, Yixiong Jia, Yi Wang

Abstract: Decision-making for multi-energy system (MES) dispatch depends on accurate load forecasting. Traditionally, load forecasting and decision-making for MES are implemented separately. Forecasting models are typically trained to minimize forecasting errors, overlooking their impact on downstream decision-making. To address this, decision-focused learning (DFL) has been studied to minimize decision-making costs instead. However, practical adoption of DFL in MES faces significant challenges: the process requires sharing sensitive load data and model parameters across multiple sectors, raising serious privacy issues. To this end, we propose a privacy-preserving DFL framework tailored for MES. Our approach introduces information masking to safeguard private data while enabling recovery of decision variables and gradients required for model training. To further enhance security for DFL, we design a safety protocol combining matrix decomposition and homomorphic encryption, effectively preventing collusion and unauthorized data access. Additionally, we develop a privacy-preserving load pattern recognition algorithm, enabling the training of specialized DFL models for heterogeneous load patterns. Theoretical analysis and comprehensive case studies, including real-world MES data, demonstrate that our framework not only protects privacy but also consistently achieves lower average daily dispatch costs compared to existing methods.

Submitted 23 October, 2025; originally announced October 2025.

Comments: 10 pages, 7 figures

arXiv:2510.20686

Classical Noise Inversion: A Practical and Optimal framework for Robust Quantum Applications

Authors: Dayue Qin, Ying Li, You Zhou

Abstract: Quantum error mitigation is a critical technology for extracting reliable computations from noisy quantum processors, proving itself essential not only in the near term but also as a valuable supplement to fully fault-tolerant systems in the future. However, its practical implementation is hampered by two major challenges: the high cost of sampling from quantum circuits and the reliance on unrealistic assumptions, such as gate-independent noise. Here, we introduce Classical Noise Inversion (CNI), a framework that fundamentally bypasses these crucial limitations and is well-suited for various quantum applications. CNI effectively inverts the accumulated noise entirely during classical post-processing, thereby eliminating the need for costly quantum circuit sampling and remaining effective under the realistic condition of gate-dependent noise. Apart from CNI, we introduce noise compression, which groups noise components with equivalent effects on measurement outcomes, achieving the optimal overhead for error mitigation. We integrate CNI with the framework of shadow estimation to create a robust protocol for learning quantum properties under general noise. Our analysis and numerical simulations demonstrate that this approach substantially reduces statistical variance while providing unbiased estimates in practical situations where previous methods fail. By transforming a key quantum overhead into a manageable classical cost, CNI opens a promising pathway towards scalable and practical quantum applications.

Submitted 23 October, 2025; originally announced October 2025.

arXiv:2510.15086

Stem-Symmetry, Comb Products, and their Relation to Amoeba Graphs

Authors: Jillian Eddy, Ryan Pesak, Daniel Qin, Denae Ventura

Abstract: Local and global amoebas are families of labeled graphs that satisfy interpolation properties on a fixed vertex set. A labeled graph $G$ on $n$ vertices is a local amoeba (resp. global amoeba) if there exists a sequence of feasible edge-replacements between any two labeled embeddings of $G$ into $K_n$ (resp. $K_{n+1}$). Here, a feasible edge-replacement removes an edge and reinserts it so that the resulting graph is isomorphic to $G$; the induced relabeling yields a class of permutations of the label set. Motivated by classical group theoretic ideas, we introduce the hang group, a new invariant that can encode how local amoebas embed into larger ones. Using this framework, we identify necessary and sufficient conditions connecting stem-symmetric and hang-symmetric graphs with local and global amoebas. In particular, we show how hang-symmetry and stem-symmetry conditions propagate under the addition of leaves and isolated vertices, in turn yielding constructive criteria for both local and global amoebas. Finally, via wreath products, we provide four sets of sufficient conditions, one for each property, guaranteeing when the comb product is a local amoeba, a global amoeba, stem-symmetric, or hang-symmetric. These results strengthen and generalize existing constructions of local and global amoebas.

Submitted 16 October, 2025; originally announced October 2025.

MSC Class: 05C25

arXiv:2510.07819

Symmetric Lorentzian Polynomials

Authors: Tracy Chin, Daniel Qin

Abstract: We study the class of Lorentzian symmetric polynomials and Lorentzian symmetric functions, which are defined to be symmetric functions for which every truncation of variables is Lorentzian. Similar to the space of Lorentzian polynomials, we show that the space of Lorentzian symmetric polynomials is homeomorphic to a closed Euclidean ball. Our main result is a reduction scheme that significantly reduces the complexity of testing for Lorentzianity. Using this method, we provide explicit semialgebraic descriptions of the spaces of Lorentzian symmetric polynomials and functions for degrees up to six. These techniques can also be applied to simplify the proofs of known cases of Lorentzian symmetric functions. We conclude by showing that some natural symmetric operators fail to preserve Lorentzianity, which in turn highlights an inherent tension between symmetry in variables and the Lorentzian property.

Submitted 9 October, 2025; originally announced October 2025.

Comments: 36 pages, 1 figure

arXiv:2510.07704

Surface band-selective moiré effect induces flat band in mixed-dimensional heterostructures

Authors: Shuming Yu, Zhentao Fu, Dingkun Qin, Enting Li, Hao Zhong, Xingzhe Wang, Keming Zhao, Shangkun Mo, Qiang Wan, Yiwei Li, Jie Li, Jianxin Zhong, Hong Ding, Nan Xu

Abstract: In this work, we reveal a curious type of moiré effect that selectively modifies the surface states of a bulk crystal. We synthesize mixed-dimensional heterostructures consisting of a noble gas monolayer grown on the surface of bulk Bi(111), and determine the electronic structure of the heterostructures using angle-resolved photoemission spectroscopy. We directly observe moiré replicas of the Bi(111) surface states, while the bulk states remain barely changed. Meanwhile, we achieve control over the moiré period in the range of 25 Å to 80 Å by selecting monolayers of different noble gases and adjusting the annealing temperature. At large moiré periods, we observe hybridization between the surface band replicas, which leads to the formation of a correlated flat band. Our results serve as a bridge for understanding the moiré modulation effect from 2D to 3D systems, and provide a feasible approach for the realization of correlated phenomena through the engineering of surface states via moiré effects.

Submitted 8 October, 2025; originally announced October 2025.

Comments: 5 pages, 4 figures

arXiv:2509.18189

Qianfan-VL: Domain-Enhanced Universal Vision-Language Models

Authors: Daxiang Dong, Mingming Zheng, Dong Xu, Bairong Zhuang, Wenyu Zhang, Chunhua Luo, Haoran Wang, Zijian Zhao, Jie Li, Yuxuan Li, Hanjun Zhong, Mengyue Liu, Jieting Chen, Shupeng Li, Lun Tian, Yaping Feng, Xin Li, Donggang Jiang, Yong Chen, Yehua Xu, Duohao Qin, Chen Feng, Dan Wang, Henghua Zhang, Jingjing Ha, et al. (10 additional authors not shown)

Abstract: We present Qianfan-VL, a series of multimodal large language models ranging from 3B to 70B parameters, achieving state-of-the-art performance through innovative domain enhancement techniques. Our approach employs multi-stage progressive training and high-precision data synthesis pipelines, which prove to be critical technologies for enhancing domain-specific capabilities while maintaining strong general performance. Qianfan-VL achieves comparable results to leading open-source models on general benchmarks, with state-of-the-art performance on benchmarks such as CCBench, SEEDBench IMG, ScienceQA, and MMStar. The domain enhancement strategy delivers significant advantages in OCR and document understanding, validated on both public benchmarks (OCRBench 873, DocVQA 94.75%) and in-house evaluations. Notably, Qianfan-VL-8B and 70B variants incorporate long chain-of-thought capabilities, demonstrating superior performance on mathematical reasoning (MathVista 78.6%) and logical inference tasks. All models are trained entirely on Baidu's Kunlun P800 chips, validating the capability of large-scale AI infrastructure to train SOTA-level multimodal models with over 90% scaling efficiency on 5000 chips for a single task. This work establishes an effective methodology for developing domain-enhanced multimodal models suitable for diverse enterprise deployment scenarios.

Submitted 19 September, 2025; originally announced September 2025.

Comments: 12 pages

arXiv:2509.05747

doi 10.1145/3747871

InterAct: A Large-Scale Dataset of Dynamic, Expressive and Interactive Activities between Two People in Daily Scenarios

Authors: Leo Ho, Yinghao Huang, Dafei Qin, Mingyi Shi, Wangpok Tse, Wei Liu, Junichi Yamagishi, Taku Komura

Abstract: We address the problem of accurate capture of interactive behaviors between two people in daily scenarios. Most previous works either only consider one person or solely focus on conversational gestures of two people, assuming the body orientation and/or position of each actor are constant or barely change over each interaction. In contrast, we propose to simultaneously model two people's activities, and target objective-driven, dynamic, and semantically consistent interactions which often span longer duration and cover bigger space. To this end, we capture a new multi-modal dataset dubbed InterAct, which is composed of 241 motion sequences where two people perform a realistic and coherent scenario for one minute or longer over a complete interaction. For each sequence, two actors are assigned different roles and emotion labels, and collaborate to finish one task or conduct a common interaction activity. The audio, body motions, and facial expressions of both persons are captured. InterAct contains diverse and complex motions of individuals and interesting and relatively long-term interaction patterns barely seen before. We also demonstrate a simple yet effective diffusion-based method that estimates interactive face expressions and body motions of two people from speech inputs. Our method regresses the body motions in a hierarchical manner, and we also propose a novel fine-tuning mechanism to improve the lip accuracy of facial expressions. To facilitate further research, the data and code are made available at https://hku-cg.github.io/interact/.

Submitted 6 September, 2025; originally announced September 2025.

Comments: The first two authors contributed equally to this work

ACM Class: I.5.4

Journal ref: Proceedings of the ACM on Computer Graphics and Interactive Techniques 8.4 (2025) 53:1-27

arXiv:2508.21479

doi 10.1038/s41567-025-03005-5

Realization of an untrusted intermediate relay architecture using a quantum dot single-photon source

Authors: Mi Zou, Yu-Ming He, Yizhi Huang, Jun-Yi Zhao, Bin-Chen Li, Yong-Peng Guo, Xing Ding, Mo-Chi Xu, Run-Ze Liu, Geng-Yan Zou, Zhen Ning, Xiang You, Hui Wang, Wen-Xin Pan, Hao-Tao Zhu, Ming-Yang Zheng, Xiu-Ping Xie, Dandan Qin, Xiao Jiang, Yong-Heng Huo, Qiang Zhang, Chao-Yang Lu, Xiongfeng Ma, Teng-Yun Chen, Jian-Wei Pan

Abstract: To fully exploit the potential of quantum technologies, quantum networks are needed to link different systems, significantly enhancing applications in computing, cryptography, and metrology. Central to these networks are quantum relays that can facilitate long-distance entanglement distribution and quantum communication. In this work, we present a modular and scalable quantum relay architecture using a high-quality single-photon source. The proposed network incorporates three untrusted intermediate nodes and is capable of a repetition rate of 304.52 MHz. We use a measurement-device-independent protocol to demonstrate secure key establishment over fibers covering up to 300 kilometers. This study highlights the potential of single-photon sources in quantum relays to enhance information transmission, expand network coverage, and improve deployment flexibility, with promising applications in future quantum networks.

Submitted 29 August, 2025; originally announced August 2025.

Comments: 29 pages, 17 figures, 2 tables

arXiv:2507.19050

Large Language Model-Based Task Offloading and Resource Allocation for Digital Twin Edge Computing Networks

Authors: Qiong Wu, Yu Xie, Pingyi Fan, Dong Qin, Kezhi Wang, Nan Cheng, Khaled B. Letaief

Abstract: In this paper, we propose a general digital twin edge computing network comprising multiple vehicles and a server. Each vehicle generates multiple computing tasks within a time slot, leading to queuing challenges when offloading tasks to the server. The study investigates task offloading strategies, queue stability, and resource allocation. Lyapunov optimization is employed to transform long-term constraints into tractable short-term decisions. To solve the resulting problem, an in-context learning approach based on a large language model (LLM) is adopted, replacing the conventional multi-agent reinforcement learning (MARL) framework. Experimental results demonstrate that the LLM-based method achieves comparable or even superior performance to MARL.

Submitted 25 July, 2025; originally announced July 2025.

Comments: This paper has been submitted to IEEE TMC
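
Note on the Lyapunov step in the abstract above: the abstract does not state its formulation, so the following is only the generic textbook drift-plus-penalty construction, shown to illustrate how a long-term queue-stability constraint becomes a per-slot decision. All symbols ($Q(t)$ for queue backlog, $a(t)$ for arrivals, $b(t)$ for service, $p(t)$ for per-slot cost, $V$ for the trade-off weight, $B$ for a bounding constant) are assumptions introduced here, not notation from the paper.

```latex
% Queue dynamics (assumed single queue for illustration)
Q(t+1) = \max\{Q(t) - b(t),\, 0\} + a(t)

% Quadratic Lyapunov function and one-slot conditional drift
L(t) = \tfrac{1}{2} Q(t)^2, \qquad
\Delta(t) = \mathbb{E}\big[ L(t+1) - L(t) \,\big|\, Q(t) \big]

% Drift bound, with B \ge \tfrac{1}{2}\,\mathbb{E}[a(t)^2 + b(t)^2 \mid Q(t)]
\Delta(t) \le B + Q(t)\, \mathbb{E}\big[ a(t) - b(t) \,\big|\, Q(t) \big]

% Per-slot problem: minimize the bound on drift plus V times the penalty
\min_{\text{decision at } t} \; Q(t)\,\big(a(t) - b(t)\big) + V\, p(t)
```

Larger $V$ weights the cost more heavily at the price of longer queues; how the paper chooses its queues and penalty is not recoverable from the abstract.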

arXiv:2507.14582

BT-TL-DMPs: A Novel Robot TAMP Framework Combining Behavior Tree, Temporal Logic and Dynamical Movement Primitives

Authors: Zezhi Liu, Shizhen Wu, Hanqian Luo, Deyun Qin, Yongchun Fang

Abstract: In the field of Learning from Demonstration (LfD), enabling robots to generalize learned manipulation skills to novel scenarios for long-horizon tasks remains challenging. Specifically, it is still difficult for robots to adapt the learned skills to new environments with different task and motion requirements, especially in long-horizon, multi-stage scenarios with intricate constraints. This paper proposes a novel hierarchical framework, called BT-TL-DMPs, that integrates Behavior Tree (BT), Temporal Logic (TL), and Dynamical Movement Primitives (DMPs) to address this problem. Within this framework, Signal Temporal Logic (STL) is employed to formally specify complex, long-horizon task requirements and constraints. These STL specifications are systematically transformed to generate reactive and modular BTs for high-level decision-making task structure. An STL-constrained DMP optimization method is proposed to optimize the DMP forcing term, allowing the learned motion primitives to adapt flexibly while satisfying intricate spatiotemporal requirements and, crucially, preserving the essential dynamics learned from demonstrations. The framework is validated through simulations demonstrating generalization capabilities under various STL constraints and real-world experiments on several long-horizon robotic manipulation tasks. The results demonstrate that the proposed framework effectively bridges the symbolic-motion gap, enabling more reliable and generalizable autonomous manipulation for complex robotic tasks.

Submitted 19 July, 2025; originally announced July 2025.

Comments: 11 pages, 8 figures

arXiv:2507.13237

Robust and efficient estimation of global quantum properties under realistic noise

Authors: Qingyue Zhang, Dayue Qin, Zhou You, Feng Xu, Jens Eisert, You Zhou

Abstract: Measuring global quantum properties -- such as the fidelity to complex multipartite states -- is both an essential and experimentally challenging task. Classical shadow estimation offers favorable sample complexity, but typically relies on many-qubit circuits that are difficult to realize on current platforms. We propose the robust phase shadow scheme, a measurement framework based on random circuits with controlled-Z as the unique entangling gate type, tailored to architectures such as trapped ions and neutral atoms. Leveraging tensor diagrammatic reasoning, we rigorously analyze the induced circuit ensemble and show that phase shadows match the performance of full Clifford-based ones. Importantly, our approach supports a noise-robust extension via purely classical post-processing, enabling reliable estimation under realistic, gate-dependent noise where existing techniques often fail. Additionally, by exploiting structural properties of random stabilizer states, we design an efficient post-processing algorithm that resolves a key computational bottleneck in previous shadow protocols. Our results enhance the practicality of shadow-based techniques, providing a robust and scalable route for estimating global properties in noisy quantum systems.

Submitted 17 July, 2025; originally announced July 2025.

Comments: 7+34 pages, 3+12 figures

arXiv:2507.08419

Observation of quasi-steady dark excitons and gap phase in a doped semiconductor

Authors: Shangkun Mo, Yunfei Bai, Chunlong Wu, Xingxia Cui, Guangqiang Mei, Qiang Wan, Renzhe Li, Cao Peng, Keming Zhao, Dingkun Qin, Shuming Yu, Hao Zhong, Xingzhe Wang, Enting Li, Yiwei Li, Limin Cao, Min Feng, Sheng Meng, Nan Xu

Abstract: Excitons play an important role in optics and optics-related behaviors and lead to novel correlated phases like charge order, exciton insulator, and exciton-polariton condensation. Dark excitons show properties distinct from bright ones. However, they cannot be directly detected by conventional optical measurements. The electronic modulation effect of dark excitons in quasi-equilibrium distribution, critical for electronic devices in working status, is still elusive. Here, using angle-resolved photoemission spectroscopy, we report creating, detecting, and controlling dark excitons in the quasi-equilibrium distribution in a doped semiconductor SnSe2. Surprisingly, we observe an excitonic gap phase, with a conduction band opening an anisotropic gap. Our results broaden the scope of dark excitons, extending their studies from the picosecond timescale in the ultrafast photoemission process to conditions occurring under quasi-equilibrium. We reveal the light-matter interaction in the engineering of electronic structures and provide a new way to realize the excitonic gap phase in semiconductors with large band gaps.

Submitted 11 July, 2025; originally announced July 2025.

Comments: 16 pages, 5 figures

arXiv:2507.06261

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu, et al. (3410 additional authors not shown)

Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding and it is now able to process up to 3 hours of video content. Its unique combination of long context, multimodal and reasoning capabilities can be combined to unlock new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements and Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability vs cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving.

Submitted 16 October, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

Comments: 72 pages, 17 figures

arXiv:2505.16577

Large Language Model-Empowered Interactive Load Forecasting

Authors: Yu Zuo, Dalin Qin, Yi Wang

Abstract: The growing complexity of power systems has made accurate load forecasting more important than ever. An increasing number of advanced load forecasting methods have been developed. However, the static design of current methods offers no mechanism for human-model interaction. As the primary users of forecasting models, system operators often find it difficult to understand and apply these advanced models, which typically require expertise in artificial intelligence (AI). This also prevents them from incorporating their experience and real-world contextual understanding into the forecasting process. Recent breakthroughs in large language models (LLMs) offer a new opportunity to address this issue. By leveraging their natural language understanding and reasoning capabilities, we propose an LLM-based multi-agent collaboration framework to bridge the gap between human operators and forecasting models. A set of specialized agents is designed to perform different tasks in the forecasting workflow and collaborate via a dedicated communication mechanism. This framework embeds interactive mechanisms throughout the load forecasting pipeline, reducing the technical threshold for non-expert users and enabling the integration of human experience. Our experiments demonstrate that the interactive load forecasting accuracy can be significantly improved when users provide proper insight at key stages. Our cost analysis shows that the framework remains affordable, making it practical for real-world deployment.

Submitted 22 May, 2025; originally announced May 2025.

arXiv:2504.20839

Universal language model with the intervention of quantum theory

Authors: D. -F. Qin

Abstract: This paper examines language modeling based on the theory of quantum mechanics. It focuses on the introduction of quantum mechanics into the symbol-meaning pairs of language in order to build a representation model of natural language. At the same time, it is realized that word embedding, which is widely used as a basic technique for statistical language modeling, can be explained and improved by the mathematical framework of quantum mechanics. On this basis, this paper continues to try to use quantum statistics and other related theories to study the mathematical representation, natural evolution and statistical properties of natural language. It is also assumed that the source of such quantum properties is the physicality of information. The feasibility of using quantum theory to model natural language is pointed out through the construction of an experimental code. The paper discusses, in terms of applications, the possible help of the theory in constructing generative models that are popular nowadays. A preliminary discussion of future applications of the theory to quantum computers is also presented.

Submitted 29 April, 2025; originally announced April 2025.

arXiv:2504.07553

Single-Cell Trajectory Reconstruction Reveals Migration Potential of Cell Populations

Authors: Yanping Liu, Dui Qin, Xinwei Li, Guoqiang Li, Zhichao Liu, Kena Song, Wei Wang, Zhangyong Li

Abstract: Cell migration, which is strictly regulated by intracellular and extracellular cues, is crucial for normal physiological processes and the progression of certain diseases. However, there is a lack of an efficient approach to analyze super-statistical and time-varying characteristics of cell migration based on single trajectories. Here, we propose an approach to reconstruct single-cell trajectories, which incorporates wavelet transform, power spectrum of an OU-process, and fits of the power spectrum to analyze statistical and time-varying properties of customized target-finding and migration metrics. Our results reveal diverse relationships between motility parameters and dynamic metrics, especially the existence of an optimal parameter range. Moreover, the analysis reveals that the loss of Arpin protein enhances the migration potential of D. discoideum, and reproduces a previously reported result that the rescued amoeba is distinguishable from the wild-type amoeba. Significantly, time-varying dynamic metrics exhibit periodic phenomena under the influence of irregularly changing parameters, which correlates with migration potential. Our analysis suggests that the approach provides a powerful tool for estimating time-dependent migration potential and statistical features of single-cell trajectories, enabling a better understanding of the relationship between intracellular proteins and cellular behaviors. This also provides more insights into the migration dynamics of single cells and cell populations.

Submitted 10 April, 2025; originally announced April 2025.

arXiv:2503.20743 [pdf, other]

Topology of The Polar Vortex and Montana Weather

Authors: Joshua Dorrington, Sushovan Majhi, Atish Mitra, James Moukheiber, Demi Qin, Jacob Sriraman, Kristian Strommen

Abstract: This paper explores the use of Topological Data Analysis (TDA) to investigate patterns in zonal-mean zonal winds of the Arctic, which make up the polar vortex, in order to better explain polar vortex dynamics. We demonstrate how TDA reveals significant topological features in this polar vortex data, and how they may relate these features to the collapse of the stratospheric vortex during the winter in the northern hemisphere. Using a time series representation of this data, we build a point cloud using the principles of Takens' Embedding theorem and apply persistent homology to uncover nontrivial topological structures that provide insight into the dynamical system's chaotic and periodic behaviors. These structures can offer new perspectives on the dynamics of the polar vortex, and perhaps other weather regimes, all of which have a global impact. Our results show clear transitions between seasons, with substantial increases in topological activity during periods of extreme cold. This is particularly evident in the historically strong polar vortex event of early 2016. Our analysis captures the persistence of topological features during such events and may even offer insights into vortex splitting, as indicated by the number of distinct persistent features. This work highlights the potential of TDA in climate science, offering a novel approach to studying complex dynamical systems.

Submitted 26 March, 2025; originally announced March 2025.

MSC Class: 37N10

arXiv:2412.02419 [pdf, other]

It Takes Two: Real-time Co-Speech Two-person's Interaction Generation via Reactive Auto-regressive Diffusion Model

Authors: Mingyi Shi, Dafei Qin, Leo Ho, Zhouyingcheng Liao, Yinghao Huang, Junichi Yamagishi, Taku Komura

Abstract: Conversational scenarios are very common in real-world settings, yet existing co-speech motion synthesis approaches often fall short in these contexts, where one person's audio and gestures will influence the other's responses. Additionally, most existing methods rely on offline sequence-to-sequence frameworks, which are unsuitable for online applications. In this work, we introduce an audio-driven, auto-regressive system designed to synthesize dynamic movements for two characters during a conversation. At the core of our approach is a diffusion-based full-body motion synthesis model, which is conditioned on the past states of both characters, speech audio, and a task-oriented motion trajectory input, allowing for flexible spatial control. To enhance the model's ability to learn diverse interactions, we have enriched existing two-person conversational motion datasets with more dynamic and interactive motions. We evaluate our system through multiple experiments to show it outperforms across a variety of tasks, including single and two-person co-speech motion generation, as well as interactive motion generation. To the best of our knowledge, this is the first system capable of generating interactive full-body motions for two characters from speech in an online manner.

Submitted 3 December, 2024; originally announced December 2024.

Comments: 15 pages, 10 figures

arXiv:2412.01654 [pdf, other]

FSMLP: Modelling Channel Dependencies With Simplex Theory Based Multi-Layer Perceptions In Frequency Domain

Authors: Zhengnan Li, Haoxuan Li, Hao Wang, Jun Fang, Duoyin Li, Yunxiao Qin

Abstract: Time series forecasting (TSF) plays a crucial role in various domains, including web data analysis, energy consumption prediction, and weather forecasting. While Multi-Layer Perceptrons (MLPs) are lightweight and effective for capturing temporal dependencies, they are prone to overfitting when used to model inter-channel dependencies. In this paper, we investigate the overfitting problem in channel-wise MLPs using Rademacher complexity theory, revealing that extreme values in time series data exacerbate this issue. To mitigate this issue, we introduce a novel Simplex-MLP layer, where the weights are constrained within a standard simplex. This strategy encourages the model to learn simpler patterns, thereby reducing overfitting to extreme values. Based on the Simplex-MLP layer, we propose a novel Frequency Simplex MLP (FSMLP) framework for time series forecasting, comprising two kinds of modules: Simplex Channel-Wise MLP (SCWM) and Frequency Temporal MLP (FTM). The SCWM effectively leverages the Simplex-MLP to capture inter-channel dependencies, while the FTM is a simple yet efficient temporal MLP designed to extract temporal information from the data. Our theoretical analysis shows that the upper bound of the Rademacher Complexity for Simplex-MLP is lower than that for standard MLPs. Moreover, we validate our proposed method on seven benchmark datasets, demonstrating significant improvements in forecasting accuracy and efficiency, while also showcasing superior scalability. Additionally, we demonstrate that Simplex-MLP can improve other methods that use channel-wise MLP to achieve less overfitting and improved performance. Code is available at https://github.com/FMLYD/FSMLP.

Submitted 2 December, 2024; v1 submitted 2 December, 2024; originally announced December 2024.

arXiv:2411.15764 [pdf, other]

LLM Online Spatial-temporal Signal Reconstruction Under Noise

Authors: Yi Yan, Dayu Qin, Ercan Engin Kuruoglu

Abstract: This work introduces the LLM Online Spatial-temporal Reconstruction (LLM-OSR) framework, which integrates Graph Signal Processing (GSP) and Large Language Models (LLMs) for online spatial-temporal signal reconstruction. The LLM-OSR utilizes a GSP-based spatial-temporal signal handler to enhance graph signals and employs LLMs to predict missing values based on spatiotemporal patterns. The performance of LLM-OSR is evaluated on traffic and meteorological datasets under varying Gaussian noise levels. Experimental results demonstrate that utilizing GPT-4-o mini within the LLM-OSR is accurate and robust under Gaussian noise conditions. The limitations are discussed along with future research insights, emphasizing the potential of combining GSP techniques with LLMs for solving spatiotemporal prediction tasks.

Submitted 24 November, 2024; originally announced November 2024.

arXiv:2410.18718 [pdf, other]

LLM-based Online Prediction of Time-varying Graph Signals

Authors: Dayu Qin, Yi Yan, Ercan Engin Kuruoglu

Abstract: In this paper, we propose a novel framework that leverages large language models (LLMs) for predicting missing values in time-varying graph signals by exploiting spatial and temporal smoothness. We leverage the power of LLM to achieve a message-passing scheme. For each missing node, its neighbors and previous estimates are fed into and processed by LLM to infer the missing observations. Tested on the task of the online prediction of wind-speed graph signals, our model outperforms online graph filtering algorithms in terms of accuracy, demonstrating the potential of LLMs in effectively addressing partially observed signals in graphs.

Submitted 24 October, 2024; originally announced October 2024.

arXiv:2410.16874 [pdf, other]

Topological and Graph Theoretical Analysis of Dynamic Functional Connectivity for Autism Spectrum Disorder

Authors: Yuzhe Chen, Dayu Qin, Ercan Engin Kuruoglu

Abstract: Autism Spectrum Disorder (ASD) is a prevalent neurological disorder. However, the multi-faceted symptoms and large individual differences among ASD patients are hindering the diagnosis process, which largely relies on subject descriptions and lacks quantitative biomarkers. To remediate such problems, this paper explores the use of graph theory and topological data analysis (TDA) to study brain activity in ASD patients and normal controls. We employ the Mapper algorithm in TDA and the distance correlation graphical model (DCGM) in graph theory to create brain state networks, then innovatively adopt complex network metrics in Graph signal processing (GSP) and physical quantities to analyze brain activities over time. Our findings reveal statistical differences in network characteristics between ASD and control groups. Compared to normal subjects, brain state networks of ASD patients tend to have decreased modularity, higher von Neumann entropy, increased Betti-0 numbers, and decreased Betti-1 numbers. These findings attest to the biological traits of ASD, suggesting less organized and more variable brain dynamics. These findings offer potential biomarkers for ASD diagnosis and deepen our understanding of its neural correlations.

Submitted 8 November, 2024; v1 submitted 22 October, 2024; originally announced October 2024.

Comments: Accepted by the Brain Informatics 2024 Conference. This is the final version of the paper for the conference. First author: Yuzhe Chen. Second author: Dayu Qin. Third & Corresponding author: Ercan Engin Kuruoglu

arXiv:2410.14142 [pdf, ps, other]

Secure Collaborative Computation Offloading and Resource Allocation in Cache-Assisted Ultra-Dense IoT Networks With Multi-Slope Channels

Authors: Tianqing Zhou, Bobo Wang, Dong Qin, Xuefang Nie, Nan Jiang, Chunguo Li

Abstract: Cache-assisted ultra-dense mobile edge computing (MEC) networks are a promising solution for meeting the increasing demands of numerous Internet-of-Things mobile devices (IMDs). To address the complex interferences caused by small base stations (SBSs) deployed densely in such networks, this paper explores the combination of orthogonal frequency division multiple access (OFDMA), non-orthogonal multiple access (NOMA), and base station (BS) clustering. Additionally, security measures are introduced to protect IMDs' tasks offloaded to BSs from potential eavesdropping and malicious attacks. As for such a network framework, a computation offloading scheme is proposed to minimize IMDs' energy consumption while considering constraints such as delay, power, computing resources, and security costs, optimizing channel selections, task execution decisions, device associations, power controls, security service assignments, and computing resource allocations. To solve the formulated problem efficiently, we develop a further improved hierarchical adaptive search (FIHAS) algorithm, giving some insights into its parallel implementation, computation complexity, and convergence. Simulation results demonstrate that the proposed algorithms can achieve lower total energy consumption and delay compared to other algorithms when strict latency and cost constraints are imposed.

Submitted 21 October, 2024; v1 submitted 17 October, 2024; originally announced October 2024.

arXiv:2410.12186 [pdf, ps, other]

Joint Data Compression, Secure Multi-Part Collaborative Task Offloading and Resource Assignment in Ultra-Dense Networks

Authors: Tianqing Zhou, Kangle Liu, Dong Qin, Xuan Li, Nan Jiang, Chunguo Li

Abstract: To enhance resource utilization and address interference issues in ultra-dense networks with mobile edge computing (MEC), a resource utilization approach is first introduced, which integrates orthogonal frequency division multiple access (OFDMA) and non-orthogonal multiple access (NOMA). Then, to minimize the energy consumed by ultra-densely deployed small base stations (SBSs) while ensuring proportional assignment of computational resources and the constraints related to processing delay and security breach cost, the joint optimization of channel selection, the number of subchannels, secure service assignment, multi-step computation offloading, device association, data compression (DC) control, power control, and frequency band partitioning is done for minimizing network-wide energy consumption (EC). Given that the current problem is nonlinear and involves integral optimization parameters, we have devised an adaptive genetic water wave optimization (AGWWO) algorithm by improving the traditional water wave optimization (WWO) algorithm using genetic operations. After that, the computational complexity, convergence, and parallel implementation of AGWWO algorithm are analyzed. Simulation results reveal that this algorithm effectively reduces network-wide EC while guaranteeing the constraints of processing delay and security breach cost.

Submitted 15 October, 2024; originally announced October 2024.

arXiv:2409.19718 [pdf, other]

Evolving Multi-Scale Normalization for Time Series Forecasting under Distribution Shifts

Authors: Dalin Qin, Yehui Li, Weiqi Chen, Zhaoyang Zhu, Qingsong Wen, Liang Sun, Pierre Pinson, Yi Wang

Abstract: Complex distribution shifts are the main obstacle to achieving accurate long-term time series forecasting. Several efforts have been conducted to capture the distribution characteristics and propose adaptive normalization techniques to alleviate the influence of distribution shifts. However, these methods neglect the intricate distribution dynamics observed from various scales and the evolving functions of distribution dynamics and normalized mapping relationships. To this end, we propose a novel model-agnostic Evolving Multi-Scale Normalization (EvoMSN) framework to tackle the distribution shift problem. Flexible normalization and denormalization are proposed based on the multi-scale statistics prediction module and adaptive ensembling. An evolving optimization strategy is designed to update the forecasting model and statistics prediction module collaboratively to track the shifting distributions. We evaluate the effectiveness of EvoMSN in improving the performance of five mainstream forecasting methods on benchmark datasets and also show its superiority compared to existing advanced normalization and online learning approaches. The code is publicly available at https://github.com/qindalin/EvoMSN.

Submitted 29 September, 2024; originally announced September 2024.

arXiv:2409.18018 [pdf]

Molecular dynamics simulations of interaction between a super edge dislocation and interstitial dislocation loops in irradiated L12-Ni3Al

Authors: Cheng Chen, Dongyang Qin, Yiding Wang, Fei Xu, Jun Song

Abstract: The study employed MD simulations to investigate the interactions between a <110> super-edge dislocation, consisting of the four Shockley partials, and interstitial dislocation loops (IDLs) in irradiated L12-Ni3Al. Accounting for symmetry breakage in the L12 lattice, the superlattice planar faults with four distinct fault vectors have been considered for different IDL configurations. The detailed dislocation reactions and structural evolution events were identified as the four partials interacted with various IDL configurations. The slipping characteristics of Shockley partials within the IDLs and the resultant shearing and looping mechanisms were also clarified, revealing distinct energetic transition states determined by the fault vectors after the Shockley partials sweeping the IDL. Furthermore, significant variations in critical resolved shear stress (CRSS) required for the super-edge dislocation to move past the IDL were observed, attributed to various sizes and faulted vectors of enclosed superlattice planar faults in the IDLs. The current study extends the existing dislocation-IDL interaction theory from pristine FCC to L12 lattice, advances the understanding of irradiation hardening effects in L12-Ni3Al, and suggests potential applicability to other L12 systems.

Submitted 26 September, 2024; originally announced September 2024.

Comments: 25 pages, 10 figures

arXiv:2409.07441 [pdf, other]

Instant Facial Gaussians Translator for Relightable and Interactable Facial Rendering

Authors: Dafei Qin, Hongyang Lin, Qixuan Zhang, Kaichun Qiao, Longwen Zhang, Zijun Zhao, Jun Saito, Jingyi Yu, Lan Xu, Taku Komura

Abstract: We propose GauFace, a novel Gaussian Splatting representation, tailored for efficient animation and rendering of physically-based facial assets. Leveraging strong geometric priors and constrained optimization, GauFace ensures a neat and structured Gaussian representation, delivering high fidelity and real-time facial interaction of 30fps@1440p on a Snapdragon 8 Gen 2 mobile platform. Then, we introduce TransGS, a diffusion transformer that instantly translates physically-based facial assets into the corresponding GauFace representations. Specifically, we adopt a patch-based pipeline to handle the vast number of Gaussians effectively. We also introduce a novel pixel-aligned sampling scheme with UV positional encoding to ensure the throughput and rendering quality of GauFace assets generated by our TransGS. Once trained, TransGS can instantly translate facial assets with lighting conditions to GauFace representation. With the rich conditioning modalities, it also enables editing and animation capabilities reminiscent of traditional CG pipelines. We conduct extensive evaluations and user studies, compared to traditional offline and online renderers, as well as recent neural rendering methods, which demonstrate the superior performance of our approach for facial asset rendering. We also showcase diverse immersive applications of facial assets using our TransGS approach and GauFace representation, across various platforms like PCs, phones and even VR headsets.

Submitted 30 September, 2024; v1 submitted 11 September, 2024; originally announced September 2024.

Comments: Project Page: https://dafei-qin.github.io/TransGS.github.io/

arXiv:2409.01039 [pdf, other]

Exploring Neurofunctional Phase Transition Patterns in Autism Spectrum Disorder: A Thermodynamics Parameters Analysis Approach

Authors: Dayu Qin, Yuzhe Chen, Ercan Engin Kuruoglu

Abstract: Designing network parameters that can effectively represent complex networks is of significant importance for the analysis of time-varying complex networks. This paper introduces a novel thermodynamic framework for analyzing complex networks, focusing on Spectral Core Entropy (SCE), Node Energy, internal energy and temperature to measure structural changes in dynamic complex network. This framework provides a quantitative representation of network characteristics, capturing time-varying structural changes. We apply this framework to study brain activity in autism versus control subjects, illustrating its potential to identify significant structural changes and brain state transitions. By treating brain networks as thermodynamic systems, important parameters such as node energy and temperature are derived to depict brain activities. Our research has found that in our designed framework the thermodynamic parameter-temperature, is significantly correlated with the transitions of brain states. Statistical tests confirm the effectiveness of our approach. Moreover, our study demonstrates that node energy effectively captures the activity levels of brain regions and reveals biologically proven differences between autism patients and controls, offering a powerful tool for exploring the characteristics of complex networks in various applications.

Submitted 2 September, 2024; originally announced September 2024.

arXiv:2407.13292 [pdf, other]

Low-Resourced Speech Recognition for Iu Mien Language via Weakly-Supervised Phoneme-based Multilingual Pre-training

Authors: Lukuan Dong, Donghong Qin, Fengbo Bai, Fanhua Song, Yan Liu, Chen Xu, Zhijian Ou

Abstract: The mainstream automatic speech recognition (ASR) technology usually requires hundreds to thousands of hours of annotated speech data. Three approaches to low-resourced ASR are phoneme or subword based supervised pre-training, and self-supervised pre-training over multilingual data. The Iu Mien language is the main ethnic language of the Yao ethnic group in China and is low-resourced in the sense that the annotated speech is very limited. With less than 10 hours of transcribed Iu Mien language, this paper investigates and compares the three approaches for Iu Mien speech recognition. Our experiments are based on the recently released, three backbone models pretrained over the 10 languages from the CommonVoice dataset (CV-Lang10), which correspond to the three approaches for low-resourced ASR. It is found that phoneme supervision can achieve better results compared to subword supervision and self-supervision, thereby providing higher data-efficiency. Particularly, the Whistle models, i.e., obtained by the weakly-supervised phoneme-based multilingual pre-training, obtain the most competitive results.

Submitted 16 September, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

Comments: Accepted into ISCSLP 2024

arXiv:2405.11690 [pdf, other]

InterAct: Capture and Modelling of Realistic, Expressive and Interactive Activities between Two Persons in Daily Scenarios

Authors: Yinghao Huang, Leo Ho, Dafei Qin, Mingyi Shi, Taku Komura

Abstract: We address the problem of accurate capture and expressive modelling of interactive behaviors happening between two persons in daily scenarios. Different from previous works which either only consider one person or focus on conversational gestures, we propose to simultaneously model the activities of two persons, and target objective-driven, dynamic, and coherent interactions which often span long duration. To this end, we capture a new dataset dubbed InterAct, which is composed of 241 motion sequences where two persons perform a realistic scenario over the whole sequence. The audios, body motions, and facial expressions of both persons are all captured in our dataset. We also demonstrate the first diffusion model based approach that directly estimates the interactive motions between two persons from their audios alone. All the data and code will be available at: https://hku-cg.github.io/interact.

Submitted 27 May, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

Comments: The first two authors contributed equally to this work

arXiv:2404.13605 [pdf, other]

Turb-Seg-Res: A Segment-then-Restore Pipeline for Dynamic Videos with Atmospheric Turbulence

Authors: Ripon Kumar Saha, Dehao Qin, Nianyi Li, Jinwei Ye, Suren Jayasuriya

Abstract: Tackling image degradation due to atmospheric turbulence, particularly in dynamic environment, remains a challenge for long-range imaging systems. Existing techniques have been primarily designed for static scenes or scenes with small motion. This paper presents the first segment-then-restore pipeline for restoring the videos of dynamic scenes in turbulent environment. We leverage mean optical flow with an unsupervised motion segmentation method to separate dynamic and static scene components prior to restoration. After camera shake compensation and segmentation, we introduce foreground/background enhancement leveraging the statistics of turbulence strength and a transformer model trained on a novel noise-based procedural turbulence generator for fast dataset augmentation. Benchmarked against existing restoration methods, our approach restores most of the geometric distortion and enhances sharpness for videos. We make our code, simulator, and data publicly available to advance the field of video restoration from turbulence: riponcs.github.io/TurbSegRes

Submitted 21 April, 2024; originally announced April 2024.

Comments: CVPR 2024 Paper

arXiv:2404.10518 [pdf, other]

MobileNetV4 -- Universal Models for the Mobile Ecosystem

Authors: Danfeng Qin, Chas Leichner, Manolis Delakis, Marco Fornoni, Shixin Luo, Fan Yang, Weijun Wang, Colby Banbury, Chengxi Ye, Berkin Akin, Vaibhav Aggarwal, Tenghui Zhu, Daniele Moro, Andrew Howard

Abstract: We present the latest generation of MobileNets, known as MobileNetV4 (MNv4), featuring universally efficient architecture designs for mobile devices. At its core, we introduce the Universal Inverted Bottleneck (UIB) search block, a unified and flexible structure that merges Inverted Bottleneck (IB), ConvNext, Feed Forward Network (FFN), and a novel Extra Depthwise (ExtraDW) variant. Alongside UIB, we present Mobile MQA, an attention block tailored for mobile accelerators, delivering a significant 39% speedup. An optimized neural architecture search (NAS) recipe is also introduced which improves MNv4 search effectiveness. The integration of UIB, Mobile MQA and the refined NAS recipe results in a new suite of MNv4 models that are mostly Pareto optimal across mobile CPUs, DSPs, GPUs, as well as specialized accelerators like Apple Neural Engine and Google Pixel EdgeTPU - a characteristic not found in any other models tested. Finally, to further boost accuracy, we introduce a novel distillation technique. Enhanced by this technique, our MNv4-Hybrid-Large model delivers 87% ImageNet-1K accuracy, with a Pixel 8 EdgeTPU runtime of just 3.8ms.

Submitted 29 September, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

arXiv:2404.10512 [pdf]

Four-hour thunderstorm nowcasting using deep diffusion models of satellite

Authors: Kuai Dai, Xutao Li, Junying Fang, Yunming Ye, Demin Yu, Hui Su, Di Xian, Danyu Qin, Jingsong Wang

Abstract: Convection (thunderstorm) develops rapidly within hours and is highly destructive, posing a significant challenge for nowcasting and resulting in substantial losses to nature and society. After the emergence of artificial intelligence (AI)-based methods, convection nowcasting has experienced rapid advancements, with its performance surpassing that of physics-based numerical weather prediction and other conventional approaches. However, the lead time and coverage of it still leave much to be desired and hardly meet the needs of disaster emergency response. Here, we propose deep diffusion models of satellite (DDMS) to establish an AI-based convection nowcasting system. Specifically, DDMS employs diffusion processes to effectively simulate complicated spatiotemporal evolution patterns of convective clouds, significantly improving the forecast lead time. Additionally, it combines geostationary satellite brightness temperature data and domain knowledge from meteorological experts, thereby achieving planetary-scale forecast coverage. During long-term tests and objective validation based on the FengYun-4A satellite, our system achieves, for the first time, effective convection nowcasting up to 4 hours, with broad coverage (about 20,000,000 km$^2$), remarkable accuracy, and high resolution (15 minutes; 4 km). Its performance reaches a new height in convection nowcasting compared to the existing models. In terms of application, our system is highly transferable with the potential to collaborate with multiple satellites for global convection nowcasting. Furthermore, our results highlight the remarkable capabilities of diffusion models in convective clouds forecasting, as well as the significant value of geostationary satellite data when empowered by AI technologies.

Submitted 1 June, 2025; v1 submitted 16 April, 2024; originally announced April 2024.

arXiv:2403.14949 [pdf, other]

Addressing Concept Shift in Online Time Series Forecasting: Detect-then-Adapt

Authors: YiFan Zhang, Weiqi Chen, Zhaoyang Zhu, Dalin Qin, Liang Sun, Xue Wang, Qingsong Wen, Zhang Zhang, Liang Wang, Rong Jin

Abstract: Online updating of time series forecasting models aims to tackle the challenge of concept drifting by adjusting forecasting models based on streaming data. While numerous algorithms have been developed, most of them focus on model design and updating. In practice, many of these methods struggle with continuous performance regression in the face of accumulated concept drifts over time. To address this limitation, we present a novel approach, Concept Drift Detection anD Adaptation (D3A), that first detects drifting concepts and then aggressively adapts the current model to the drifted concepts after the detection for rapid adaptation. To best harness the utility of historical data for model adaptation, we propose a data augmentation strategy introducing Gaussian noise into existing training instances. It helps mitigate the data distribution gap, a critical factor contributing to train-test performance inconsistency. The significance of our data augmentation process is verified by our theoretical analysis. Our empirical studies across six datasets demonstrate the effectiveness of D3A in improving model adaptation capability. Notably, compared to a simple Temporal Convolutional Network (TCN) baseline, D3A reduces the average Mean Squared Error (MSE) by $43.9\%$. For the state-of-the-art (SOTA) model, the MSE is reduced by $33.3\%$.

Submitted 22 March, 2024; originally announced March 2024.

Comments: 7 figures, 14 pages. arXiv admin note: text overlap with arXiv:2309.12659

arXiv:2403.11416 [pdf]

doi 10.1103/PhysRevB.109.115415

Surface region band enhancement in noble gas adsorption assisted ARPES on kagome superconductor RbV3Sb5

Authors: Cao Peng, Yiwei Li, Xu Chen, Shenghao Dai, Zewen Wu, Chunlong Wu, Qiang Wan, Keming Zhao, Renzhe Li, Shangkun Mo, Dingkun Qin, Shuming Yu, Hao Zhong, Shengjun Yuan, Jiangang Guo, Nan Xu

Abstract: Electronic states near surface regions can be distinct from bulk states, which are paramount in understanding various physical phenomena occurring at surfaces and in applications in semiconductors, energy, and catalysis. Here, we report an abnormal surface region band enhancement effect in angle-resolved photoemission spectroscopy on kagome superconductor RbV3Sb5, by depositing noble gases with fine control. In contrast to conventional surface contamination, the intensity of surface region Sb band can be enhanced more than three times with noble gas adsorption. In the meantime, a hole-dope effect is observed for the enhanced surface region band, with other bands hardly changing. The doping effect is more pronounced with heavier noble gases. We propose that noble gas atoms selectively fill into alkali metal vacancy sites on the surface, which improves the surface condition, boosts surface region bands, and effectively dopes it with the Pauli repulsion mechanism. Our results provide a novel and reversible way to improve surface conditions and tune surface region bands by controlled surface noble gas deposition.

Submitted 17 March, 2024; originally announced March 2024.

Comments: 17 pages, 4 figures

Journal ref: Phys. Rev. B 109, 115415 (2024)

arXiv:2402.16430 [pdf, other]

Improving behavior based authentication against adversarial attack using XAI

Authors: Dong Qin, George Amariucai, Daji Qiao, Yong Guan

Abstract: In recent years, machine learning models, especially deep neural networks, have been widely used for classification tasks in the security domain. However, these models have been shown to be vulnerable to adversarial manipulation: small changes learned by an adversarial attack model, when applied to the input, can cause significant changes in the output. Most research on adversarial attacks and corresponding defense methods focuses only on scenarios where adversarial samples are directly generated by the attack model. In this study, we explore a more practical scenario in behavior-based authentication, where adversarial samples are collected from the attacker. The generated adversarial samples from the model are replicated by attackers with a certain level of discrepancy. We propose an eXplainable AI (XAI) based defense strategy against adversarial attacks in such scenarios. A feature selector, trained with our method, can be used as a filter in front of the original authenticator. It filters out features that are more vulnerable to adversarial attacks or irrelevant to authentication, while retaining features that are more robust. Through comprehensive experiments, we demonstrate that our XAI based defense strategy is effective against adversarial attacks and outperforms other defense strategies, such as adversarial training and defensive distillation.

Submitted 10 March, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.04671 [pdf, other]

V2VSSC: A 3D Semantic Scene Completion Benchmark for Perception with Vehicle to Vehicle Communication

Authors: Yuanfang Zhang, Junxuan Li, Kaiqing Luo, Yiying Yang, Jiayi Han, Nian Liu, Denghui Qin, Peng Han, Chengpei Xu

Abstract: Semantic scene completion (SSC) has recently gained popularity because it can provide both semantic and geometric information that can be used directly for autonomous vehicle navigation. However, there are still challenges to overcome. SSC is often hampered by occlusion and short-range perception due to sensor limitations, which can pose safety risks. This paper proposes a fundamental solution to this problem by leveraging vehicle-to-vehicle (V2V) communication. We propose the first generalized collaborative SSC framework that allows autonomous vehicles to share sensing information from different sensor views to jointly perform SSC tasks. To validate the proposed framework, we further build V2VSSC, the first V2V SSC benchmark, on top of the large-scale V2V perception dataset OPV2V. Extensive experiments demonstrate that by leveraging V2V communication, the SSC performance can be increased by 8.3% on geometric metric IoU and 6.0% mIOU.

Submitted 7 February, 2024; originally announced February 2024.

arXiv:2402.02589 [pdf, other]

doi 10.1038/s41366-024-01679-0

Prospective Prediction of Body Mass Index Trajectories using Multi-task Gaussian Processes

Authors: Arthur Leroy, Varsha Gupta, Mya Thway Tint, Delicia Ooi Shu Qin, Keith M. Godfrey, Fabian Yap, Leck Ngee, Yung Seng Lee, Johan G. Eriksson, Navin Michael, Mauricio A. Alvarez, Dennis Wang

Abstract: Clinicians often investigate the body mass index (BMI) trajectories of children to assess their growth with respect to their peers, as well as to anticipate future growth and disease risk. While retrospective modelling of BMI trajectories has been an active area of research, prospective prediction of continuous BMI trajectories from historical growth data has not been well investigated. Using weight and height measurements from birth to age 10 years from a longitudinal mother-offspring cohort, we leveraged a multi-task Gaussian processes model, called MagmaClust, to derive probabilistic predictions for BMI trajectories over various forecasting periods. Experiments were conducted to evaluate the accuracy, sensitivity to missing values, and number of clusters. The results were compared with cubic B-spline regression and a parametric Jenss-Bayley mixed effects model. A downstream tool computing individual overweight probabilities was also proposed and evaluated. In all experiments, MagmaClust outperformed conventional models in prediction accuracy while correctly calibrating uncertainty regardless of the missing data amount (up to 90\% missing) or the forecasting period (from 2 to 8 years in the future). Moreover, the overweight probabilities computed from MagmaClust's uncertainty quantification exhibited high specificity ($0.94$ to $0.96$) and accuracy ($0.86$ to $0.94$) in predicting the 10-year overweight status even from age 2 years.
MagmaClust provides a probabilistic non-parametric framework to prospectively predict BMI trajectories, which is robust to missing values and outperforms conventional BMI trajectory modelling approaches. It also clusters individuals to identify typical BMI patterns (early peak, adiposity rebounds) during childhood. Overall, we demonstrated its potential to anticipate BMI evolution throughout childhood, allowing clinicians to implement prevention strategies.

Submitted 4 February, 2024; originally announced February 2024.

Comments: 17 pages, 9 figures, 5 tables

Journal ref: International Journal of Obesity, 2025, volume 49

arXiv:2401.15687 [pdf, other]

Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance

Authors: Qingcheng Zhao, Pengyu Long, Qixuan Zhang, Dafei Qin, Han Liang, Longwen Zhang, Yingliang Zhang, Jingyi Yu, Lan Xu

Abstract: The synthesis of 3D facial animations from speech has garnered considerable attention. Due to the scarcity of high-quality 4D facial data and well-annotated abundant multi-modality labels, previous methods often suffer from limited realism and a lack of flexible conditioning. We address this challenge through a trilogy. We first introduce Generalized Neural Parametric Facial Asset (GNPFA), an efficient variational auto-encoder mapping facial geometry and images to a highly generalized expression latent space, decoupling expressions and identities. Then, we utilize GNPFA to extract high-quality expressions and accurate head poses from a large array of videos. This presents the M2F-D dataset, a large, diverse, and scan-level co-speech 3D facial animation dataset with well-annotated emotional and style labels. Finally, we propose Media2Face, a diffusion model in GNPFA latent space for co-speech facial animation generation, accepting rich multi-modality guidance from audio, text, and image. Extensive experiments demonstrate that our model not only achieves high fidelity in facial animation synthesis but also broadens the scope of expressiveness and style adaptability in 3D facial animation.

Submitted 30 January, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

Comments: Project Page: https://sites.google.com/view/media2face

arXiv:2311.03863 [pdf]

An Explainable Framework for Machine learning-Based Reactive Power Optimization of Distribution Network

Authors: Wenlong Liao, Benjamin Schäfer, Dalin Qin, Gonghao Zhang, Zhixian Wang, Zhe Yang

Abstract: To reduce the heavy computational burden of reactive power optimization of distribution networks, machine learning models are receiving increasing attention. However, most machine learning models (e.g., neural networks) are usually considered as black boxes, making it challenging for power system operators to identify and comprehend potential biases or errors in the decision-making process of machine learning models. To address this issue, an explainable machine-learning framework is proposed to optimize the reactive power in distribution networks. Firstly, a Shapley additive explanation framework is presented to measure the contribution of each input feature to the solution of reactive power optimizations generated from machine learning models. Secondly, a model-agnostic approximation method is developed to estimate Shapley values, so as to avoid the heavy computational burden associated with direct calculations of Shapley values. The simulation results show that the proposed explainable framework can accurately explain the solution of the machine learning model-based reactive power optimization by using visual analytics, from both global and instance perspectives. Moreover, the proposed explainable framework is model-agnostic, and thus applicable to various models (e.g., neural networks).

Submitted 7 November, 2023; originally announced November 2023.

Comments: Submitted to the 23rd Power Systems Computation Conference (PSCC 2024) in Sept. 2023

arXiv:2311.03572 [pdf, other]

Unsupervised Region-Growing Network for Object Segmentation in Atmospheric Turbulence

Authors: Dehao Qin, Ripon Saha, Suren Jayasuriya, Jinwei Ye, Nianyi Li

Abstract: Moving object segmentation in the presence of atmospheric turbulence is highly challenging due to turbulence-induced irregular and time-varying distortions. In this paper, we present an unsupervised approach for segmenting moving objects in videos downgraded by atmospheric turbulence. Our key approach is a detect-then-grow scheme: we first identify a small set of moving object pixels with high confidence, then gradually grow a foreground mask from those seeds to segment all moving objects. This method leverages rigid geometric consistency among video frames to disentangle different types of motions, and then uses the Sampson distance to initialize the seedling pixels. After growing per-frame foreground masks, we use spatial grouping loss and temporal consistency loss to further refine the masks in order to ensure their spatio-temporal consistency. Our method is unsupervised and does not require training on labeled data. For validation, we collect and release the first real-captured long-range turbulent video dataset with ground truth masks for moving objects. Results show that our method achieves good accuracy in segmenting moving objects and is robust for long-range videos with various turbulence strengths.

Submitted 4 August, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

arXiv:2310.17945 [pdf, other]

A Comprehensive and Reliable Feature Attribution Method: Double-sided Remove and Reconstruct (DoRaR)

Authors: Dong Qin, George Amariucai, Daji Qiao, Yong Guan, Shen Fu

Abstract: The limited transparency of the inner decision-making mechanism in deep neural networks (DNN) and other machine learning (ML) models has hindered their application in several domains. In order to tackle this issue, feature attribution methods have been developed to identify the crucial features that heavily influence decisions made by these black box models. However, many feature attribution methods have inherent downsides. For example, one category of feature attribution methods suffers from the artifacts problem, which feeds out-of-distribution masked inputs directly through the classifier that was originally trained on natural data points. Another category of feature attribution method finds explanations by using jointly trained feature selectors and predictors. While avoiding the artifacts problem, this new category suffers from the Encoding Prediction in the Explanation (EPITE) problem, in which the predictor's decisions rely not on the features, but on the masks that select those features. As a result, the credibility of attribution results is undermined by these downsides. In this research, we introduce the Double-sided Remove and Reconstruct (DoRaR) feature attribution method based on several improvement methods that address these issues. By conducting thorough testing on MNIST, CIFAR10 and our own synthetic dataset, we demonstrate that the DoRaR feature attribution method can effectively bypass the above issues and can aid in training a feature selector that outperforms other state-of-the-art feature attribution methods.
Our code is available at https://github.com/dxq21/DoRaR.

Submitted 27 October, 2023; originally announced October 2023.

Comments: 16 pages, 22 figures

arXiv:2310.06851 [pdf, other]

doi 10.1145/3592456

BodyFormer: Semantics-guided 3D Body Gesture Synthesis with Transformer

Authors: Kunkun Pang, Dafei Qin, Yingruo Fan, Julian Habekost, Takaaki Shiratori, Junichi Yamagishi, Taku Komura

Abstract: Automatic gesture synthesis from speech is a topic that has attracted researchers for applications in remote communication, video games and Metaverse. Learning the mapping between speech and 3D full-body gestures is difficult due to the stochastic nature of the problem and the lack of a rich cross-modal dataset that is needed for training. In this paper, we propose a novel transformer-based framework for automatic 3D body gesture synthesis from speech. To learn the stochastic nature of the body gesture during speech, we propose a variational transformer to effectively model a probabilistic distribution over gestures, which can produce diverse gestures during inference. Furthermore, we introduce a mode positional embedding layer to capture the different motion speeds in different speaking modes. To cope with the scarcity of data, we design an intra-modal pre-training scheme that can learn the complex mapping between the speech and the 3D gesture from a limited amount of data. Our system is trained with either the Trinity speech-gesture dataset or the Talking With Hands 16.2M dataset. The results show that our system can produce more realistic, appropriate, and diverse body gestures compared to existing state-of-the-art approaches.

Submitted 6 September, 2023; originally announced October 2023.

Comments: 12 pages, 13 figures

arXiv:2305.08296 [pdf, other]

Neural Face Rigging for Animating and Retargeting Facial Meshes in the Wild

Authors: Dafei Qin, Jun Saito, Noam Aigerman, Thibault Groueix, Taku Komura

Abstract: We propose an end-to-end deep-learning approach for automatic rigging and retargeting of 3D models of human faces in the wild. Our approach, called Neural Face Rigging (NFR), holds three key properties: (i) NFR's expression space maintains human-interpretable editing parameters for artistic controls; (ii) NFR is readily applicable to arbitrary facial meshes with different connectivity and expressions; (iii) NFR can encode and produce fine-grained details of complex expressions performed by arbitrary subjects. To the best of our knowledge, NFR is the first approach to provide realistic and controllable deformations of in-the-wild facial meshes, without the manual creation of blendshapes or correspondence. We design a deformation autoencoder and train it through a multi-dataset training scheme, which benefits from the unique advantages of two data sources: a linear 3DMM with interpretable control parameters as in FACS, and 4D captures of real faces with fine-grained details. Through various experiments, we show NFR's ability to automatically produce realistic and accurate facial deformations across a wide range of existing datasets as well as noisy facial scans in-the-wild, while providing artist-controlled, editable parameters.

Submitted 14 May, 2023; originally announced May 2023.

Comments: SIGGRAPH 2023 (Conference Track), 13 pages, 15 figures

arXiv:2303.06353 [pdf, ps, other]

Secure and Multi-Step Computation Offloading and Resource Allocation in Ultra-Dense Multi-Task NOMA-Enabled IoT Networks

Authors: Tianqing Zhou, Yanyan Fu, Dong Qin, Xuefang Nie, Nan Jiang, Chunguo Li

Abstract: Ultra-dense networks are widely regarded as a promising solution to explosively growing applications of Internet-of-Things (IoT) mobile devices (IMDs). However, complicated and severe interferences need to be tackled properly in such networks. To this end, both orthogonal multiple access (OMA) and non-orthogonal multiple access (NOMA) are utilized at first. Then, in order to attain a goal of green and secure computation offloading, under the proportional allocation of computational resources and the constraints of latency and security cost, joint device association, channel selection, security service assignment, power control and computation offloading are done for minimizing the overall energy consumed by all IMDs. It is noteworthy that multi-step computation offloading is concentrated to balance the network loads and utilize computing resources fully. Since the finally formulated problem is in a nonlinear mixed-integer form, it may be very difficult to find its closed-form solution. To solve it, an improved whale optimization algorithm (IWOA) is designed. As for this algorithm, the convergence, computational complexity and parallel implementation are analyzed in detail. Simulation results show that the designed algorithm may achieve lower energy consumption than other existing algorithms under the constraints of latency and security cost.

Submitted 11 March, 2023; originally announced March 2023.

arXiv:2302.09975 [pdf, other]

Optimal energy harvesting efficiency from vortex-induced vibration of a circular cylinder under flow

Authors: Peng Han, Qiaogao Huang, Guang Pan, Denghui Qin, Wei Wang, Rodolfo T. Gonçalves, Jisheng Zhao

Abstract: This work applies a combined approach, a reduced-order model (ROM) together with experiments and direct numerical simulations, to investigate the optimal efficiency of fluid-flow energy harvesting from transverse vortex-induced vibration (VIV) of a circular cylinder. High resolution efficiency maps were predicted over wide ranges of flow reduced velocities and structural damping ratios, and the maximum efficiency and optimal settings of damping ratio and reduced velocity were then examined for different mass ratios and Reynolds numbers. Efficiencies predicted by the ROM were also validated against either experiments or direct simulations. The present work indicates that: (i) the maximum efficiency is controlled by both the incoming reduced velocity and the product of mass ratio and structural damping ratio, which is similar to the maximum amplitude of VIV; (ii) the maximum efficiency at a relatively high Reynolds number ($Re \approx 6 \times 10^3$) in subcritical regime is higher than that of a low Reynolds number ($Re = 150$) in laminar regime; (iii) the energy harvesting efficiency from VIV of a circular cylinder with a low mass ratio is more robust than that with a high mass ratio. This finding suggests that the VIV harvester performs better in water than in air.

Submitted 20 February, 2023; originally announced February 2023.

arXiv:2212.01336 [pdf]

The Influence of Cultural Distance on Settlement Intention of Floating Population in China

Authors: Dan Qin

Abstract: Based on nationwide labour-force survey data, this paper investigates the influence of cultural variance on migrants' settlement intention in China. By using dialectal distance as a proxy for cultural distance, we find strong evidence for the negative effects of cultural distance on migrants' settlement intention. By further investigation into sub-samples separated by gender, generation and higher education experience, we find that the influence is less effective for younger migrants and higher-educated migrants, which indicates that the impact of cultural barrier may gradually diminish with the integration of society and promotion of education.

Submitted 8 November, 2022; originally announced December 2022.

arXiv:2211.04031 [pdf, other]

Hilbert Distillation for Cross-Dimensionality Networks

Authors: Dian Qin, Haishuai Wang, Zhe Liu, Hongjia Xu, Sheng Zhou, Jiajun Bu

Abstract: 3D convolutional neural networks have revealed superior performance in processing volumetric data such as video and medical imaging. However, the competitive performance by leveraging 3D networks results in huge computational costs, which are far beyond that of 2D networks. In this paper, we propose a novel Hilbert curve-based cross-dimensionality distillation approach that facilitates the knowledge of 3D networks to improve the performance of 2D networks. The proposed Hilbert Distillation (HD) method preserves the structural information via the Hilbert curve, which maps high-dimensional (>=2) representations to one-dimensional continuous space-filling curves. Since the distilled 2D networks are supervised by the curves converted from dimensionally heterogeneous 3D features, the 2D networks are given an informative view in terms of learning structural information embedded in well-trained high-dimensional representations. We further propose a Variable-length Hilbert Distillation (VHD) method to dynamically shorten the walking stride of the Hilbert curve in activation feature areas and lengthen the stride in context feature areas, forcing the 2D networks to pay more attention to learning from activation features. The proposed algorithm outperforms the current state-of-the-art distillation techniques adapted to cross-dimensionality distillation on two classification tasks.
Moreover, the distilled 2D networks by the proposed method achieve competitive performance with the original 3D networks, indicating the lightweight distilled 2D networks could potentially be the substitution of cumbersome 3D networks in the real-world scenario.

Submitted 8 November, 2022; originally announced November 2022.

Comments: Accepted at NeurIPS 2022

arXiv:2210.03384 [pdf, other]

doi 10.1126/science.ade7759

Ultrafast reversible self-assembly of living tangled matter

Authors: Vishal P. Patil, Harry Tuazon, Emily Kaufman, Tuhin Chakrabortty, David Qin, Jörn Dunkel, M. Saad Bhamla

Abstract: Tangled active filaments are ubiquitous in nature, from chromosomal DNA and cilia carpets to root networks and worm blobs. How activity and elasticity facilitate collective topological transformations in living tangled matter is not well understood. Here, we report an experimental and theoretical study of California blackworms (Lumbriculus variegatus), which slowly form tangles over minutes but can untangle in milliseconds. Combining ultrasound imaging, theoretical analysis and simulations, we develop and validate a mechanistic model that explains how the kinematics of individual active filaments determines their emergent collective topological dynamics. The model reveals that resonantly alternating helical waves enable both tangle formation and ultrafast untangling. By identifying generic dynamical principles of topological self-transformations, our results can provide guidance for designing new classes of topologically tunable active materials.

Submitted 7 October, 2022; originally announced October 2022.

arXiv:2207.06629 [pdf]

doi 10.1088/1361-6668/ac8025

K-doped Ba122 epitaxial thin film on MgO substrate by buffer engineering

Authors: Dongyi Qin, Kazumasa Iida, Zimeng Guo, Chao Wang, Hikaru Saito, Satoshi Hata, Michio Naito, Akiyasu Yamamoto

Abstract: Molecular beam epitaxy of K-doped Ba122 (Ba$_{1-x}$K$_x$Fe$_\text{2}$As$_\text{2}$) superconductor was realized on a MgO substrate. Microstructural observation revealed that the undoped Ba122 served as a perfect buffer layer for epitaxial growth of the K-doped Ba122. The film exhibited a high critical temperature of 39.8 K and a high critical current density of 3.9 MA/cm$^\text{2}$ at 4 K. The successful growth of epitaxial thin film will enable artificial single grain boundary on oxide bicrystal substrates and reveal the grain boundary transport nature of K-doped Ba122.

Submitted 13 July, 2022; originally announced July 2022.

Comments: 5 pages, 4 figures, accepted manuscript Supercond. Sci. Technol 2022

Showing 1–50 of 67 results for author: Qin, D