Search | arXiv e-print repository

arXiv:2305.01947 [pdf]

Understanding the Impact of Heatwave on Urban Heat Island in Greater Sydney: Temporal Surface Energy Budget Change with Land Types

Authors: Jing Kong, Yongling Zhao, Dominik Strebel, Kai Gao, Jan Carmeliet, Chengwang Lei

Abstract: The impact of heatwaves (HWs) on urban heat island (UHI) is a contentious topic with contradictory research findings. A comprehensive understanding of the response of urban and rural areas to HWs, considering the underlying cause of surface energy budget changes, remains elusive. This study attempts to address this gap by investigating a 2020 HW event in the Greater Sydney Area using the Advanced Weather Research and Forecasting (WRF) model. Findings indicate that the HW intensifies the nighttime surface UHI by approximately 4°C. An analysis of surface energy budgets reveals that urban areas store more heat during the HW because they receive more solar radiation and have less evapotranspiration than rural areas. The maximum heat storage flux in urban areas during the HW can be around 200 W/m² higher than that during post-HW. The stored heat is released at nighttime, raising the air temperature in the urban areas. Forests and savannas have relatively lower storage heat fluxes due to high transpiration and albedo, and the maximum heat storage flux is only around 50 W/m² higher than that during post-HW. In contrast, a negative synergistic effect is detected between the 2-m UHI and HW. This may be because other meteorological conditions including wind have substantial impacts on the air temperature pattern. The strong hot and dry winds coming from the west and the proximity of tall buildings to the coast diminish the sea breeze coming from the east, resulting in a higher air temperature in the western urban district. Meanwhile, the western forest area also experiences higher temperatures due to the westerly winds. In addition, changes in wind direction alter the temperature distribution in the northern rural region. Based on the present study, urban climate simulation data and associated findings can be used to develop urban heat mitigation strategies for UHI during HW.

Submitted 3 May, 2023; originally announced May 2023.

arXiv:2304.11959 [pdf, other]

A Forward and Backward Compatible Framework for Few-shot Class-incremental Pill Recognition

Authors: Jinghua Zhang, Li Liu, Kai Gao, Dewen Hu

Abstract: Automatic Pill Recognition (APR) systems are crucial for enhancing hospital efficiency, assisting visually impaired individuals, and preventing cross-infection. However, most existing deep learning-based pill recognition systems can only perform classification on classes with sufficient training data. In practice, the high cost of data annotation and the continuous increase in new pill classes necessitate the development of a few-shot class-incremental pill recognition system. This paper introduces the first few-shot class-incremental pill recognition framework, named Discriminative and Bidirectional Compatible Few-Shot Class-Incremental Learning (DBC-FSCIL). It encompasses forward-compatible and backward-compatible learning components. In forward-compatible learning, we propose an innovative virtual class synthesis strategy and a Center-Triplet (CT) loss to enhance discriminative feature learning. These virtual classes serve as placeholders in the feature space for future class updates, providing diverse semantic knowledge for model training. For backward-compatible learning, we develop a strategy to synthesize reliable pseudo-features of old classes using uncertainty quantification, facilitating Data Replay (DR) and Knowledge Distillation (KD). This approach allows for the flexible synthesis of features and effectively reduces additional storage requirements for samples and models. Additionally, we construct a new pill image dataset for FSCIL and assess various mainstream FSCIL methods, establishing new benchmarks. Our experimental results demonstrate that our framework surpasses existing state-of-the-art (SOTA) methods. The code is available at https://github.com/zhang-jinghua/DBC-FSCIL.

Submitted 25 March, 2024; v1 submitted 24 April, 2023; originally announced April 2023.

arXiv:2304.07699 [pdf, other]

doi 10.1109/TKDE.2023.3340732

A Clustering Framework for Unsupervised and Semi-supervised New Intent Discovery

Authors: Hanlei Zhang, Hua Xu, Xin Wang, Fei Long, Kai Gao

Abstract: New intent discovery is of great value to natural language processing, allowing for a better understanding of user needs and providing friendly services. However, most existing methods struggle to capture the complicated semantics of discrete text representations when limited or no prior knowledge of labeled data is available. To tackle this problem, we propose a novel clustering framework, USNID, for unsupervised and semi-supervised new intent discovery, which has three key technologies. First, it fully utilizes unsupervised or semi-supervised data to mine shallow semantic similarity relations and provide well-initialized representations for clustering. Second, it designs a centroid-guided clustering mechanism to address the issue of cluster allocation inconsistency and provide high-quality self-supervised targets for representation learning. Third, it captures high-level semantics in unsupervised or semi-supervised data to discover fine-grained intent-wise clusters by optimizing both cluster-level and instance-level objectives. We also propose an effective method for estimating the cluster number in open-world scenarios without knowing the number of new intents beforehand. USNID performs exceptionally well on several benchmark intent datasets, achieving new state-of-the-art results in unsupervised and semi-supervised new intent discovery and demonstrating robust performance with different cluster numbers.

Submitted 12 December, 2023; v1 submitted 16 April, 2023; originally announced April 2023.

Comments: Accepted by IEEE TKDE

Journal ref: IEEE Transactions on Knowledge and Data Engineering 2023

arXiv:2304.03950 [pdf, other]

GANHead: Towards Generative Animatable Neural Head Avatars

Authors: Sijing Wu, Yichao Yan, Yunhao Li, Yuhao Cheng, Wenhan Zhu, Ke Gao, Xiaobo Li, Guangtao Zhai

Abstract: To bring digital avatars into people's lives, there is strong demand for efficiently generating complete, realistic, and animatable head avatars. This task is challenging, and it is difficult for existing methods to satisfy all the requirements at once. To achieve these goals, we propose GANHead (Generative Animatable Neural Head Avatar), a novel generative head model that takes advantage of both the fine-grained control over the explicit expression parameters and the realistic rendering results of implicit representations. Specifically, GANHead represents coarse geometry, fine-grained details and texture via three networks in canonical space to obtain the ability to generate complete and realistic head avatars. To achieve flexible animation, we define the deformation field by standard linear blend skinning (LBS), with the learned continuous pose and expression bases and LBS weights. This allows the avatars to be directly animated by FLAME parameters and generalize well to unseen poses and expressions. Compared to state-of-the-art (SOTA) methods, GANHead achieves superior performance on head avatar generation and raw scan fitting.

Submitted 8 April, 2023; originally announced April 2023.

Comments: Camera-ready for CVPR 2023. Project page: https://wsj-sjtu.github.io/GANHead/

arXiv:2304.01764 [pdf, other]

Minimizing Running Buffers for Tabletop Object Rearrangement: Complexity, Fast Algorithms, and Applications

Authors: Kai Gao, Si Wei Feng, Baichuan Huang, Jingjin Yu

Abstract: For rearranging objects on tabletops with overhand grasps, temporarily relocating objects to some buffer space may be necessary. This raises the natural question of how many simultaneous storage spaces, or "running buffers", are required so that certain classes of tabletop rearrangement problems are feasible. In this work, we examine the problem for both labeled and unlabeled settings. On the structural side, we observe that finding the minimum number of running buffers (MRB) can be carried out on a dependency graph abstracted from a problem instance, and show that computing MRB is NP-hard. We then prove that under both labeled and unlabeled settings, even for uniform cylindrical objects, the number of required running buffers may grow unbounded as the number of objects to be rearranged increases. We further show that the bound for the unlabeled case is tight. On the algorithmic side, we develop effective exact algorithms for finding MRB for both labeled and unlabeled tabletop rearrangement problems, scalable to over a hundred objects under very high object density. More importantly, our algorithms also compute a sequence witnessing the computed MRB that can be used for solving object rearrangement tasks. Employing these algorithms, empirical evaluations reveal that random labeled and unlabeled instances, which more closely mimic real-world setups, generally have fairly small MRBs. Using real robot experiments, we demonstrate that the running buffer abstraction leads to state-of-the-art solutions for in-place rearrangement of many objects in tight, bounded workspaces.

Submitted 4 April, 2023; originally announced April 2023.

Comments: Accepted by The International Journal of Robotics Research (IJRR). arXiv admin note: substantial text overlap with arXiv:2105.06357

arXiv:2303.14655 [pdf, other]

GOAL: A Challenging Knowledge-grounded Video Captioning Benchmark for Real-time Soccer Commentary Generation

Authors: Ji Qi, Jifan Yu, Teng Tu, Kunyu Gao, Yifan Xu, Xinyu Guan, Xiaozhi Wang, Yuxiao Dong, Bin Xu, Lei Hou, Juanzi Li, Jie Tang, Weidong Guo, Hui Liu, Yu Xu

Abstract: Despite the recent emergence of video captioning models, how to generate vivid, fine-grained video descriptions based on the background knowledge (i.e., long and informative commentary about the domain-specific scenes with appropriate reasoning) is still far from being solved, even though it has important applications such as automatic sports narrative. In this paper, we present GOAL, a benchmark of over 8.9k soccer video clips, 22k sentences, and 42k knowledge triples, to propose a challenging new task setting, Knowledge-grounded Video Captioning (KGVC). Moreover, we conduct experimental adaptation of existing methods to show the difficulty and potential directions for solving this valuable and applicable task. Our data and code are available at https://github.com/THU-KEG/goal.

Submitted 5 October, 2023; v1 submitted 26 March, 2023; originally announced March 2023.

Comments: Accepted by CIKM 2023

arXiv:2303.12993 [pdf, other]

Backdoor Defense via Adaptively Splitting Poisoned Dataset

Authors: Kuofeng Gao, Yang Bai, Jindong Gu, Yong Yang, Shu-Tao Xia

Abstract: Backdoor defenses have been studied to alleviate the threat of deep neural networks (DNNs) being backdoor attacked and thus maliciously altered. Since DNNs usually adopt some external training data from an untrusted third party, a robust backdoor defense strategy during the training stage is of importance. We argue that the core of training-time defense is to select poisoned samples and to handle them properly. In this work, we summarize the training-time defenses from a unified framework as splitting the poisoned dataset into two data pools. Under our framework, we propose an adaptively splitting dataset-based defense (ASD). Concretely, we apply loss-guided split and meta-learning-inspired split to dynamically update two data pools. With the split clean data pool and polluted data pool, ASD successfully defends against backdoor attacks during training. Extensive experiments on multiple benchmark datasets and DNN models against six state-of-the-art backdoor attacks demonstrate the superiority of our ASD. Our code is available at https://github.com/KuofengGao/ASD.

Submitted 22 March, 2023; originally announced March 2023.

Comments: Accepted by CVPR 2023

arXiv:2303.11134 [pdf, other]

doi 10.1016/j.physletb.2024.139019

Microlensing and event rate of static spherically symmetric wormhole

Authors: Ke Gao, Lei-Hua Liu

Abstract: The study focuses on the impact of microlensing in modern cosmology and introduces a new framework for the static spherically symmetrical wormhole in terms of the radial equation of state. Following a standard procedure, the study calculates the lensing equation, magnification, and event rate based on the radial equation of state. The analysis highlights that the image problem of the light source is complex. Furthermore, the study suggests that larger values for the throat radius of the wormhole and the radial equation of state lead to higher event rates. Additionally, it is proposed that the event rate of a wormhole will be larger compared to that of a black hole, provided their masses and distances from the light source and observer are comparable. This study offers the potential to distinguish between a wormhole and a black hole under similar conditions.

Submitted 10 September, 2024; v1 submitted 20 March, 2023; originally announced March 2023.

Comments: To be published in PLB

Journal ref: Phys. Lett. B 858 (2024) 139019

arXiv:2303.07939 [pdf, ps, other]

Measurement of hyperfine structure and the Zemach radius in $\rm^6Li^+$ using optical Ramsey technique

Authors: Wei Sun, Pei-Pei Zhang, Peng-peng Zhou, Shao-long Chen, Zhi-qiang Zhou, Yao Huang, Xiao-Qiu Qi, Zong-Chao Yan, Ting-Yun Shi, G. W. F. Drake, Zhen-Xiang Zhong, Hua Guan, Ke-lin Gao

Abstract: We investigate the $2\,^3\!S_1$--$2\,^3\!P_J$ ($J = 0, 1, 2$) transitions in $\rm^6Li^+$ using the optical Ramsey technique and achieve the most precise values of the hyperfine splittings of the $2\,^3\!S_1$ and $2\,^3\!P_J$ states, with the smallest uncertainty of about 10 kHz. The present results reduce the uncertainties of previous experiments by a factor of 5 for the $2\,^3\!S_1$ state and a factor of 50 for the $2\,^3\!P_J$ states, and are in better agreement with theoretical values. Combining our measured hyperfine intervals of the $2\,^3\!S_1$ state with the latest quantum electrodynamic (QED) calculations, the improved Zemach radius of the $\rm^6Li$ nucleus is determined to be 2.44(2) fm, with the uncertainty entirely due to the uncalculated QED effects of order $m\alpha^7$. The result is in sharp disagreement with the value 3.71(16) fm determined from simple models of the nuclear charge and magnetization distribution. We call for a more definitive nuclear physics value of the $\rm^6Li$ Zemach radius.

Submitted 18 March, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

Comments: 6 pages, 6 figures

arXiv:2303.07566 [pdf]

Towards a transportable Ca$^+$ optical clock with a systematic uncertainty of $4.8\times 10^{-18}$

Authors: Mengyan Zeng, Yao Huang, Baolin Zhang, Yanmei Hao, Zixiao Ma, Ruming Hu, Huaqing Zhang, Zheng Chen, Miao Wang, Hua Guan, Kelin Gao

Abstract: We present a compact, long-term nearly continuous operation of a room-temperature Ca$^+$ optical clock setup towards a transportable clock, achieving an overall systematic uncertainty of $4.8\times 10^{-18}$ and an uptime rate of 97.8% over an 8-day period. The active liquid-cooling scheme is adopted, combined with the precise temperature measurement with 13 temperature sensors both inside and outside the vacuum chamber to ensure the accurate evaluation of the thermal environment for the optical clock. The environmental temperature uncertainty is evaluated as 293.31(0.4) K, corresponding to a blackbody radiation (BBR) frequency shift uncertainty of $4.6\times 10^{-18}$, which is reduced by more than a factor of two compared to our previous work. Through the frequency comparison between the room-temperature Ca$^+$ optical clock and a cryogenic Ca$^+$ optical clock, the overall uncertainty of the clock comparison is $7.5\times 10^{-18}$, including a statistical uncertainty of $4.9\times 10^{-18}$ and a systematic uncertainty of $5.7\times 10^{-18}$. This work provides a set of feasible implementations for high-precision transportable ion optical clocks.

Submitted 13 March, 2023; originally announced March 2023.

Comments: 11 pages, 4 figures
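
Note: the clock-comparison figures quoted in the abstract above are consistent with adding the statistical and systematic terms in quadrature. The short Python check below is only a sketch under that assumption (independent contributions); the abstract does not state how the terms were combined, and the function name is illustrative.

```python
import math

def combine_in_quadrature(*terms):
    """Root-sum-square of independent uncertainty contributions (assumed independent)."""
    return math.sqrt(sum(t * t for t in terms))

# Values quoted in the abstract (fractional frequency uncertainties).
statistical = 4.9e-18
systematic = 5.7e-18

total = combine_in_quadrature(statistical, systematic)
print(f"combined uncertainty: {total:.1e}")  # prints "combined uncertainty: 7.5e-18"
```

The result rounds to the $7.5\times 10^{-18}$ overall comparison uncertainty stated in the abstract.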

arXiv:2303.04552 [pdf]

Precision Measurement of M1 Optical Clock Transition in Ni12+

Authors: Shaolong Chen, Zhiqiang Zhou, Jiguang Li, Tingxian Zhang, Chengbin Li, Tingyun Shi, Yao Huang, Kelin Gao, Hua Guan

Abstract: Highly charged ions (HCIs) have drawn significant interest in quantum metrology and in the search for new physics. Among these, Ni12+ is considered one of the most promising candidates for the next generation of HCI optical clocks, due to its two E1-forbidden transitions M1 and E2, which occur in the visible spectral range. In this work, we used the Shanghai-Wuhan Electron Beam Ion Trap to perform a high-precision measurement of the M1 transition wavelength. Our approach involved an improved calibration scheme for the spectra, utilizing auxiliary Ar+ lines for calibration and correction. Our final measured result of the M1 transition wavelength demonstrates a five-fold improvement in accuracy compared to our previous findings, reaching sub-picometer accuracy. In combination with our rigorous atomic-structure calculations to capture the electron correlations and relativistic effects, the quantum electrodynamic (QED) corrections were extracted. Moreover, comparing with an estimate of the one-electron QED contributions by using the GRASP2018 package, we found that the present experimental accuracy is high enough for testing the higher-order QED corrections for such a complex system with four electrons in the p subshell.

Submitted 9 September, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

Comments: 15 pages, 5 figures

arXiv:2303.00896 [pdf]

doi 10.1088/1681-7575/acd05d

Absolute frequency measurements with a robust, transportable $^{40}$Ca$^+$ optical clock

Authors: Huaqing Zhang, Yao Huang, Baolin Zhang, Yanmei Hao, Mengyan Zeng, Qunfeng Chen, Yuzhuo Wang, Shiying Cao, Yige Lin, Zhanjun Fang, Hua Guan, Kelin Gao

Abstract: We constructed a transportable $^{40}$Ca$^+$ optical clock (with an estimated minimum systematic shift uncertainty of $1.3\times 10^{-17}$ and a stability of $5\times 10^{-15}/\sqrt{\tau}$) that can operate outside the laboratory. We transported it from the Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan to the National Institute of Metrology, Beijing. The absolute frequency of the 729 nm clock transition was measured for up to 35 days by tracing its frequency to the second of the International System of Units. Some improvements were implemented in the measurement process, such as the increased effective up-time of 91.3% of the $^{40}$Ca$^+$ optical clock over a 35-day period, the reduced statistical uncertainty of the comparison between the optical clock and hydrogen maser, and the use of longer measurement times to reduce the uncertainty of the frequency traceability link. The absolute frequency measurement of the $^{40}$Ca$^+$ optical clock yielded a value of 411042129776400.26(13) Hz with an uncertainty of $3.2\times 10^{-16}$, which is reduced by a factor of 1.7 compared with our previous results. As a result of the increase in the operating rate of the optical clock, the accuracy of 35 days of absolute frequency measurement is comparable to the best results of different institutions in the world based on different optical frequency measurements.

Submitted 1 March, 2023; originally announced March 2023.

Comments: 15 pages, 5 figures
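
Note: the fractional uncertainty quoted in the abstract above can be recovered from the reported frequency. The Python sketch below assumes the usual concise notation, in which "(13)" is the uncertainty in the last two quoted digits (0.13 Hz); the variable names are illustrative.

```python
# Reported absolute frequency of the 729 nm clock transition, in Hz.
frequency_hz = 411042129776400.26
# Assumed reading of the concise notation "(13)": 0.13 Hz.
uncertainty_hz = 0.13

fractional = uncertainty_hz / frequency_hz
print(f"fractional uncertainty: {fractional:.1e}")  # prints "fractional uncertainty: 3.2e-16"
```

This agrees with the $3.2\times 10^{-16}$ figure given in the abstract.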

arXiv:2302.00268 [pdf, other]

Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection

Authors: Kaifeng Gao, Long Chen, Hanwang Zhang, Jun Xiao, Qianru Sun

Abstract: Prompt tuning with large-scale pretrained vision-language models empowers open-vocabulary predictions trained on limited base categories, e.g., object classification and detection. In this paper, we propose compositional prompt tuning with motion cues: an extended prompt tuning paradigm for compositional predictions of video data. In particular, we present Relation Prompt (RePro) for Open-vocabulary Video Visual Relation Detection (Open-VidVRD), where conventional prompt tuning is easily biased to certain subject-object combinations and motion patterns. To this end, RePro addresses the two technical challenges of Open-VidVRD: 1) the prompt tokens should respect the two different semantic roles of subject and object, and 2) the tuning should account for the diverse spatio-temporal motion patterns of the subject-object compositions. Without bells and whistles, our RePro achieves a new state-of-the-art performance on two VidVRD benchmarks of not only the base training object and predicate categories, but also the unseen ones. Extensive ablations also demonstrate the effectiveness of the proposed compositional and multi-mode design of prompts. Code is available at https://github.com/Dawn-LX/OpenVoc-VidVRD.

Submitted 1 February, 2023; originally announced February 2023.

Comments: accepted by ICLR 2023

arXiv:2301.00339 [pdf, ps, other]

doi 10.1103/PhysRevB.107.165411

The Origin of Two-dimensional Electron Gas in Zn$_{1-x}$Mg$_x$O/ZnO Heterostructures

Authors: Xiang-Hong Chen, Dong-Yu Hou, Zhi-Xin Hu, Kuang-Hong Gao, Zhi-Qing Li

Abstract: Although the two-dimensional electron gas (2DEG) in (001) Zn$_{1-x}$Mg$_x$O/ZnO heterostructures was discovered about twenty years ago, the origin of the 2DEG is still inconclusive. In the present letter, the formation mechanisms of 2DEG near the interfaces of (001) Zn$_{1-x}$Mg$_x$O/ZnO heterostructures were investigated via first-principles calculations. It is found that the polarity discontinuity near the interface leads to the formation of 2DEG neither in devices with thick Zn$_{1-x}$Mg$_{x}$O layers nor in devices with thin Zn$_{1-x}$Mg$_{x}$O layers. For the heterostructure with thick Zn$_{1-x}$Mg$_{x}$O layers, the oxygen vacancies near the interface introduce a defect band in the band gap, and the top of the defect band overlaps with the bottom of the conduction band, leading to the formation of the 2DEG near the interface of the device. For the heterostructure with thin Zn$_{1-x}$Mg$_{x}$O layers, the adsorption of hydrogen atoms, oxygen atoms, or OH groups on the surface of the Zn$_{1-x}$Mg$_{x}$O film plays a key role in the formation of 2DEG in the device. Our results reveal the sources of 2DEGs in Zn$_{1-x}$Mg$_x$O/ZnO heterostructures at the electronic structure level.

Submitted 31 December, 2022; originally announced January 2023.

Comments: 6 pages, 6 figures

arXiv:2212.11772 [pdf, other]

A Self-Adjusting Fusion Representation Learning Model for Unaligned Text-Audio Sequences

Authors: Kaicheng Yang, Ruxuan Zhang, Hua Xu, Kai Gao

Abstract: Inter-modal interaction plays an indispensable role in multimodal sentiment analysis. Because sequences of different modalities are usually unaligned, how to integrate relevant information of each modality to learn fusion representations has been one of the central challenges in multimodal learning. In this paper, a Self-Adjusting Fusion Representation Learning Model (SA-FRLM) is proposed to learn robust crossmodal fusion representations directly from the unaligned text and audio sequences. Different from previous works, our model not only makes full use of the interaction between different modalities but also maximizes the protection of the unimodal characteristics. Specifically, we first employ a crossmodal alignment module to project the features of different modalities to the same dimension. The crossmodal collaboration attention is then adopted to model the inter-modal interaction between text and audio sequences and initialize the fusion representations. After that, as the core unit of the SA-FRLM, the crossmodal adjustment transformer is proposed to protect original unimodal characteristics. It can dynamically adapt the fusion representations by using single modal streams. We evaluate our approach on the public multimodal sentiment analysis datasets CMU-MOSI and CMU-MOSEI. The experimental results show that our model has significantly improved the performance of all the metrics on the unaligned text-audio sequences.

Submitted 12 November, 2022; originally announced December 2022.

Comments: 8 pages

arXiv:2212.03412 [pdf, other]

Artificial Intelligence Security Competition (AISC)

Authors: Yinpeng Dong, Peng Chen, Senyou Deng, Lianji L, Yi Sun, Hanyu Zhao, Jiaxing Li, Yunteng Tan, Xinyu Liu, Yangyi Dong, Enhui Xu, Jincai Xu, Shu Xu, Xuelin Fu, Changfeng Sun, Haoliang Han, Xuchong Zhang, Shen Chen, Zhimin Sun, Junyi Cao, Taiping Yao, Shouhong Ding, Yu Wu, Jian Lin, Tianpeng Wu, et al. (27 additional authors not shown)

Abstract: The security of artificial intelligence (AI) is an important research area towards safe, reliable, and trustworthy AI systems. To accelerate the research on AI security, the Artificial Intelligence Security Competition (AISC) was organized by the Zhongguancun Laboratory, China Industrial Control Systems Cyber Emergency Response Team, Institute for Artificial Intelligence, Tsinghua University, and RealAI as part of the Zhongguancun International Frontier Technology Innovation Competition (https://www.zgc-aisc.com/en). The competition consists of three tracks, including Deepfake Security Competition, Autonomous Driving Security Competition, and Face Recognition Security Competition. This report will introduce the competition rules of these three tracks and the solutions of top-ranking teams in each track.

Submitted 6 December, 2022; originally announced December 2022.

Comments: Technical report of AISC

arXiv:2211.17065 [pdf, other]

doi 10.1016/j.dark.2023.101254

Microlensing effects of wormholes associated to blackhole spacetimes

Authors: Ke Gao, Lei-Hua Liu, Mian Zhu

Abstract: In this paper, we investigate the microlensing effects of wormholes associated to black hole spacetimes. Specifically, we work on three typical wormholes (WH): Schwarzschild WH, Kerr WH, and RN WH, as well as their black hole (BH) counterparts. We evaluate the deflection angle up to second order under the weak field approximation using the Gauss-Bonnet theorem. Then, we study their magnification numerically. We find that a Kerr WH could lead to multiple peaks in the magnification with certain parameters in the prograde case, while a Kerr BH predicts one peak. Therefore, the multi-peak feature can be used to distinguish the Kerr WH from other compact objects. We also find that the magnification of the RN BH has one peak, in contrast to the RN WH, whose magnification is negative in some situations. For other cases, the behavior of the magnification from wormholes and their corresponding black holes is similar. Our result may shed new light on exploring compact objects through the microlensing effect.

Submitted 8 June, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

Comments: Match the publication version

Journal ref: Phys.Dark Univ. 41 (2023) 101254

arXiv:2211.14721 [pdf, other]

Generalizing Gaussian Smoothing for Random Search

Authors: Katelyn Gao, Ozan Sener

Abstract: Gaussian smoothing (GS) is a derivative-free optimization (DFO) algorithm that estimates the gradient of an objective using perturbations of the current parameters sampled from a standard normal distribution. We generalize it to sampling perturbations from a larger family of distributions. Based on an analysis of DFO for non-convex functions, we propose to choose a distribution for perturbations that minimizes the mean squared error (MSE) of the gradient estimate. We derive three such distributions with provably smaller MSE than Gaussian smoothing. We conduct evaluations of the three sampling distributions on linear regression, reinforcement learning, and DFO benchmarks in order to validate our claims. Our proposal improves on GS with the same computational complexity, and is usually competitive with and often outperforms Guided ES and Orthogonal ES, two computationally more expensive algorithms that adapt the covariance matrix of normally distributed perturbations.

Submitted 26 November, 2022; originally announced November 2022.

Comments: This work was published at ICML 2022. This version contains some minor corrections and a link to a code repository
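
For readers unfamiliar with the baseline named in this abstract, the following is a minimal sketch of a Gaussian smoothing gradient estimate. It assumes the common forward-difference form of the estimator, which may differ from the exact variant analysed in the paper; the function names `gs_gradient` and `quadratic`, the step size `sigma`, and the sample count are illustrative choices, not taken from the listing.

```python
import random


def gs_gradient(f, x, sigma=0.1, num_samples=4000, seed=0):
    """Estimate the gradient of f at x from standard normal perturbations.

    Assumed forward-difference form: average of (f(x + sigma*u) - f(x)) / sigma * u.
    """
    rng = random.Random(seed)
    fx = f(x)
    grad = [0.0] * len(x)
    for _ in range(num_samples):
        # Perturbation direction drawn from a standard normal distribution.
        u = [rng.gauss(0.0, 1.0) for _ in x]
        x_pert = [xi + sigma * ui for xi, ui in zip(x, u)]
        # Finite-difference slope along u, accumulated as a sample average.
        slope = (f(x_pert) - fx) / sigma
        for i, ui in enumerate(u):
            grad[i] += slope * ui / num_samples
    return grad


def quadratic(x):
    """Toy objective with known gradient 2*x."""
    return sum(xi * xi for xi in x)


estimate = gs_gradient(quadratic, [1.0, -2.0])
print(estimate)  # noisy estimate of the true gradient [2.0, -4.0]
```

The estimate is noisy by construction; the abstract's proposal concerns replacing the standard normal sampling step with distributions that reduce the mean squared error of this kind of estimate.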

arXiv:2211.08406 [pdf, other]

Incorporating Pre-training Paradigm for Antibody Sequence-Structure Co-design

Authors: Kaiyuan Gao, Lijun Wu, Jinhua Zhu, Tianbo Peng, Yingce Xia, Liang He, Shufang Xie, Tao Qin, Haiguang Liu, Kun He, Tie-Yan Liu

Abstract: Antibodies are versatile proteins that can bind to pathogens and provide effective protection for the human body. Recently, deep learning-based computational antibody design has attracted popular attention since it automatically mines the antibody patterns from data that could be complementary to human experiences. However, the computational methods heavily rely on high-quality antibody structure data, which is quite limited. Besides, the complementarity-determining region (CDR), which is the key component of an antibody that determines the specificity and binding affinity, is highly variable and hard to predict. Therefore, the data limitation issue further raises the difficulty of CDR generation for antibodies. Fortunately, there exists a large amount of sequence data of antibodies that can help model the CDR and alleviate the reliance on structure data. Motivated by the success of pre-training models for protein modeling, in this paper, we develop an antibody pre-training language model and incorporate it into the (antigen-specific) antibody design model in a systematic way. Specifically, we first pre-train an antibody language model based on the sequence data, then propose a one-shot way for sequence and structure generation of CDR to avoid the heavy cost and error propagation of an autoregressive approach, and finally leverage the pre-trained antibody model for the antigen-specific antibody generation model with some carefully designed modules. Through various experiments, we show that our method achieves superior performance over previous baselines on different tasks, such as sequence and structure generation and antigen-binding CDR-H3 design.

Submitted 17 November, 2022; v1 submitted 26 October, 2022; originally announced November 2022.

arXiv:2211.01970 [pdf]

AI enhanced finite element multiscale modelling and structural uncertainty analysis of a functionally graded porous beam

Authors: Da Chen, Nima Emami, Shahed Rezaei, Philipp L. Rosendahl, Bai-Xiang Xu, Jens Schneider, Kang Gao, Jie Yang

Abstract: The local geometrical randomness of metal foams brings complexities to the performance prediction of porous structures. Although the relative density is commonly deemed the key factor, the stochasticity of internal cell sizes and shapes has an apparent effect on the porous structural behaviour, but the corresponding measurement is challenging. To address this issue, we aim to develop an assessment strategy for efficiently examining the foam properties by combining multiscale modelling and deep learning. The multiscale modelling is based on the finite element (FE) simulation employing representative volume elements (RVEs) with random cellular morphologies, mimicking the typical features of closed-cell aluminium foams. A deep learning database is constructed for training the designed convolutional neural networks (CNNs) to establish a direct link between the mesoscopic porosity characteristics and the effective Young's modulus of foams. The error range of CNN models leads to an uncertain mechanical performance, which is further evaluated in a structural uncertainty analysis on the FG porous three-layer beam consisting of two thin high-density layers and a thick low-density one, where the imprecise CNN-predicted moduli are represented as triangular fuzzy numbers in double parametric form. The uncertain beam bending deflections under a mid-span point load are calculated with the aid of Timoshenko beam theory and the Ritz method. Our findings suggest that CNN models can be successfully trained to estimate the RVE modulus from images with an average error of 5.92%. The evaluation of FG porous structures can be significantly simplified with the proposed method, which connects to the mesoscopic cellular morphologies without establishing the mechanics model for local foams.

Submitted 2 November, 2022; originally announced November 2022.

Comments: Book chapter in MACHINE LEARNING AIDED ANALYSIS, DESIGN, AND ADDITIVE MANUFACTURING OF FUNCTIONALLY GRADED POROUS COMPOSITE STRUCTURES, 20 pages, 10 figures

arXiv:2210.01283 [pdf, other]

Effective and Robust Non-Prehensile Manipulation via Persistent Homology Guided Monte-Carlo Tree Search

Authors: Ewerton R. Vieira, Kai Gao, Daniel Nakhimovich, Kostas E. Bekris, Jingjin Yu

Abstract: Performing object retrieval in real-world workspaces must tackle challenges including uncertainty and clutter. One option is to apply prehensile operations, which can be time consuming in highly-cluttered scenarios. On the other hand, non-prehensile actions, such as pushing multiple objects simultaneously, can help to quickly clear a cluttered workspace and retrieve a target object. Such actions, however, can also lead to increased uncertainty as it is difficult to estimate the outcome of pushing operations. The proposed framework in this work integrates topological tools and Monte-Carlo Tree Search (MCTS) to achieve effective and robust pushing for object retrieval. It employs persistent homology to automatically identify manageable clusters of blocking objects without the need for manually adjusting hyper-parameters. Then, MCTS uses this information to explore feasible actions to push groups of objects, aiming to minimize the number of operations needed to clear the path to the target. Real-world experiments using a Baxter robot, which involves some noise in actuation, show that the proposed framework achieves a higher success rate in solving retrieval tasks in dense clutter than alternatives. Moreover, it produces solutions with few pushing actions, improving the overall execution time. More critically, it is robust enough that it allows one to plan the sequence of actions offline and then execute them reliably on a Baxter robot.

Submitted 6 February, 2024; v1 submitted 3 October, 2022; originally announced October 2022.

arXiv:2210.00379 [pdf, ps, other]

NeRF: Neural Radiance Field in 3D Vision: A Comprehensive Review (Updated Post-Gaussian Splatting)

Authors: Kyle Gao, Yina Gao, Hongjie He, Dening Lu, Linlin Xu, Jonathan Li

Abstract: In March 2020, Neural Radiance Field (NeRF) revolutionized Computer Vision, allowing for implicit, neural network-based scene representation and novel view synthesis. NeRF models have found diverse applications in robotics, urban mapping, autonomous navigation, virtual reality/augmented reality, and more. In August 2023, Gaussian Splatting, a direct competitor to the NeRF-based framework, was proposed, gaining tremendous momentum and overtaking NeRF-based research in terms of interest as the dominant framework for novel view synthesis. We present a comprehensive survey of NeRF papers from the past five years (2020-2025). These include papers from the pre-Gaussian Splatting era, where NeRF dominated the field for novel view synthesis and 3D implicit and hybrid representation neural field learning. We also include works from the post-Gaussian Splatting era where NeRF and implicit/hybrid neural fields found more niche applications. Our survey is organized into architecture and application-based taxonomies in the pre-Gaussian Splatting era, as well as a categorization of active research areas for NeRF, neural field, and implicit/hybrid neural representation methods. We provide an introduction to the theory of NeRF and its training via differentiable volume rendering. We also present a benchmark comparison of the performance and speed of classical NeRF, implicit and hybrid neural representation, and neural field models, and an overview of key datasets.

Submitted 19 June, 2025; v1 submitted 1 October, 2022; originally announced October 2022.

Comments: Updated Post-Gaussian Splatting

ACM Class: I.4

arXiv:2209.11255 [pdf, other]

3DGTN: 3D Dual-Attention GLocal Transformer Network for Point Cloud Classification and Segmentation

Authors: Dening Lu, Kyle Gao, Qian Xie, Linlin Xu, Jonathan Li

Abstract: Although the application of Transformers in 3D point cloud processing has achieved significant progress and success, it is still challenging for existing 3D Transformer methods to efficiently and accurately learn both valuable global features and valuable local features for improved applications. This paper presents a novel point cloud representational learning network, called 3D Dual Self-attention Global Local (GLocal) Transformer Network (3DGTN), for improved feature learning in both classification and segmentation tasks, with the following key contributions. First, a GLocal Feature Learning (GFL) block with the dual self-attention mechanism (i.e., a novel Point-Patch Self-Attention, called PPSA, and a channel-wise self-attention) is designed to efficiently learn the GLocal context information. Second, the GFL block is integrated with a multi-scale Graph Convolution-based Local Feature Aggregation (LFA) block, leading to a Global-Local (GLocal) information extraction module that can efficiently capture critical information. Third, a series of GLocal modules are used to construct a new hierarchical encoder-decoder structure to enable the learning of "GLocal" information in different scales in a hierarchical manner. The proposed framework is evaluated on both classification and segmentation datasets, demonstrating that the proposed method is capable of outperforming many state-of-the-art methods on both classification and segmentation tasks.

Submitted 30 May, 2023; v1 submitted 21 September, 2022; originally announced September 2022.

Comments: 10 pages, 6 figures, 4 tables

arXiv:2209.05390 [pdf, other]

On the Utility of Buffers in Pick-n-Swap Based Lattice Rearrangement

Authors: Kai Gao, Jingjin Yu

Abstract: We investigate the utility of employing multiple buffers in solving a class of rearrangement problems with pick-n-swap manipulation primitives. In this problem, objects stored randomly in a lattice are to be sorted using a robot arm with k>=1 swap spaces or buffers, capable of holding up to k objects on its end-effector simultaneously. On the structural side, we show that the addition of each new buffer brings diminishing returns in saving the end-effector travel distance while holding the total number of pick-n-swap operations at the minimum. This is due to an interesting recursive cycle structure in random m-permutations, where the largest cycle covers over 60% of objects. On the algorithmic side, we propose fast algorithms for 1D and 2D lattice rearrangement problems that can effectively use multiple buffers to boost solution optimality. Numerical experiments demonstrate the efficiency and scalability of our methods, and confirm the diminishing return structure as more buffers are employed.

Submitted 17 February, 2023; v1 submitted 12 September, 2022; originally announced September 2022.

Comments: To be presented at the 2023 IEEE International Conference on Robotics and Automation (ICRA 2023)
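
The cycle-structure remark in this abstract can be checked empirically with a short simulation. The sketch below assumes that "covers over 60% of objects" refers to the average size of the largest cycle in a uniformly random permutation, whose limiting expected fraction is the Golomb-Dickman constant (about 0.624); the function names, the permutation size `m`, and the trial count are illustrative, not taken from the paper.

```python
import random


def largest_cycle_fraction(m, rng):
    """Fraction of the m items that lie on the longest cycle of a random permutation."""
    perm = list(range(m))
    rng.shuffle(perm)
    seen = [False] * m
    longest = 0
    for start in range(m):
        if seen[start]:
            continue
        # Walk the cycle containing `start`, marking every item on it.
        length = 0
        j = start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        longest = max(longest, length)
    return longest / m


def mean_largest_cycle_fraction(m=200, trials=2000, seed=0):
    """Average the largest-cycle fraction over independent random permutations."""
    rng = random.Random(seed)
    return sum(largest_cycle_fraction(m, rng) for _ in range(trials)) / trials


average = mean_largest_cycle_fraction()
print(average)  # expected to land a little above 0.6
```

Removing the largest cycle leaves a smaller random permutation with the same structure, which is one way to read the "recursive" cycle structure and the diminishing returns from each extra buffer mentioned above.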

arXiv:2209.02604 [pdf, other]

Make Acoustic and Visual Cues Matter: CH-SIMS v2.0 Dataset and AV-Mixup Consistent Module

Authors: Yihe Liu, Ziqi Yuan, Huisheng Mao, Zhiyun Liang, Wanqiuyue Yang, Yuanzhe Qiu, Tie Cheng, Xiaoteng Li, Hua Xu, Kai Gao

Abstract: Multimodal sentiment analysis (MSA), which aims to improve text-based sentiment analysis with associated acoustic and visual modalities, is an emerging research area due to its potential applications in Human-Computer Interaction (HCI). However, existing research observes that the acoustic and visual modalities contribute much less than the textual modality, a situation termed text-predominant. Under such circumstances, in this work, we emphasize making non-verbal cues matter for the MSA task. Firstly, from the resource perspective, we present the CH-SIMS v2.0 dataset, an extension and enhancement of CH-SIMS. Compared with the original dataset, CH-SIMS v2.0 doubles its size with another 2121 refined video segments with both unimodal and multimodal annotations and collects 10161 unlabelled raw video segments with rich acoustic and visual emotion-bearing context to highlight non-verbal cues for sentiment prediction. Secondly, from the model perspective, benefiting from the unimodal annotations and the unsupervised data in CH-SIMS v2.0, the Acoustic Visual Mixup Consistent (AV-MC) framework is proposed. The designed modality mixup module can be regarded as an augmentation, which mixes the acoustic and visual modalities from different videos. By drawing unobserved multimodal context along with the text, the model can learn to be aware of different non-verbal contexts for sentiment prediction. Our evaluations demonstrate that both CH-SIMS v2.0 and the AV-MC framework enable further research for discovering emotion-bearing acoustic and visual cues and pave the path to interpretable end-to-end HCI applications for real-world scenarios.

Submitted 21 August, 2022; originally announced September 2022.

Comments: 16 pages, 7 figures, accepted by ICMI 2022

arXiv:2208.14207 [pdf, other]

doi 10.1002/wilm.11014

Understanding intra-day price formation process by agent-based financial market simulation: calibrating the extended chiarella model

Authors: Kang Gao, Perukrishnen Vytelingum, Stephen Weston, Wayne Luk, Ce Guo

Abstract: This article presents XGB-Chiarella, a powerful new approach for deploying agent-based models to generate realistic intra-day artificial financial price data. This approach is based on agent-based models, calibrated by an XGBoost machine learning surrogate. Following the Extended Chiarella model, three types of trading agents are introduced in this agent-based model: fundamental traders, momentum traders, and noise traders. In particular, XGB-Chiarella focuses on configuring the simulation to accurately reflect real market behaviours. Instead of using the original Expectation-Maximisation algorithm for parameter estimation, the agent-based Extended Chiarella model is calibrated using an XGBoost machine learning surrogate. It is shown that the machine learning surrogate learned in the proposed method is an accurate proxy of the true agent-based market simulation. The proposed calibration method is superior to the original Expectation-Maximisation parameter estimation in terms of the distance between historical and simulated stylised facts. With the same underlying model, the proposed methodology is capable of generating realistic price time series in various stocks listed at three different exchanges, which indicates the universality of the intra-day price formation process. For the time scale (minutes) chosen in this paper, one agent per category is shown to be sufficient to capture the intra-day price formation process. The proposed XGB-Chiarella approach provides insights that the price formation process is comprised of the interactions between momentum traders, fundamental traders, and noise traders. It can also be used to enhance risk management by practitioners.

Submitted 29 August, 2022; originally announced August 2022.

Comments: Published in WILMOTT Magazine: May 2022 issue. arXiv admin note: text overlap with arXiv:2208.13654

Journal ref: Understanding intra-day price formation process by agent-based financial market simulation: calibrating the extended chiarella model, Wilmott, vol. 2022, iss. 119, p. 22-38, 2022

arXiv:2208.13654 [pdf, other]

doi 10.18564/jasss.5403

High-frequency financial market simulation and flash crash scenarios analysis: an agent-based modelling approach

Authors: Kang Gao, Perukrishnen Vytelingum, Stephen Weston, Wayne Luk, Ce Guo

Abstract: This paper describes simulations and analysis of flash crash scenarios in an agent-based modelling framework. We design, implement, and assess a novel high-frequency agent-based financial market simulator that generates realistic millisecond-level financial price time series for the E-Mini S&P 500 futures market. Specifically, a microstructure model of a single security traded on a central limit order book is provided, where different types of traders follow different behavioural rules. The model is calibrated using the machine learning surrogate modelling approach. Statistical test and moment coverage ratio results show that the model has excellent capability of reproducing realistic stylised facts in financial markets. By introducing an institutional trader that mimics the real-world Sell Algorithm on May 6th, 2010, the proposed high-frequency agent-based financial market simulator is used to simulate the Flash Crash that took place that day. We scrutinise the market dynamics during the simulated flash crash and show that the simulated dynamics are consistent with what happened in historical flash crash scenarios. With the help of Monte Carlo simulations, we discover functional relationships between the amplitude of the simulated 2010 Flash Crash and three conditions: the percentage of volume of the Sell Algorithm, the market maker inventory limit, and the trading frequency of fundamental traders. Similar analyses are carried out for mini flash crash events. An innovative "Spiking Trader" is introduced to the model, aiming at precipitating mini flash crash events. We analyse the market dynamics during the course of a typical simulated mini flash crash event and study the conditions affecting its characteristics. The proposed model can be used for testing resiliency and robustness of trading algorithms and providing advice for policymakers.

Submitted 29 August, 2022; originally announced August 2022.

Journal ref: Journal of Artificial Societies and Social Simulation 2024 27 (2) 8 <http://jasss.soc.surrey.ac.uk/27/2/8.html>

arXiv:2208.11457 [pdf, other]

doi 10.1145/3511808.3557154

Scenario-Adaptive and Self-Supervised Model for Multi-Scenario Personalized Recommendation

Authors: Yuanliang Zhang, Xiaofeng Wang, Jinxin Hu, Ke Gao, Chenyi Lei, Fei Fang

Abstract: Multi-scenario recommendation is dedicated to retrieving relevant items for users in multiple scenarios, which is ubiquitous in industrial recommendation systems. These scenarios partially overlap in users and items, while the distributions of different scenarios differ. The key point of multi-scenario modeling is to efficiently maximize the use of whole-scenario information and granularly generate adaptive representations both for users and items among multiple scenarios. We summarize three practical challenges which are not well solved for multi-scenario modeling: (1) Lack of fine-grained and decoupled information transfer controls among multiple scenarios. (2) Insufficient exploitation of entire space samples. (3) The item's multi-scenario representation disentanglement problem. In this paper, we propose a Scenario-Adaptive and Self-Supervised (SASS) model to solve the three challenges mentioned above. Specifically, we design a Multi-Layer Scenario Adaptive Transfer (ML-SAT) module with scenario-adaptive gate units to select and fuse effective transfer information from the whole scenario to an individual scenario in a fine-grained and decoupled way. To sufficiently exploit the power of entire space samples, a two-stage training process including pre-training and fine-tuning is introduced. The pre-training stage is based on a scenario-supervised contrastive learning task with the training samples drawn from labeled and unlabeled data spaces. The model is created symmetrically on both the user side and the item side, so that we can get distinguishing representations of items in different scenarios. Extensive experimental results on public and industrial datasets demonstrate the superiority of the SASS model over state-of-the-art methods. This model also achieves more than 8.0% improvement on Average Watching Time Per User in online A/B tests.

Submitted 24 August, 2022; originally announced August 2022.

Comments: Accepted by CIKM 2022

arXiv:2208.08052 [pdf, other]

Imperceptible and Robust Backdoor Attack in 3D Point Cloud

Authors: Kuofeng Gao, Jiawang Bai, Baoyuan Wu, Mengxi Ya, Shu-Tao Xia

Abstract: With the thriving of deep learning in processing point cloud data, recent works show that backdoor attacks pose a severe security threat to 3D vision applications. The attacker injects the backdoor into the 3D model by poisoning a few training samples with a trigger, such that the backdoored model performs well on clean samples but behaves maliciously when the trigger pattern appears. Existing attacks often insert some additional points into the point cloud as the trigger, or utilize a linear transformation (e.g., rotation) to construct the poisoned point cloud. However, the effects of these poisoned samples are likely to be weakened or even eliminated by some commonly used pre-processing techniques for 3D point clouds, e.g., outlier removal or rotation augmentation. In this paper, we propose a novel imperceptible and robust backdoor attack (IRBA) to tackle this challenge. We utilize a nonlinear and local transformation, called weighted local transformation (WLT), to construct poisoned samples with unique transformations. As there are several hyper-parameters and randomness in WLT, it is difficult to produce two similar transformations. Consequently, poisoned samples with unique transformations are likely to be resistant to the aforementioned pre-processing techniques. Besides, owing to the controllability and smoothness of the distortion caused by a fixed WLT, the generated poisoned samples are also imperceptible to human inspection. Extensive experiments on three benchmark datasets and four models show that IRBA achieves 80%+ ASR in most cases even with pre-processing techniques, which is significantly higher than previous state-of-the-art attacks.

Submitted 16 August, 2022; originally announced August 2022.

arXiv:2208.06759 [pdf, ps, other]

On variational principles of metric mean dimension on subset in Feldman-Katok metric

Authors: Kunmei Gao, Ruifeng Zhang

Abstract: In this paper, we study the metric mean dimension in the Feldman-Katok (FK for short) metric. We introduce the notions of FK-Bowen metric mean dimension and FK-Packing metric mean dimension on subsets, and we establish two variational principles.

Submitted 13 August, 2022; originally announced August 2022.

arXiv:2208.05217 [pdf, other]

doi 10.18653/v1/2022.findings-naacl.179

Continual Machine Reading Comprehension via Uncertainty-aware Fixed Memory and Adversarial Domain Adaptation

Authors: Zhijing Wu, Hua Xu, Jingliang Fang, Kai Gao

Abstract: Continual Machine Reading Comprehension aims to incrementally learn from a continuous data stream across time without access to previously seen data, which is crucial for the development of real-world MRC systems. However, it is a great challenge to learn a new domain incrementally without catastrophically forgetting previous knowledge. In this paper, MA-MRC, a continual MRC model with uncertainty-aware fixed Memory and Adversarial domain adaptation, is proposed. In MA-MRC, a fixed-size memory stores a small number of samples from previous domain data, along with an uncertainty-aware updating strategy applied when new domain data arrives. For incremental learning, MA-MRC not only keeps a stable understanding by learning from both memory and new domain data, but also makes full use of the domain adaptation relationship between them through an adversarial learning strategy. The experimental results show that MA-MRC is superior to strong baselines and has a substantial incremental learning ability without catastrophic forgetting under two different continual MRC settings.

Submitted 10 August, 2022; originally announced August 2022.

Journal ref: Published in Findings of NAACL 2022

arXiv:2207.13417 [pdf, other]

Hardly Perceptible Trojan Attack against Neural Networks with Bit Flips

Authors: Jiawang Bai, Kuofeng Gao, Dihong Gong, Shu-Tao Xia, Zhifeng Li, Wei Liu

Abstract: The security of deep neural networks (DNNs) has attracted increasing attention due to their widespread use in various applications. Recently, the deployed DNNs have been demonstrated to be vulnerable to Trojan attacks, which manipulate model parameters with bit flips to inject a hidden behavior and activate it by a specific trigger pattern. However, all existing Trojan attacks adopt noticeable patch-based triggers (e.g., a square pattern), making them perceptible to humans and easily spotted by machines. In this paper, we present a novel attack, namely hardly perceptible Trojan attack (HPT). HPT crafts hardly perceptible Trojan images by utilizing the additive noise and per pixel flow field to tweak the pixel values and positions of the original images, respectively. To achieve superior attack performance, we propose to jointly optimize bit flips, additive noise, and flow field. Since the weight bits of the DNNs are binary, this problem is very hard to solve. We handle the binary constraint with equivalent replacement and provide an effective optimization algorithm. Extensive experiments on CIFAR-10, SVHN, and ImageNet datasets show that the proposed HPT can generate hardly perceptible Trojan images, while achieving comparable or better attack performance compared to the state-of-the-art methods. The code is available at: https://github.com/jiawangbai/HPT.

Submitted 27 July, 2022; originally announced July 2022.

Comments: Accepted to ECCV2022; Code: https://github.com/jiawangbai/HPT

arXiv:2207.12601 [pdf]

doi 10.1088/1674-1137/ac9371

Flux Variations of Cosmic Ray Air Showers Detected by LHAASO-KM2A During a Thunderstorm on 10 June 2021

Authors: LHAASO Collaboration, F. Aharonian, Q. An, Axikegu, L. X. Bai, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Zhe Cao, Zhen Cao, J. Chang, J. F. Chang, E. S. Chen, Liang Chen, Liang Chen, Long Chen, M. J. Chen, M. L. Chen, S. H. Chen, S. Z. Chen, T. L. Chen, X. J. Chen, et al. (248 additional authors not shown)

Abstract: The Large High Altitude Air Shower Observatory (LHAASO) has three sub-arrays, KM2A, WCDA and WFCTA. The flux variations of cosmic ray air showers were studied by analyzing the KM2A data during the thunderstorm on 10 June 2021. The number of shower events that meet the trigger conditions increases significantly in atmospheric electric fields, with a maximum fractional increase of 20%. The variations of trigger rates (increases or decreases) are found to be strongly dependent on the primary zenith angle. The flux of secondary particles increases significantly, following a similar trend to that of the shower events. To better understand the observed behavior, Monte Carlo simulations are performed with CORSIKA and G4KM2A (a code based on GEANT4). We find that the experimental data (in saturated negative fields) are in good agreement with simulations, assuming the presence of a uniform upward electric field of 700 V/cm with a thickness of 1500 m in the atmosphere above the observation level. Due to the acceleration/deceleration and deflection by the atmospheric electric field, the number of secondary particles with energy above the detector threshold is modified, resulting in the changes in shower detection rate.

Submitted 6 December, 2022; v1 submitted 25 July, 2022; originally announced July 2022.

Comments: 18 pages, 11 figures

Journal ref: Chinese Phys. C 47 015001 (2023)

arXiv:2207.08078 [pdf, other]

Toward Efficient Task Planning for Dual-Arm Tabletop Object Rearrangement

Authors: Kai Gao, Jingjin Yu

Abstract: We investigate the problem of coordinating two robot arms to solve non-monotone tabletop multi-object rearrangement tasks. In a non-monotone rearrangement task, complex object-object dependencies exist that require moving some objects multiple times to solve an instance. In working with two arms in a large workspace, some objects must be handed off between the robots, which further complicates the planning process. For the challenging dual-arm tabletop rearrangement problem, we develop effective task planning algorithms for scheduling the pick-n-place sequence that can be properly distributed between the two arms. We show that, even without using a sophisticated motion planner, our method achieves significant time savings in comparison to greedy approaches and naive parallelization of single-robot plans.

Submitted 17 July, 2022; originally announced July 2022.

Comments: Accepted by IROS 2022

arXiv:2207.03655 [pdf]

doi 10.1016/j.cplett.2022.139991

Unraveling the spin reorientation process in rare earth perovskite PrFe0.1Cr0.9O3

Authors: Jiyu Shen, Jiajun Mo, Zeyi Lu, Chenying Gong, Zongjin Wu, Kaiyang Gao, Min Liu, Yanfang Xia

Abstract: Ultrafast spin control plays a pivotal role in condensed matter physics. In this study, we analyzed the macroscopic magnetization of the PrFe0.1Cr0.9O3 system by molecular field model fitting. And the whole process of system spin reorientation is accurately calculated in the fitting process. It is found that, unlike the rare-earth perovskites we have previously studied, PrFe0.1Cr0.9O3 exhibits spin-reversion properties during the reorientation process. This research will lay a theoretical foundation for precise spin control in the future.

Submitted 7 July, 2022; originally announced July 2022.

Comments: arXiv admin note: text overlap with arXiv:2207.03220

arXiv:2207.03654 [pdf]

Special spin behavior of rare earth ions at the A site of polycrystalline ErFe1-xCrxO3 (x = 0.1, 0.9)

Authors: Jiyu Shen, Jiajun Mo, Zeyi Lu, Zhongjin Wu, Chenying Gong, Kaiyang Gao, Pinglu Zheng, Min Liu, Yanfang Xia

Abstract: Thermally induced spin control is one of the main directions for future spin devices. In this study, we synthesized single-phase polycrystalline ErFe1-xCrxO3 and combined the magnetization curves and Mössbauer spectra to determine the macroscopic magnetism at room temperature. The magnetization of the system at various temperatures is well simulated by molecular field theory. And it is found that under the DM interaction, not only the B-site ions undergo a reorientation process, but the spins of the A-site ions also change at the same time. The effective spin is defined as the projection of Er3+ on the Fe3+/Cr3+ spin plane, and the whole reorientation process is obtained by fitting. This study will complement the actual process of ErFe1-xCrxO3 spin reorientation and will lay a theoretical foundation for the fabrication of future spin-controlled devices.

Submitted 7 July, 2022; originally announced July 2022.

arXiv:2207.03220 [pdf]

Unraveling Thermally Induced Spin reorientation of Strongly Disordered NdFe0.5Cr0.5O3 System

Authors: Jiyu Shen, Jiajun Mo, Zeyi Lu, Chenying Gong, Kaiyang Gao, Ke Shi, Lizhou Yu, Yan Chen, Min Liu, Yanfang Xia

Abstract: Sophisticated spin instruments require high-precision spin control. In this study, we accurately study the intrinsic magnetic properties of the strongly disordered system NdFe0.5Cr0.5O3 through molecular field models combined with ASD theory. The three constituent sub-magnetic phases of the system are separated, and their magnetization contributions are calculated separately. Fitting the angle of the A/B magnetic moment at a given temperature, the reorientation temperature point and temperature dependence of different magnetic phases are obtained. This research will provide very good theoretical support for studying complex disordered systems and applying high-precision spin control, and will lay a foundation for the design of new functional materials.

Submitted 7 July, 2022; originally announced July 2022.

arXiv:2207.01472 [pdf, other]

Deep Contrastive One-Class Time Series Anomaly Detection

Authors: Rui Wang, Chongwei Liu, Xudong Mou, Kai Gao, Xiaohui Guo, Pin Liu, Tianyu Wo, Xudong Liu

Abstract: The accumulation of time-series data and the absence of labels make time-series Anomaly Detection (AD) a self-supervised deep learning task. Single-normality-assumption-based methods, which reveal only a certain aspect of the whole normality, are incapable of tasks involved with a large number of anomalies. Specifically, Contrastive Learning (CL) methods distance negative pairs, many of which consist of both normal samples, thus reducing the AD performance. Existing multi-normality-assumption-based methods are usually two-staged, firstly pre-training through certain tasks whose target may differ from AD, limiting their performance. To overcome the shortcomings, a deep Contrastive One-Class Anomaly detection method of time series (COCA) is proposed, following the normality assumptions of CL and one-class classification. It treats the original and reconstructed representations as the positive pair of negative-sample-free CL, namely "sequence contrast". Next, invariance terms and variance terms compose a contrastive one-class loss function in which the loss of the assumptions is optimized by invariance terms simultaneously and the "hypersphere collapse" is prevented by variance terms. In addition, extensive experiments on two real-world time-series datasets show that the proposed method achieves state-of-the-art performance.

Submitted 16 April, 2023; v1 submitted 4 July, 2022; originally announced July 2022.

arXiv:2206.04910 [pdf, other]

NAGphormer: A Tokenized Graph Transformer for Node Classification in Large Graphs

Authors: Jinsong Chen, Kaiyuan Gao, Gaichao Li, Kun He

Abstract: The graph Transformer emerges as a new architecture and has shown superior performance on various graph mining tasks. In this work, we observe that existing graph Transformers treat nodes as independent tokens and construct a single long sequence composed of all node tokens so as to train the Transformer model, making it hard to scale to large graphs due to the quadratic complexity on the number of nodes for the self-attention computation. To this end, we propose a Neighborhood Aggregation Graph Transformer (NAGphormer) that treats each node as a sequence containing a series of tokens constructed by our proposed Hop2Token module. For each node, Hop2Token aggregates the neighborhood features from different hops into different representations and thereby produces a sequence of token vectors as one input. In this way, NAGphormer could be trained in a mini-batch manner and thus could scale to large graphs. Moreover, we mathematically show that as compared to a category of advanced Graph Neural Networks (GNNs), the decoupled Graph Convolutional Network, NAGphormer could learn more informative node representations from the multi-hop neighborhoods. Extensive experiments on benchmark datasets from small to large are conducted to demonstrate that NAGphormer consistently outperforms existing graph Transformers and mainstream GNNs. Code is available at https://github.com/JHL-HUST/NAGphormer.

Submitted 27 February, 2023; v1 submitted 10 June, 2022; originally announced June 2022.

Comments: Accepted by ICLR 2023

arXiv:2205.11233 [pdf, other]

Poincaré Heterogeneous Graph Neural Networks for Sequential Recommendation

Authors: Naicheng Guo, Xiaolei Liu, Shaoshuai Li, Qiongxu Ma, Kaixin Gao, Bing Han, Lin Zheng, Xiaobo Guo

Abstract: Sequential recommendation (SR) learns users' preferences by capturing the sequential patterns from users' behaviors evolution. As discussed in many works, user-item interactions of SR generally present the intrinsic power-law distribution, which can be ascended to hierarchy-like structures. Previous methods usually handle such hierarchical information by making user-item sectionalization empirically under Euclidean space, which may cause distortion of user-item representation in real online scenarios. In this paper, we propose a Poincaré-based heterogeneous graph neural network named PHGR to model the sequential pattern information as well as hierarchical information contained in the data of SR scenarios simultaneously. Specifically, for the purpose of explicitly capturing the hierarchical information, we first construct a weighted user-item heterogeneous graph by aligning all the user-item interactions to improve the perception domain of each user from a global view. Then the output of the global representation would be used to complement the local directed item-item homogeneous graph convolution. By defining a novel hyperbolic inner product operator, the global and local graph representation learning are directly conducted in the Poincaré ball instead of the commonly used projection operation between the Poincaré ball and Euclidean space, which could alleviate the cumulative error issue of the general bidirectional translation process. Moreover, for the purpose of explicitly capturing the sequential dependency information, we design two types of temporal attention operations under Poincaré ball space. Empirical evaluations on datasets from the public and financial industry show that PHGR outperforms several comparison methods.

Submitted 16 May, 2022; originally announced May 2022.

Comments: 32 pages, 12 figures

arXiv:2205.07417 [pdf, other]

Transformers in 3D Point Clouds: A Survey

Authors: Dening Lu, Qian Xie, Mingqiang Wei, Kyle Gao, Linlin Xu, Jonathan Li

Abstract: Transformers have been at the heart of the Natural Language Processing (NLP) and Computer Vision (CV) revolutions. The significant success in NLP and CV inspired exploring the use of Transformers in point cloud processing. However, how do Transformers cope with the irregularity and unordered nature of point clouds? How suitable are Transformers for different 3D representations (e.g., point- or voxel-based)? How competent are Transformers for various 3D processing tasks? As of now, there is still no systematic survey of the research on these issues. For the first time, we provide a comprehensive overview of increasingly popular Transformers for 3D point cloud analysis. We start by introducing the theory of the Transformer architecture and reviewing its applications in 2D/3D fields. Then, we present three different taxonomies (i.e., implementation-, data representation-, and task-based), which can classify current Transformer-based methods from multiple perspectives. Furthermore, we present the results of an investigation of the variants and improvements of the self-attention mechanism in 3D. To demonstrate the superiority of Transformers in point cloud analysis, we present comprehensive comparisons of various Transformer-based methods for classification, segmentation, and object detection. Finally, we suggest three potential research directions, providing beneficial references for the development of 3D Transformers.

Submitted 21 September, 2022; v1 submitted 15 May, 2022; originally announced May 2022.

Comments: 20 pages, 5 figures, 4 tables

arXiv:2204.13570 [pdf, other]

Learning First-Order Rules with Differentiable Logic Program Semantics

Authors: Kun Gao, Katsumi Inoue, Yongzhi Cao, Hanpin Wang

Abstract: Learning first-order logic programs (LPs) from relational facts which yields intuitive insights into the data is a challenging topic in neuro-symbolic research. We introduce a novel differentiable inductive logic programming (ILP) model, called differentiable first-order rule learner (DFOL), which finds the correct LPs from relational facts by searching for the interpretable matrix representations of LPs. These interpretable matrices are deemed as trainable tensors in neural networks (NNs). The NNs are devised according to the differentiable semantics of LPs. Specifically, we first adopt a novel propositionalization method that transfers facts to NN-readable vector pairs representing interpretation pairs. We replace the immediate consequence operator with NN constraint functions consisting of algebraic operations and a sigmoid-like activation function. We map the symbolic forward-chained format of LPs into NN constraint functions consisting of operations between subsymbolic vector representations of atoms. By applying gradient descent, the well-trained parameters of NNs can be decoded into precise symbolic LPs in forward-chained logic format. We demonstrate that DFOL can perform on several standard ILP datasets, knowledge bases, and probabilistic relation facts and outperform several well-known differentiable ILP models. Experimental results indicate that DFOL is a precise, robust, scalable, and computationally cheap differentiable ILP model.

Submitted 28 April, 2022; originally announced April 2022.

Comments: Accepted by IJCAI 2022

arXiv:2204.11937 [pdf, other]

doi 10.1162/artl_a_00358

Computation by Convective Logic Gates and Thermal Communication

Authors: Stuart Bartlett, Andrew K Gao, Yuk L Yung

Abstract: We demonstrate a novel computational architecture based on fluid convection logic gates and heat flux-mediated information flows. Our previous work demonstrated that Boolean logic operations can be performed by thermally-driven convection flows. In this work, we use numerical simulations to demonstrate a different, but universal Boolean logic operation (NOR), performed by simpler convective gates. The gates in the present work do not rely on obstacle flows or periodic boundary conditions, a significant improvement in terms of experimental realizability. Conductive heat transfer links can be used to connect the convective gates, and we demonstrate this with the example of binary half addition. These simulated circuits could be constructed in an experimental setting with modern, 2-dimensional fluidics equipment, such as a thin layer of fluid between acrylic plates. The presented approach thus introduces a new realm of unconventional, thermal fluid-based computation.

Submitted 25 April, 2022; originally announced April 2022.

Journal ref: Artificial Life, 1-12 (2022)

arXiv:2204.11544 [pdf, other]

Rethinking Multi-Modal Alignment in Video Question Answering from Feature and Sample Perspectives

Authors: Shaoning Xiao, Long Chen, Kaifeng Gao, Zhao Wang, Yi Yang, Zhimeng Zhang, Jun Xiao

Abstract: Reasoning about causal and temporal event relations in videos is a new destination of Video Question Answering (VideoQA). The major stumbling block to achieving this purpose is the semantic gap between language and video since they are at different levels of abstraction. Existing efforts mainly focus on designing sophisticated architectures while utilizing frame- or object-level visual representations. In this paper, we reconsider the multi-modal alignment problem in VideoQA from feature and sample perspectives to achieve better performance. From the view of feature, we break down the video into trajectories and first leverage trajectory feature in VideoQA to enhance the alignment between two modalities. Moreover, we adopt a heterogeneous graph architecture and design a hierarchical framework to align both trajectory-level and frame-level visual feature with language feature. In addition, we found that VideoQA models are largely dependent on language priors and always neglect visual-language interactions. Thus, two effective yet portable training augmentation strategies are designed to strengthen the cross-modal correspondence ability of our model from the view of sample. Extensive results show that our method outperforms all the state-of-the-art models on the challenging NExT-QA benchmark, which demonstrates the effectiveness of the proposed method.

Submitted 2 November, 2022; v1 submitted 25 April, 2022; originally announced April 2022.

arXiv:2203.15592 [pdf, other]

Demystifying Software Release Note Issues on GitHub

Authors: Jianyu Wu, Hao He, Wenxin Xiao, Kai Gao, Minghui Zhou

Abstract: Release notes (RNs) summarize main changes between two consecutive software versions and serve as a central source of information when users upgrade software. While producing high quality RNs can be hard and poses a variety of challenges to developers, a comprehensive empirical understanding of these challenges is still lacking. In this paper, we bridge this knowledge gap by manually analyzing 1,731 latest GitHub issues to build a comprehensive taxonomy of RN issues with four dimensions: Content, Presentation, Accessibility, and Production. Among these issues, nearly half (48.47%) of them focus on Production; Content, Accessibility, and Presentation take 25.61%, 17.65%, and 8.27%, respectively. We find that: 1) RN producers are more likely to miss information than to include incorrect information, especially for breaking changes; 2) improper layout may bury important information and confuse users; 3) many users find RNs inaccessible due to link deterioration, lack of notification, and obfuscated RN locations; 4) automating and regulating RN production remains challenging despite the great needs of RN producers. Our taxonomy not only pictures a roadmap to improve RN production in practice but also reveals interesting future research directions for automating RN production.

Submitted 29 March, 2022; originally announced March 2022.

Comments: Accepted for IEEE/ACM 30th International Conference on Program Comprehension (ICPC 2022)

arXiv:2203.12441 [pdf, other]

M-SENA: An Integrated Platform for Multimodal Sentiment Analysis

Authors: Huisheng Mao, Ziqi Yuan, Hua Xu, Wenmeng Yu, Yihe Liu, Kai Gao

Abstract: M-SENA is an open-sourced platform for Multimodal Sentiment Analysis. It aims to facilitate advanced research by providing flexible toolkits, reliable benchmarks, and intuitive demonstrations. The platform features a fully modular video sentiment analysis framework consisting of data management, feature extraction, model training, and result analysis modules. In this paper, we first illustrate the overall architecture of the M-SENA platform and then introduce features of the core modules. Reliable baseline results of different modality features and MSA benchmarks are also reported. Moreover, we use model evaluation and analysis tools provided by M-SENA to present intermediate representation visualization, on-the-fly instance test, and generalization ability test results. The source code of the platform is publicly available at https://github.com/thuiar/M-SENA.

Submitted 23 March, 2022; originally announced March 2022.

Comments: 11 pages, 4 figures, to be published in ACL 2022 System Demonstration Track

arXiv:2203.10379 [pdf, other]

Lazy Rearrangement Planning in Confined Spaces

Authors: Rui Wang, Kai Gao, Jingjin Yu, Kostas Bekris

Abstract: Object rearrangement is important for many applications but remains challenging, especially in confined spaces, such as shelves, where objects cannot be accessed from above and they block reachability to each other. Such constraints require many motion planning and collision checking calls, which are computationally expensive. In addition, the arrangement space grows exponentially with the number of objects. To address these issues, this work introduces a lazy evaluation framework with a local monotone solver and a global planner. Monotone instances are those that can be solved by moving each object at most once. A key insight is that reachability constraints at the grasps for objects' starts and goals can quickly reveal dependencies between objects without having to execute expensive motion planning queries. Given that, the local solver builds lazily a search tree that respects these reachability constraints without verifying that the arm paths are collision free. It only collision checks when a promising solution is found. If a monotone solution is not found, the non-monotone planner loads the lazy search tree and explores ways to move objects to intermediate locations from where monotone solutions to the goal can be found. Results show that the proposed framework can solve difficult instances in confined spaces with up to 16 objects, which state-of-the-art methods fail to solve. It also solves problems faster than alternatives, when the alternatives find a solution. It also achieves high-quality solutions, i.e., only 1.8 additional actions on average are needed for non-monotone instances.

Submitted 12 October, 2022; v1 submitted 19 March, 2022; originally announced March 2022.

Comments: Accepted to the 32nd International Conference on Automated Planning and Scheduling (ICAPS 2022)

arXiv:2203.02721 [pdf, other]

Consistent Representation Learning for Continual Relation Extraction

Authors: Kang Zhao, Hua Xu, Jiangong Yang, Kai Gao

Abstract: Continual relation extraction (CRE) aims to continuously train a model on data with new relations while avoiding forgetting old ones. Some previous work has proved that storing a few typical samples of old relations and replaying them when learning new relations can effectively avoid forgetting. However, these memory-based methods tend to overfit the memory samples and perform poorly on imbalanced datasets. To solve these challenges, a consistent representation learning method is proposed, which maintains the stability of the relation embedding by adopting contrastive learning and knowledge distillation when replaying memory. Specifically, supervised contrastive learning based on a memory bank is first used to train each new task so that the model can effectively learn the relation representation. Then, contrastive replay is conducted on the samples in memory, and the model retains the knowledge of historical relations through memory knowledge distillation to prevent the catastrophic forgetting of the old task. The proposed method can better learn consistent representations to alleviate forgetting effectively. Extensive experiments on FewRel and TACRED datasets show that our method significantly outperforms state-of-the-art baselines and yields strong robustness on the imbalanced dataset.

Submitted 21 May, 2022; v1 submitted 5 March, 2022; originally announced March 2022.

Comments: Accepted to Findings of ACL 2022

arXiv:2202.12248 [pdf, ps, other]

doi 10.1103/PhysRevB.105.205437

High-mobility two-dimensional electron gas in $γ$-Al$_2$O$_3$/SrTiO$_3$ heterostructures

Authors: Xiang-Hong Chen, Zhi-Xin Hu, Kuang-Hong Gao, Zhi-Qing Li

Abstract: The origin of the two-dimensional electron gas (2DEG) in the interface between $γ$-Al$_2$O$_3$ (GAO) and SrTiO$_3$ (STO) (GAO/STO) as well as the reason for the high mobility of the 2DEG is still in debate. In this paper, the electronic structures of [001]-oriented GAO/STO heterostructures with and without oxygen vacancies are investigated by first-principle calculations based on the density functional theory. The calculation results show that the necessary condition for the formation of 2DEG is that the GAO/STO heterostructure has the interface composed of Al and TiO$_2$ layers. For the heterostructure without oxygen vacancy on the GAO side, the 2DEG originates from the polar discontinuity near the interface, and there is a critical thickness for the GAO film, below which the 2DEG would not present and the heterostructure exhibits insulator characteristics. For the case that only the GAO film contains oxygen vacancies, the polar discontinuity near the interface disappears, but the 2DEG still exists. In this situation, the critical thickness of the GAO film for 2DEG formation does not exist either. When the GAO film and STO substrate both contain oxygen vacancies, it is found that the 2DEG retains as long as the oxygen vacancies on the STO side are not very close to the interface. The low-temperature mobilities of the 2DEGs in these GAO/STO heterostructures are considered to be governed by the ionized impurity scattering, and $\sim$3 to $\sim$11 times as large as that in LaAlO$_3$/SrTiO$_3$ heterojunction. The high mobility of the 2DEG is mainly due to the small electron effective mass in GAO/STO heterostructure.

Submitted 27 February, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

Comments: 10 pages, 6 figures

arXiv:2202.09457 [pdf]

doi 10.1108/JICV-02-2022-0005

Merging Control Strategies of Connected and Autonomous Vehicles at Freeway On-Ramps: A Comprehensive Review

Authors: Jie Zhu, Said Easa, Kun Gao

Abstract: On-ramp merging areas are typical bottlenecks in the freeway network, since merging on-ramp vehicles may cause intensive disturbances on the mainline traffic flow and lead to various negative impacts on traffic efficiency and safety. The connected and autonomous vehicles (CAVs), with their capabilities of real-time communication and precise motion control, hold a great potential to facilitate ramp merging operation through enhanced coordination strategies. This paper presents a comprehensive review of the existing ramp merging strategies leveraging CAVs, focusing on the latest trends and developments in the research field. The review comprehensively covers 44 papers recently published in leading transportation journals. Based on the application context, control strategies are categorized into three categories: merging into single-lane freeways with total CAVs, merging into single-lane freeways with mixed traffic flows, and merging into multilane freeways. Relevant literature is reviewed regarding the required technologies, control decision level, applied methods, and impacts on traffic performance. More importantly, we identify the existing research gaps and provide insightful discussions on the potential and promising directions for future research based on the review, which facilitates further advancement in this research topic.

Submitted 22 March, 2022; v1 submitted 18 February, 2022; originally announced February 2022.

Showing 151–200 of 373 results for author: Gao, K