Search | arXiv e-print repository

A Deep Learning Approach to Anomaly Detection in High-Frequency Trading Data

Authors: Qiuliuyang Bao, Jiawei Wang, Hao Gong, Yiwei Zhang, Xiaojun Guo, Hanrui Feng

Abstract: This paper proposes an algorithm based on a staged sliding window Transformer architecture to detect abnormal behaviors in the microstructure of the foreign exchange market, focusing on high-frequency EUR/USD trading data. The method captures multi-scale temporal features through a staged sliding window, extracts global and local dependencies by combining the self-attention mechanism and weighted attention mechanism of the Transformer, and uses a classifier to identify abnormal events. Experimental results on a real high-frequency dataset containing order book depth, spread, and trading volume show that the proposed method significantly outperforms traditional machine learning (such as decision trees and random forests) and deep learning methods (such as MLP, CNN, RNN, LSTM) in terms of accuracy (0.93), F1-Score (0.91), and AUC-ROC (0.95). Ablation experiments verify the contribution of each component, and the visualization of order book depth and anomaly detection further reveals the effectiveness of the model under complex market dynamics. Despite the false positive problem, the model still provides important support for market supervision. In the future, noise processing can be optimized and extended to other markets to improve generalization and real-time performance.

Submitted 31 March, 2025; originally announced April 2025.

arXiv:2503.19695 [pdf, ps, other]

Cohomology of the differential fundamental group of algebraic curves

Authors: Vo Quoc Bao, Phung Ho Hai, Dao Van Thinh

Abstract: Let X be a smooth projective curve over a field k of characteristic zero. The differential fundamental group of X is defined as the Tannakian dual to the category of vector bundles with (integrable) connections on X. This work investigates the relationship between the de Rham cohomology of a vector bundle with connection and the group cohomology of the corresponding representation of the differential fundamental group of X. Consequently, we obtain some vanishing and non-vanishing results for the group cohomology.

Submitted 25 March, 2025; originally announced March 2025.

Comments: 12 pages, 0 figures, submitted to Bulletin des sciences mathématiques

MSC Class: 14F40; 14F43; 14L15; 18G15; 18G40; 18M25

arXiv:2503.06396 [pdf, other]

Optimizing Minimum Vertex Cover Solving via a GCN-assisted Heuristic Algorithm

Authors: Enqiang Zhu, Qiqi Bao, Yu Zhang, Chanjuan Liu

Abstract: The problem of finding a minimum vertex cover (MVC) in a graph is a well-known NP-hard problem with significant practical applications in optimization and scheduling. Its complexity, combined with the increasing scale of problems, underscores the need for efficient and effective algorithms. However, existing heuristic algorithms for MVC often rely on simplistic initialization strategies and overlook the impact of edge attributes and neighborhood information on vertex selection. In this paper, we introduce GCNIVC, a novel heuristic search algorithm designed to address the limitations of existing methods for solving MVC problems in large-scale graphs. Our approach features two main innovations. First, it utilizes a Graph Convolutional Network (GCN) to capture the global structure of graphs, which enables the generation of high-quality initial solutions that enhance the efficiency of the subsequent search process. Second, GCNIVC introduces a new heuristic that employs three containers and the concept of double-covered edges (dc-edges), improving search efficiency and providing greater flexibility for adding and removing operations based on edge attributes. Through extensive experiments on benchmark datasets, we demonstrate that GCNIVC outperforms state-of-the-art MVC algorithms in terms of both accuracy and efficiency. Our results highlight the effectiveness of GCNIVC's GCN-assisted initialization and its edge-informed search strategy. This study not only advances the understanding of MVC problem-solving but also contributes a new tool for addressing large-scale graph optimization challenges.

Submitted 8 March, 2025; originally announced March 2025.

arXiv:2503.03755 [pdf]

Research on evolution and early warning model of network public opinion based on online Latent Dirichlet distribution model and BP neural network

Authors: Qiaozhi Bao, Yanlin Chen, Xusheng Ji

Abstract: Online public opinion is increasingly becoming a significant factor affecting the stability of the internet and society, particularly as the frequency of online public opinion crises has risen in recent years. Enhancing the capability for early warning of online public opinion crises is urgent. The most effective approach is to identify potential crises in their early stages and implement corresponding management measures. This study establishes a preliminary indicator system for online public opinion early warning, based on the principles of indicator system construction and the characteristics and evolution patterns of online public opinion. Subsequently, data-driven methodologies were employed to collect and preprocess public opinion indicator data. Utilizing grey relational analysis and the K-Means clustering algorithm, we classified online public opinion events into three levels: slight, warning, and severe. Furthermore, we constructed an online topic evolution model using the online Hierarchical Dirichlet Process model to analyze the thematic changes of online public opinion events across different warning levels. Finally, we developed an online public opinion early warning model using a Backpropagation (BP) neural network. The test results of early warning samples show that the model achieves high accuracy. Thus, in practical early warning applications, the BP neural network can be effectively utilized for predicting online public opinion events.

Submitted 16 February, 2025; originally announced March 2025.

arXiv:2503.03702 [pdf, other]

Developing and Utilizing a Large-Scale Cantonese Dataset for Multi-Tasking in Large Language Models

Authors: Jiyue Jiang, Alfred Kar Yin Truong, Yanyu Chen, Qinghang Bao, Sheng Wang, Pengan Chen, Jiuming Wang, Lingpeng Kong, Yu Li, Chuan Wu

Abstract: High-quality data resources play a crucial role in training large language models (LLMs), particularly for low-resource languages like Cantonese. Despite having more than 85 million native speakers, Cantonese is still considered a low-resource language in the field of natural language processing (NLP) due to factors such as the dominance of Mandarin, lack of cohesion within the Cantonese-speaking community, diversity in character encoding and input methods, and the tendency of overseas Cantonese speakers to prefer using English. In addition, the rich colloquial vocabulary of Cantonese, English loanwords, and code-switching characteristics add to the complexity of corpus collection and processing. To address these challenges, we collect Cantonese texts from a variety of sources, including open source corpora, Hong Kong-specific forums, Wikipedia, and Common Crawl data. We conduct rigorous data processing through language filtering, quality filtering, content filtering, and de-duplication steps, successfully constructing a high-quality Cantonese corpus of over 2 billion tokens for training large language models. We further refined the model through supervised fine-tuning (SFT) on curated Cantonese tasks, enhancing its ability to handle specific applications. Upon completion of the training, the model achieves state-of-the-art (SOTA) performance on four Cantonese benchmarks. After training on our dataset, the model also exhibits improved performance on other mainstream language tasks.

Submitted 5 March, 2025; originally announced March 2025.

arXiv:2502.09308 [pdf]

Natural van der Waals canalization lens for non-destructive nanoelectronic circuit imaging and inspection

Authors: Qingdong Ou, Shuwen Xue, Weiliang Ma, Jiong Yang, Guangyuan Si, Lu Liu, Gang Zhong, Jingying Liu, Zongyuan Xie, Ying Xiao, Kourosh Kalantar-Zadeh, Xiang Qi, Peining Li, Zhigao Dai, Huanyang Chen, Qiaoliang Bao

Abstract: Optical inspection has long served as a cornerstone non-destructive method in semiconductor wafer manufacturing, particularly for surface and defect analysis. However, conventional techniques such as bright-field and dark-field scattering optics face significant limitations, including insufficient resolution and the inability to penetrate and detect buried structures. Atomic force microscopy (AFM), while offering higher resolution and precise surface characterization, is constrained by slow speed, limited to surface-level imaging, and incapable of resolving subsurface features. Here, we propose an approach that integrates the strengths of dark-field scattering optics and AFM by leveraging a van der Waals (vdW) canalization lens based on natural biaxial α-MoO3 crystals. This method enables ultrahigh-resolution subwavelength imaging with the ability to visualize both surface and buried structures, achieving a spatial resolution of 15 nm and grating pitch detection down to 100 nm. The underlying mechanism relies on the unique anisotropic properties of α-MoO3, where its atomic-scale unit cells and biaxial symmetry facilitate the diffraction-free propagation of both evanescent and propagating waves via a flat-band canalization regime. Unlike metamaterial-based superlenses and hyperlenses, which suffer from high plasmonic losses, fabrication imperfections, and uniaxial constraints, α-MoO3 provides robust and aberration-free imaging in multiple directions. We successfully applied this approach to high-resolution inspection of buried nanoscale electronic circuits, offering unprecedented capabilities essential for next-generation semiconductor manufacturing.

Submitted 13 February, 2025; originally announced February 2025.

arXiv:2501.08705 [pdf]

doi 10.1038/s41563-020-0665-0

Broad Spectral Tuning of Ultra-Low Loss Polaritons in a van der Waals Crystal by Intercalation

Authors: Javier Taboada-Gutiérrez, Gonzalo Álvarez-Pérez, Jiahua Duan, Weiliang Ma, Kyle Crowley, Iván Prieto, Andrei Bylinkin, Marta Autore, Halyna Volkova, Kenta Kimura, Tsuyoshi Kimura, M. -H. Berger, Shaojuan Li, Qiaoliang Bao, Xuan P. A. Gao, Ion Errea, Alexey Nikitin, Rainer Hillenbrand, Javier Martín-Sánchez, Pablo Alonso-González

Abstract: Phonon polaritons (PhPs) -- light coupled to lattice vibrations -- in polar van der Waals (vdW) crystals are promising candidates for controlling the flow of energy at the nanoscale due to their strong field confinement, anisotropic propagation, and ultra-long lifetime in the picosecond range \cite{ref1,ref2,ref3,ref4,ref5}. However, the lack of tunability in their narrow and material-specific spectral range -- the Reststrahlen Band (RB) -- severely limits their technological implementation. Here, we demonstrate that the intercalation of Na atoms in the vdW semiconductor $α$-V$_2$O$_5$ enables a broad spectral shift of RBs, and that the PhPs excited exhibit ultra-low losses (lifetime of $4 \pm 1$ ps), similar to PhPs in the non-intercalated crystal (lifetime of $6 \pm 1$ ps). We expect our intercalation method to be applicable to other vdW crystals, opening the door for the use of PhPs in broad spectral bands in the mid-infrared domain.

Submitted 15 January, 2025; originally announced January 2025.

Comments: 15 pages, 5 figures

Journal ref: Nature Materials 19, 964-968 (2020)

arXiv:2501.07849 [pdf, ps, other]

The Invisible Hand: Unveiling Provider Bias in Large Language Models for Code Generation

Authors: Xiaoyu Zhang, Juan Zhai, Shiqing Ma, Qingshuang Bao, Weipeng Jiang, Qian Wang, Chao Shen, Yang Liu

Abstract: Large Language Models (LLMs) have emerged as the new recommendation engines, surpassing traditional methods in both capability and scope, particularly in code generation. In this paper, we reveal a novel provider bias in LLMs: without explicit directives, these models show systematic preferences for services from specific providers in their recommendations (e.g., favoring Google Cloud over Microsoft Azure). To systematically investigate this bias, we develop an automated pipeline to construct the dataset, incorporating 6 distinct coding task categories and 30 real-world application scenarios. Leveraging this dataset, we conduct the first comprehensive empirical study of provider bias in LLM code generation across seven state-of-the-art LLMs, utilizing approximately 500 million tokens (equivalent to $5,000+ in computational costs). Our findings reveal that LLMs exhibit significant provider preferences, predominantly favoring services from Google and Amazon, and can autonomously modify input code to incorporate their preferred providers without users' requests. Such a bias holds far-reaching implications for market dynamics and societal equilibrium, potentially contributing to digital monopolies. It may also deceive users and violate their expectations, leading to various consequences. We call on the academic community to recognize this emerging issue and develop effective evaluation and mitigation methods to uphold AI security and fairness.

Submitted 3 June, 2025; v1 submitted 14 January, 2025; originally announced January 2025.

Comments: 27 pages, 13 figures

arXiv:2501.04247 [pdf, other]

TransientVerse: A Comprehensive Real-Time Alert and Multi-Wavelength Analysis System for Transient Astronomical Events

Authors: Jian-Hua Fang, Di Li, Pei Wang, Hua-Xi Chen, Han Wang, Deng-Ke Zhou, Qin-Ping Bao, Hai-Yan Li, Jing-Jing Hu, Jin-Tao Xie, Xiao-Dong Ge, Yi Feng, Dong-Hui Quan, Zhi-Xuan Kang, Xue-Rong Guo, Chen-Wu Jin, Zhi-Lin Wang, Jia-Ying Xu, Chen-Chen Miao, Ru-Shuang Zhao, Chen-Hui Niu

Abstract: Transient astrophysical events are characterized by short timescales, high energy, and multi-wavelength radiation, often accompanied by violent energy releases. These phenomena are a major focus of modern astronomical research. To reveal their underlying physical mechanisms, near-real-time, multi-wavelength, and multi-messenger follow-up observations are essential. However, current transient alert systems face multiple challenges, including fragmented messages, inconsistent formats, and difficulties in retrospective analysis, all of which hinder the efficiency of triggering observations. This paper presents TransientVerse, an innovative real-time database platform to integrate and disseminate transient alerts. The platform uses an automated pipeline to integrate real-time alerts from multiple sources (e.g., ATel, VOEvent, and GCN). It structures unstructured text data into a dual-format database for transient alerts by using open-source large language models. TransientVerse offers retrospective searches, data visualization, literature reviews, and customized subscriptions for efficient event tracking and analysis. Additionally, for Fast Radio Bursts (FRBs), the platform provides real-time statistics on repeat burst rates across different time intervals and alerts astronomers about high-frequency burst sources, enabling rapid follow-up observations and optimizing the use of limited observation windows. TransientVerse improves the efficiency of acquiring transient events in real time, lowers the technical barriers for simultaneous observations, and provides robust technical support for multi-wavelength, multi-messenger time-domain astronomy and astrophysics studies.

Submitted 12 January, 2025; v1 submitted 7 January, 2025; originally announced January 2025.

arXiv:2412.07137 [pdf, ps, other]

The ball-covering property of non-commutative spaces of operators on Banach spaces

Authors: Qiyao Bao, Rui Liu, Jie Shen

Abstract: A Banach space is said to have the ball-covering property (BCP) if its unit sphere can be covered by countably many closed or open balls off the origin. Let $X$ be a Banach space with a shrinking $1$-unconditional basis. In this paper, by constructing an equivalent norm on $B(X)$, we prove that the quotient Banach algebra $B(X)/K(X)$ fails the BCP. In particular, the result implies that the Calkin algebra $B(H)/ K(H)$, $B(\ell^p)/K(\ell^p)$ ($1 \leq p <\infty$) and $B(c_0)/K(c_0)$ all fail the BCP. We also show that $B(L^p[0,1])$ has the uniform ball-covering property (UBCP) for $3/2< p < 3$.

Submitted 9 December, 2024; originally announced December 2024.

MSC Class: 46B20; 46B15; 46B28

arXiv:2410.10874 [pdf]

Optimizing Transformer based on high-performance optimizer for predicting employment sentiment in American social media content

Authors: Feiyang Wang, Qiaozhi Bao, Zixuan Wang, Yanlin Chen

Abstract: This article improves the Transformer model using a swarm intelligence optimization algorithm, aiming to predict the emotions of employment-related text content on American social media. Through text preprocessing, feature extraction, and vectorization, the text data was successfully converted into numerical data and imported into the model for training. The experimental results show that during the training process, the accuracy of the model gradually increased from 49.27% to 82.83%, while the loss value decreased from 0.67 to 0.35, indicating a significant improvement in the performance of the model on the training set. According to the confusion matrix analysis of the training set, the accuracy of the training set is 86.15%. The confusion matrix of the test set also showed good performance, with an accuracy of 82.91%. The accuracy difference between the training set and the test set is only 3.24 percentage points, indicating that the model has strong generalization ability. In addition, the evaluation of polygon results shows that the model performs well in classification accuracy, sensitivity, specificity, and area under the curve (AUC), with a Kappa coefficient of 0.66 and an F-measure of 0.80, further verifying the effectiveness of the model in social media sentiment analysis. The improved model proposed in this article not only improves the accuracy of sentiment recognition in employment-related texts on social media, but also has important practical significance. This social-media-based data analysis method can not only capture social dynamics in a timely manner, but also prompt decision-makers to pay attention to public concerns and provide data support for improving employment conditions.
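For readers unfamiliar with the two summary metrics quoted above, the following is a minimal sketch of how a Kappa coefficient and an F-measure can be computed from a confusion matrix. It assumes a binary (two-class) task, which the abstract does not state, and the function names and the counts in the usage lines are hypothetical illustrations, not figures from the paper.

```python
def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa for a 2x2 confusion matrix (binary task assumed)."""
    total = tp + fp + fn + tn
    # Observed agreement: fraction of samples on the diagonal.
    observed = (tp + tn) / total
    # Chance agreement: product of predicted and actual marginals, per class.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    return (observed - expected) / (1 - expected)


def f_measure(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts chosen only to exercise the functions.
kappa = cohen_kappa(tp=40, fp=10, fn=10, tn=40)
f1 = f_measure(tp=40, fp=10, fn=10)
print(round(kappa, 2), round(f1, 2))
```

Kappa discounts the agreement expected by chance, which is why it is typically lower than raw accuracy on the same confusion matrix.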

Submitted 8 October, 2024; originally announced October 2024.

Comments: 5 pages, 5 figures

arXiv:2410.01459 [pdf]

A Smart Chair for Health Monitoring in Daily Life

Authors: Nguyen Thi Minh Huong, Vo Quoc Bao, Nguyen Trung Hau, Huynh Quang Linh

Abstract: Recent research has focused on the risks associated with poor sitting posture and the impact of sitting on biological parameters, such as heart rate, because prolonged sitting is common across all ages and professions. In this work, we propose a novel approach that can simultaneously display posture and heart rate in real time. In this device, pressure sensors are embedded into a flexible separate cushion that is easily put on any chair to capture sitting behaviours, and a smartwatch-like PPG module is worn on the user's wrist. Regarding posture classification, the readings of ten pressure sensors under the seat bottom are the inputs of four machine learning models, giving a high accuracy of 99 per cent. Besides, the Electrocardiography recording module is illustrated with the same results as a commercial device called DFRobot. Another advantage of this smart chair is that it not only simultaneously displays both sitting postures and heart rates on external devices like laptops, mobile phones, or televisions through microcontrollers but also offers the relationship between them to help people adjust their sitting behaviours, avoiding influencing heart rate. The smart chair is expected to be useful equipment for people with a sedentary lifestyle, especially office workers.

Submitted 2 October, 2024; originally announced October 2024.

arXiv:2409.17778 [pdf, other]

Taming Diffusion Prior for Image Super-Resolution with Domain Shift SDEs

Authors: Qinpeng Cui, Yixuan Liu, Xinyi Zhang, Qiqi Bao, Qingmin Liao, Li Wang, Tian Lu, Zicheng Liu, Zhongdao Wang, Emad Barsoum

Abstract: Diffusion-based image super-resolution (SR) models have attracted substantial interest due to their powerful image restoration capabilities. However, prevailing diffusion models often struggle to strike an optimal balance between efficiency and performance. Typically, they either neglect to exploit the potential of existing extensive pretrained models, limiting their generative capacity, or they necessitate dozens of forward passes starting from random noise, compromising inference efficiency. In this paper, we present DoSSR, a Domain Shift diffusion-based SR model that capitalizes on the generative powers of pretrained diffusion models while significantly enhancing efficiency by initiating the diffusion process with low-resolution (LR) images. At the core of our approach is a domain shift equation that integrates seamlessly with existing diffusion models. This integration not only improves the use of diffusion prior but also boosts inference efficiency. Moreover, we advance our method by transitioning the discrete shift process to a continuous formulation, termed DoS-SDEs. This advancement leads to fast and customized solvers that further enhance sampling efficiency. Empirical results demonstrate that our proposed method achieves state-of-the-art performance on synthetic and real-world datasets, while notably requiring only 5 sampling steps. Compared to previous diffusion prior based methods, our approach achieves a remarkable speedup of 5-7 times, demonstrating its superior efficiency. Code: https://github.com/QinpengCui/DoSSR.

Submitted 10 December, 2024; v1 submitted 26 September, 2024; originally announced September 2024.

Comments: This paper is accepted by NeurIPS 2024

arXiv:2409.08588 [pdf]

Improved Unet model for brain tumor image segmentation based on ASPP-coordinate attention mechanism

Authors: Zixuan Wang, Yanlin Chen, Feiyang Wang, Qiaozhi Bao

Abstract: In this paper, we propose an improved Unet model for brain tumor image segmentation, which combines coordinate attention mechanism and ASPP module to improve the segmentation effect. After the data set is divided, we apply the necessary preprocessing to the images and use the improved model to experiment. First, we trained and validated the traditional Unet model. By analyzing the loss curve of the training set and the validation set, we can see that the loss value continues to decline at the first epoch and becomes stable at the eighth epoch. This process shows that the model constantly optimizes its parameters to improve performance. At the same time, the change in the mIoU (mean Intersection over Union) index shows that the mIoU value exceeded 0.6 at the 15th epoch, remained above 0.6 thereafter, and reached above 0.7 at the 46th epoch. These results indicate that the basic Unet model is effective in brain tumor image segmentation. Next, we introduce an improved Unet algorithm based on coordinate attention mechanism and ASPP module for experiments. By observing the loss change curves of the training set and the validation set, it is found that the loss value reaches the lowest point at the sixth epoch and then remains relatively stable. At the same time, the mIoU indicator has stabilized above 0.7 since the 20th epoch and has reached a maximum of 0.76. These results show that the new mechanism introduced significantly improves the segmentation ability of the model. Finally, we apply the trained traditional Unet model and the improved Unet model based on the coordinate attention mechanism and ASPP module to the test set for brain tumor image segmentation prediction. Compared to the traditional Unet, the enhanced model offers superior segmentation and edge accuracy, providing a more reliable method for medical image analysis with the coordinate attention mechanism and ASPP module.
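The mean Intersection over Union figure reported above can be illustrated with a short sketch. It assumes predictions and ground truth are given as flattened lists of integer class labels and that classes absent from both are skipped when averaging; the function name `mean_iou` and these conventions are assumptions for illustration, not details taken from the paper.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union over flattened per-pixel class labels."""
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both prediction and ground truth.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in either prediction or ground truth.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes that appear in neither mask
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical 4-pixel example: background = 0, tumor = 1.
print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

Averaging per-class IoU, rather than pooling all pixels, keeps a small foreground class such as a tumor region from being swamped by the background.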

Submitted 13 September, 2024; originally announced September 2024.

Comments: 5 pages, 8 figures, accepted by ICBASE 2024

arXiv:2409.02119 [pdf, other]

CoRA: Optimizing Low-Rank Adaptation with Common Subspace of Large Language Models

Authors: Xiaojun Xiao, Sen Shen, Qiming Bao, Hongfei Rong, Kairui Liu, Zhongsheng Wang, Jiamou Liu

Abstract: In fine-tuning large language models (LLMs), conserving computational resources while maintaining effectiveness and improving outcomes within the same computational constraints is crucial. The Low-Rank Adaptation (LoRA) strategy balances efficiency and performance in fine-tuning large models by reducing the number of trainable parameters and computational costs. However, current advancements in LoRA might be focused on its fine-tuning methodologies, with not as much exploration as might be expected into further compression of LoRA. Since most of LoRA's parameters might still be superfluous, this may lead to unnecessary wastage of computational resources. In this paper, we propose CoRA: leveraging shared knowledge to optimize LoRA training by substituting its matrix $B$ with a common subspace from large models. Our two-fold method includes (1) Freezing the substitute matrix $B$ to halve parameters while training matrix $A$ for specific tasks and (2) Using the substitute matrix $B$ as an enhanced initial state for the original matrix $B$, achieving improved results with the same parameters. Our experiments show that the first approach achieves the same efficacy as the original LoRA fine-tuning while being more efficient than halving parameters. At the same time, the second approach has some improvements compared to LoRA's original fine-tuning performance. They generally attest to the effectiveness of our work.
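The "halve parameters" claim in approach (1) can be checked with simple arithmetic. The sketch below assumes the standard LoRA factorization, in which a weight update for a layer of shape d_out x d_in is written as B A with B of shape d_out x r and A of shape r x d_in; the function name and the layer sizes are hypothetical, and this is not the authors' code.

```python
def lora_trainable_params(d_out, d_in, rank, freeze_b=False):
    """Count trainable parameters of a LoRA update B @ A for one layer."""
    a_params = rank * d_in    # A has shape (rank, d_in)
    b_params = d_out * rank   # B has shape (d_out, rank)
    # With B frozen (as in approach 1), only A is updated during training.
    return a_params if freeze_b else a_params + b_params


# Hypothetical square projection layer, rank 8.
full = lora_trainable_params(4096, 4096, 8)
frozen = lora_trainable_params(4096, 4096, 8, freeze_b=True)
print(full, frozen)
```

Under this assumption the count is exactly halved only when d_out equals d_in; for rectangular layers the saving is d_out / (d_out + d_in) of the original trainable parameters.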

Submitted 31 August, 2024; originally announced September 2024.
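
Note: the first variant described in the CoRA abstract (freeze the substitute matrix $B$, train only $A$) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the usual LoRA form W + BA with B of shape d x r and A of shape r x k, uses a random matrix as a stand-in for the common-subspace matrix (how that matrix is obtained is not stated in the abstract), and assumes A starts at zero so the adapted weight initially equals the pretrained one. All names below are hypothetical.

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def add(X, Y):
    """Element-wise sum of two equally shaped matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def random_matrix(rows, cols, rng):
    """Matrix of standard-normal samples (placeholder values only)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

d, k, r = 6, 6, 2                      # layer dimensions and adapter rank (illustrative)
rng = random.Random(0)

W = random_matrix(d, k, rng)           # pretrained weight, frozen
B_shared = random_matrix(d, r, rng)    # hypothetical stand-in for the shared subspace, frozen
A = [[0.0] * k for _ in range(r)]      # trainable; zero initialisation is an assumption

def adapted_weight(W, B, A):
    """Effective weight W + B A used in the forward pass."""
    return add(W, matmul(B, A))

# Trainable parameter counts: plain LoRA trains both B and A,
# the frozen-B variant trains only A.
lora_trainable = d * r + r * k
cora_trainable = r * k
print(lora_trainable, cora_trainable)
```

With d equal to k, freezing B leaves exactly half of the LoRA adapter parameters trainable, which matches the "halve parameters" wording in the abstract.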

arXiv:2408.16756 [pdf, other]

How Well Do LLMs Handle Cantonese? Benchmarking Cantonese Capabilities of Large Language Models

Authors: Jiyue Jiang, Pengan Chen, Liheng Chen, Sheng Wang, Qinghang Bao, Lingpeng Kong, Yu Li, Chuan Wu

Abstract: The rapid evolution of large language models (LLMs) has transformed the competitive landscape in natural language processing (NLP), particularly for English and other data-rich languages. However, underrepresented languages like Cantonese, spoken by over 85 million people, face significant development gaps, which is particularly concerning given the economic significance of the Guangdong-Hong Kong-Macau Greater Bay Area, and in substantial Cantonese-speaking populations in places like Singapore and North America. Despite its wide use, Cantonese has scant representation in NLP research, especially compared to other languages from similarly developed regions. To bridge these gaps, we outline current Cantonese NLP methods and introduce new benchmarks designed to evaluate LLM performance in factual generation, mathematical logic, complex reasoning, and general knowledge in Cantonese, which aim to advance open-source Cantonese LLM technology. We also propose future research directions and recommended models to enhance Cantonese LLM development.

Submitted 17 February, 2025; v1 submitted 29 August, 2024; originally announced August 2024.

Comments: Accepted by NAACL 2025

arXiv:2408.02922 [pdf, other]

Pose Magic: Efficient and Temporally Consistent Human Pose Estimation with a Hybrid Mamba-GCN Network

Authors: Xinyi Zhang, Qiqi Bao, Qinpeng Cui, Wenming Yang, Qingmin Liao

Abstract: Current state-of-the-art (SOTA) methods in 3D Human Pose Estimation (HPE) are primarily based on Transformers. However, existing Transformer-based 3D HPE backbones often encounter a trade-off between accuracy and computational efficiency. To resolve the above dilemma, in this work, we leverage recent advances in state space models and utilize Mamba for high-quality and efficient long-range modeling. Nonetheless, Mamba still faces challenges in precisely exploiting local dependencies between joints. To address these issues, we propose a new attention-free hybrid spatiotemporal architecture named Hybrid Mamba-GCN (Pose Magic). This architecture introduces local enhancement with GCN by capturing relationships between neighboring joints, thus producing new representations to complement Mamba's outputs. By adaptively fusing representations from Mamba and GCN, Pose Magic demonstrates superior capability in learning the underlying 3D structure. To meet the requirements of real-time inference, we also provide a fully causal version. Extensive experiments show that Pose Magic achieves new SOTA results ($\downarrow 0.9 mm$) while saving $74.1\%$ FLOPs. In addition, Pose Magic exhibits optimal motion consistency and the ability to generalize to unseen sequence lengths.

Submitted 25 February, 2025; v1 submitted 5 August, 2024; originally announced August 2024.

Comments: This work has been accepted by AAAI 2025

arXiv:2407.16341 [pdf, other]

Motion Capture from Inertial and Vision Sensors

Authors: Xiaodong Chen, Wu Liu, Qian Bao, Xinchen Liu, Quanwei Yang, Ruoli Dai, Tao Mei

Abstract: Human motion capture is the foundation for many computer vision and graphics tasks. While industrial motion capture systems with complex camera arrays or expensive wearable sensors have been widely adopted in movie and game production, consumer-affordable and easy-to-use solutions for personal applications are still far from mature. To utilize a mixture of a monocular camera and very few inertial measurement units (IMUs) for accurate multi-modal human motion capture in daily life, we contribute MINIONS in this paper, a large-scale Motion capture dataset collected from INertial and visION Sensors. MINIONS has several featured properties: 1) large scale of over five million frames and 400 minutes duration; 2) multi-modality data of IMUs signals and RGB videos labeled with joint positions, joint rotations, SMPL parameters, etc.; 3) a diverse set of 146 fine-grained single and interactive actions with textual descriptions. With the proposed MINIONS, we conduct experiments on multi-modal motion capture and explore the possibilities of consumer-affordable motion capture using a monocular camera and very few IMUs. The experiment results emphasize the unique advantages of inertial and vision sensors, showcasing the promise of consumer-affordable multi-modal motion capture and providing a valuable resource for further research and development.

Submitted 23 July, 2024; originally announced July 2024.

Comments: 17 pages, 9 figures

arXiv:2407.10162 [pdf, other]

ChatLogic: Integrating Logic Programming with Large Language Models for Multi-Step Reasoning

Authors: Zhongsheng Wang, Jiamou Liu, Qiming Bao, Hongfei Rong, Jingfeng Zhang

Abstract: Large language models (LLMs) such as ChatGPT and GPT-4 have demonstrated impressive capabilities in various generative tasks. However, their performance is often hampered by limitations in accessing and leveraging long-term memory, leading to specific vulnerabilities and biases, especially during long interactions. This paper introduces ChatLogic, an innovative framework specifically targeted at LLM reasoning tasks that can enhance the performance of LLMs in multi-step deductive reasoning tasks by integrating logic programming. In ChatLogic, the language model plays a central role, acting as a controller and participating in every system operation stage. We propose a novel method of converting logic problems into symbolic integration with an inference engine. This approach leverages large language models' situational understanding and imitation skills and uses symbolic memory to enhance multi-step deductive reasoning capabilities. Our results show that the ChatLogic framework significantly improves the multi-step reasoning capabilities of LLMs. The source code and data are available at https://github.com/Strong-AI-Lab/ChatLogic

Submitted 14 July, 2024; originally announced July 2024.

Comments: 8 pages, 3 figures. This paper has been accepted by WCCI IJCNN 2024

arXiv:2407.09521 [pdf, other]

Apprenticeship-Inspired Elegance: Synergistic Knowledge Distillation Empowers Spiking Neural Networks for Efficient Single-Eye Emotion Recognition

Authors: Yang Wang, Haiyang Mei, Qirui Bao, Ziqi Wei, Mike Zheng Shou, Haizhou Li, Bo Dong, Xin Yang

Abstract: We introduce a novel multimodality synergistic knowledge distillation scheme tailored for efficient single-eye emotion recognition tasks. This method allows a lightweight, unimodal student spiking neural network (SNN) to extract rich knowledge from an event-frame multimodal teacher network. The core strength of this approach is its ability to utilize the ample, coarser temporal cues found in conventional frames for effective emotion recognition. Consequently, our method adeptly interprets both temporal and spatial information from the conventional frame domain, eliminating the need for specialized sensing devices, e.g., event-based cameras. The effectiveness of our approach is thoroughly demonstrated using both existing and our compiled single-eye emotion recognition datasets, achieving unparalleled performance in accuracy and efficiency over existing state-of-the-art methods.

Submitted 20 June, 2024; originally announced July 2024.

Comments: Accepted by IJCAI 2024

arXiv:2407.02247 [pdf]

doi 10.1002/adom.202401169

Hypermultiplexed off-chip hologram by on-chip integrated metasurface

Authors: Xianjin Liu, Zhanying Ma, Dasen Zhang, Qiwen Bao, Zhenzhen Liu, Jun-Jun Xiao

Abstract: The waveguide-integrated metasurface introduces a novel photonic chip capable of converting guided modes into free-space light. This enables functions such as off-chip beam focusing, steering, and imaging. The challenge lies in achieving hypermultiplexing across diverse parameters, including guided-wave mode type, direction, polarization, and notably, multiple wavelengths. Here, we introduce a comprehensive end-to-end inverse design framework, rooted in a physical model, for the multifunctional design of on-chip metasurfaces. This framework allows for metasurface optimization through a target-field-driven iteration process. We demonstrate a hypermultiplexed on-chip metasurface capable of generating red-green-blue holograms at multiple target planes, with both independent and cooperative control over guided-wave direction. Significantly, the proposed method streamlines the design process utilizing only the positions of meta-atoms as the design variable. We demonstrate 9 independent holographic channels through a combination of wavelength and distance multiplexing. Moreover, by incorporating the excitation direction into the design, the metasurface produces a total of 36 distinct holograms. The robustness of these results against fabrication discrepancies is validated through 3D full-wave electromagnetic simulations, aligning well with advanced manufacturing techniques. Our research presents a universal design framework for the development of multifunctional on-chip metasurfaces, opening up new avenues for a wide range of applications.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2405.06923 [pdf]

Machine learning disentangles bias causes of shortwave cloud radiative effect in a climate model

Authors: Hongtao Yang, Guoxing Chen, Wei-Chyung Wang, Qing Bao, Jiandong Li

Abstract: Large bias exists in shortwave cloud radiative effect (SWCRE) of general circulation models (GCMs), attributed mainly to the combined effect of cloud fraction and water contents, whose representations in models remain challenging. Here we show an effective machine-learning approach to dissect the individual bias of relevant cloud parameters determining SWCRE. A surrogate model for calculating SWCRE was developed based on random forest using observations and FGOALS-f3-L simulation data of cloud fraction (CFR), cloud-solar concurrence ratio (CSC), cloud liquid and ice water paths (LWP and IWP), TOA upward clear-sky solar flux (SUC), and solar zenith angle. The model, which achieves high determination coefficient > 0.96 in the validation phase, was then used to quantify SWCRE bias associated with these parameters following the partial radiation perturbation method. The global-mean SWCRE bias (in W m$^{-2}$) is contributed by CFR (+5.11), LWP (-6.58), IWP (-1.67), and CSC (+4.38), while SUC plays a minor role; the large CSC contribution highlights the importance of cloud diurnal variation. Regionally, the relative importance varies according to climate regimes. In the Tropics, overestimated LWP and IWP exist over lands, while oceans exhibit underestimated CFR and CSC. In contrast, the extratropical lands and oceans have, respectively, too-small CSC and the 'too few, too bright' low-level clouds. We thus suggest that machine learning, in addition to developing GCM physical parameterizations, can also be utilized for diagnosing and understanding complex cloud-climate interactions.

Submitted 11 May, 2024; originally announced May 2024.

Comments: 19 pages, 8 figures

arXiv:2404.08831 [pdf, other]

Structured Model Pruning for Efficient Inference in Computational Pathology

Authors: Mohammed Adnan, Qinle Ba, Nazim Shaikh, Shivam Kalra, Satarupa Mukherjee, Auranuch Lorsakul

Abstract: Recent years have seen significant efforts to adopt Artificial Intelligence (AI) in healthcare for various use cases, from computer-aided diagnosis to ICU triage. However, the size of AI models has been rapidly growing due to scaling laws and the success of foundational models, which poses an increasing challenge to leverage advanced models in practical applications. It is thus imperative to develop efficient models, especially for deploying AI solutions under resource constraints or with time sensitivity. One potential solution is to perform model compression, a set of techniques that remove less important model components or reduce parameter precision, to reduce model computation demand. In this work, we demonstrate that model pruning, as a model compression technique, can effectively reduce inference cost for computational and digital pathology based analysis with a negligible loss of analysis performance. To this end, we develop a methodology for pruning the widely used U-Net-style architectures in biomedical imaging, with which we evaluate multiple pruning heuristics on nuclei instance segmentation and classification, and empirically demonstrate that pruning can compress models by at least 70% with a negligible drop in performance.

Submitted 12 April, 2024; originally announced April 2024.

arXiv:2403.07905 [pdf]

Enhancing Kubernetes Automated Scheduling with Deep Learning and Reinforcement Techniques for Large-Scale Cloud Computing Optimization

Authors: Zheng Xu, Yulu Gong, Yanlin Zhou, Qiaozhi Bao, Wenpin Qian

Abstract: With the continuous expansion of the scale of cloud computing applications, artificial intelligence technologies such as Deep Learning and Reinforcement Learning have gradually become the key tools to solve the automated task scheduling of large-scale cloud computing systems. Aiming at the complexity and real-time requirement of task scheduling in large-scale cloud computing systems, this paper proposes an automatic task scheduling scheme based on deep learning and reinforcement learning. Firstly, the deep learning technology is used to monitor and predict the parameters in the cloud computing system in real time to obtain the system status information. Then, combined with a reinforcement learning algorithm, the task scheduling strategy is dynamically adjusted according to the real-time system state and task characteristics to achieve the optimal utilization of system resources and maximum task execution efficiency. This paper verifies the effectiveness and performance advantages of the proposed scheme in experiments, and proves the potential and application prospect of deep learning and reinforcement learning in automatic task scheduling in large-scale cloud computing systems.

Submitted 26 February, 2024; originally announced March 2024.

arXiv:2401.12173 [pdf, other]

Waveform-Domain Complementary Signal Sets for Interrupted Sampling Repeater Jamming Suppression

Authors: Hanning Su, Qinglong Bao, Jiameng Pan, Fucheng Guo, Weidong Hu

Abstract: The interrupted-sampling repeater jamming (ISRJ) is coherent and has the characteristic of suppression and deception to degrade the radar detection capabilities. The study focuses on anti-ISRJ techniques in the waveform domain, primarily capitalizing on waveform design and anti-jamming signal processing methods in the waveform domain. By exploring the relationship between waveform-domain adaptive matched filtering (WD-AMF) output and waveform-domain signals, we demonstrate that ISRJ can be effectively suppressed when the transmitted waveform exhibits waveform-domain complementarity. We introduce a phase-coded (PC) waveform set with waveform-domain complementarity and propose a method for generating such waveform sets of arbitrary code lengths. The performance of WD-AMF is further developed due to the designed waveforms, and simulations affirm the superior adaptive anti-jamming capabilities of the designed waveforms compared to traditional ones. Remarkably, this improved performance is achieved without the need for prior knowledge of ISRJ interference parameters at either the transmitter or receiver stages.

Submitted 18 January, 2024; originally announced January 2024.

arXiv:2401.01078 [pdf, other]

Vietnamese Poem Generation & The Prospect Of Cross-Language Poem-To-Poem Translation

Authors: Triet Minh Huynh, Quan Le Bao

Abstract: Poetry generation has been a challenging task in the field of Natural Language Processing, as it requires the model to understand the nuances of language, sentiment, and style. In this paper, we propose using Large Language Models to generate Vietnamese poems of various genres from natural language prompts, thereby facilitating an intuitive process with enhanced content control. Our most efficacious model, the GPT-3 Babbage variant, achieves a custom evaluation score of 0.8, specifically tailored to the "luc bat" genre of Vietnamese poetry. Furthermore, we also explore the idea of paraphrasing poems into normal text prompts and yield a relatively high score of 0.781 in the "luc bat" genre. This experiment presents the potential for cross-language poem-to-poem translation with translated poems as the inputs while concurrently maintaining complete control over the generated content.

Submitted 4 January, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

arXiv:2310.09430 [pdf, ps, other]

Assessing and Enhancing the Robustness of Large Language Models with Task Structure Variations for Logical Reasoning

Authors: Qiming Bao, Gael Gendron, Alex Yuxuan Peng, Wanjun Zhong, Neset Tan, Yang Chen, Michael Witbrock, Jiamou Liu

Abstract: Large language models (LLMs), such as LLaMA, Alpaca, Vicuna, GPT-3.5 and GPT-4, have advanced the performance of AI systems on various natural language processing tasks to human-like levels. However, their generalisation and robustness when performing logical reasoning have not been sufficiently assessed. To comprehensively evaluate this ability, we develop three new logical reasoning datasets named "ReClor-plus", "LogiQA-plus" and "LogiQAv2-plus" that extend standard logical reasoning datasets to evaluate the robustness of the LLM's reasoning. For each, we create three subsets: the first with randomly shuffled options, the second with the correct choices replaced by "none of the other options is correct", and the third with a combination of shuffling and substitution. Experiments on these datasets show that these simple augmentations greatly hinder the models' performance. Despite their high performance on the original publicly available datasets, we find that all models perform poorly on these newly constructed datasets. We also demonstrate that introducing task variations into the training set can markedly improve the model's performance on both the original and our developed datasets. Finally, we show that applying logic-driven data augmentation for fine-tuning and prompting can enhance generalisation in both discriminative and generative models, offering a path to improving their robustness for tasks involving logical reasoning. Source code and data are made publicly available at https://github.com/Strong-AI-Lab/Logical-and-abstract-reasoning.

Submitted 16 January, 2025; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: The short version (v3) was accepted for oral presentation at the first LLM@IJCAI 2023 non-archival symposium, and the full version was accepted by ICONIP 2024
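
Note: the three subset constructions named in the abstract above (shuffled options, correct choice replaced, and both combined) can be sketched as follows. This is an illustration under stated assumptions, not the authors' released code: the item layout (a dict with `question`, `options`, and an integer `answer` index) and all function names are hypothetical, and the sketch assumes the substituted option keeps the position of the original correct choice.

```python
import random

NONE_OPTION = "none of the other options is correct"

def shuffle_options(item, rng):
    """Return a copy of the item with options permuted and the answer index updated."""
    order = list(range(len(item["options"])))
    rng.shuffle(order)
    return {
        "question": item["question"],
        "options": [item["options"][i] for i in order],
        "answer": order.index(item["answer"]),   # new position of the correct option
    }

def replace_correct(item):
    """Return a copy where the correct option text is replaced by NONE_OPTION."""
    options = list(item["options"])
    options[item["answer"]] = NONE_OPTION
    return {"question": item["question"], "options": options, "answer": item["answer"]}

def shuffle_and_replace(item, rng):
    """Combine both perturbations: shuffle first, then substitute the correct option."""
    return replace_correct(shuffle_options(item, rng))

# Toy item, invented for illustration only.
example = {
    "question": "All birds have wings. Tweety is a bird. What follows?",
    "options": ["Tweety has wings", "Tweety can swim", "Tweety is a fish", "Nothing follows"],
    "answer": 0,
}

rng = random.Random(0)
shuffled = shuffle_options(example, rng)
replaced = replace_correct(example)
combined = shuffle_and_replace(example, rng)
```

In each variant the labelled answer still points at the correct choice, so a model that relies on option position or on recognising the original answer text is penalised while the underlying reasoning problem is unchanged.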

arXiv:2309.10444 [pdf, other]

Exploring Iterative Enhancement for Improving Learnersourced Multiple-Choice Question Explanations with Large Language Models

Authors: Qiming Bao, Juho Leinonen, Alex Yuxuan Peng, Wanjun Zhong, Gaël Gendron, Timothy Pistotti, Alice Huang, Paul Denny, Michael Witbrock, Jiamou Liu

Abstract: Large language models exhibit superior capabilities in processing and understanding language, yet their applications in educational contexts remain underexplored. Learnersourcing enhances learning by engaging students in creating their own educational content. When learnersourcing multiple-choice questions, creating explanations for the solution of a question is a crucial step; it helps other students understand the solution and promotes a deeper understanding of related concepts. However, it is often difficult for students to craft effective solution explanations, due to limited subject understanding. To help scaffold the task of automated explanation generation, we present and evaluate a framework called "ILearner-LLM", that iteratively enhances the generated explanations for the given questions with large language models. Comprising an explanation generation model and an explanation evaluation model, the framework generates high-quality student-aligned explanations by iteratively feeding the quality rating score from the evaluation model back into the instruction prompt of the explanation generation model. Experimental results demonstrate the effectiveness of our ILearner-LLM on LLaMA2-13B and GPT-4 to generate higher quality explanations that are closer to those written by students on five PeerWise datasets. Our findings represent a promising path to enrich the learnersourcing experience for students and to enhance the capabilities of large language models for educational applications.

Submitted 16 January, 2025; v1 submitted 19 September, 2023; originally announced September 2023.

Comments: The short version (v4) has been accepted as a non-archival workshop paper at AGI@ICLR 2024, and the full version has been accepted by the main track of AAAI/EAAI 2025

arXiv:2309.06169 [pdf, other]

Elucidating the solution space of extended reverse-time SDE for diffusion models

Authors: Qinpeng Cui, Xinyi Zhang, Qiqi Bao, Qingmin Liao

Abstract: Sampling from Diffusion Models can alternatively be seen as solving differential equations, where there is a challenge in balancing speed and image visual quality. ODE-based samplers offer rapid sampling time but reach a performance limit, whereas SDE-based samplers achieve superior quality, albeit with longer iterations. In this work, we formulate the sampling process as an Extended Reverse-Time SDE (ER SDE), unifying prior explorations into ODEs and SDEs. Theoretically, leveraging the semi-linear structure of ER SDE solutions, we offer exact solutions and approximate solutions for VP SDE and VE SDE, respectively. Based on the approximate solution space of the ER SDE, referred to as one-step prediction errors, we yield mathematical insights elucidating the rapid sampling capability of ODE solvers and the high-quality sampling ability of SDE solvers. Additionally, we unveil that VP SDE solvers stand on par with their VE SDE counterparts. Based on these findings, leveraging the dual advantages of ODE solvers and SDE solvers, we devise efficient high-quality samplers, namely ER-SDE-Solvers. Experimental results demonstrate that ER-SDE-Solvers achieve state-of-the-art performance across all stochastic samplers while maintaining efficiency of deterministic samplers. Specifically, on the ImageNet $128\times128$ dataset, ER-SDE-Solvers obtain 8.33 FID in only 20 function evaluations. Code is available at https://github.com/QinpengCui/ER-SDE-Solver

Submitted 27 February, 2025; v1 submitted 12 September, 2023; originally announced September 2023.

Comments: This paper has been accepted by WACV 2025 (Oral). The official version lacked proper attribution to the co-authors, and this version has been updated accordingly

arXiv:2308.14676 [pdf, other]

Fast generation of Schrödinger cat states in a Kerr-tunable superconducting resonator

Authors: X. L. He, Yong Lu, D. Q. Bao, Hang Xue, W. B. Jiang, Zhen Wang, A. F. Roudsari, Per Delsing, J. S. Tsai, Z. R. Lin

Abstract: Schrödinger cat states, quantum superpositions of macroscopically distinct classical states, are an important resource for quantum communication, quantum metrology and quantum computation. Especially, cat states in a phase space protected against phase-flip errors can be used as a logical qubit. However, cat states, normally generated in three-dimensional cavities, are facing the challenges of scalability and controllability. Here, we present a novel strategy to generate and store cat states in a coplanar superconducting circuit by the fast modulation of Kerr nonlinearity. At the Kerr-free work point, our cat states are passively preserved due to the vanishing Kerr effect. We are able to prepare a 2-component cat state in our chip-based device with a fidelity reaching 89.1% under a 96 ns gate time. Our scheme shows an excellent route to constructing a chip-based bosonic quantum processor.

Submitted 28 August, 2023; originally announced August 2023.

Comments: 15 pages, 12 figures

arXiv:2307.16629 [pdf]

Reliable Synthesis of Large-Area Monolayer WS2 Single Crystals, Films, and Heterostructures with Extraordinary Photoluminescence Induced by Water Intercalation

Authors: Qianhui Zhang, Jianfeng Lu, Ziyu Wang, Zhigao Dai, Yupeng Zhang, Fuzhi Huang, Qiaoliang Bao, Wenhui Duan, Michael S. Fuhrer, Changxi Zheng

Abstract: Two-dimensional (2D) transition metal dichalcogenides (TMDs) hold great potential for future low-energy optoelectronics owing to their unique electronic, optical, and mechanical properties. Chemical vapor deposition (CVD) is the technique widely used for the synthesis of large-area TMDs. However, due to high sensitivity to the growth environment, reliable synthesis of monolayer TMDs via CVD remains challenging. Here we develop a controllable CVD process for large-area synthesis of monolayer WS2 crystals, films, and in-plane graphene-WS2 heterostructures by cleaning the reaction tube with hydrochloric acid, sulfuric acid and aqua regia. The concise cleaning process can remove the residual contaminants attached to the CVD reaction tube and crucibles, reducing the nucleation density but enhancing the diffusion length of WS2 species. The photoluminescence (PL) mappings of a WS2 single crystal and film reveal that the extraordinary PL around the edges of a triangular single crystal is induced by ambient water intercalation at the WS2-sapphire interface. The extraordinary PL can be controlled by the choice of substrates with different wettabilities.

Submitted 31 July, 2023; originally announced July 2023.

Journal ref: Advanced Optical Materials, 6(12), p.1701347 (2018)

arXiv:2307.03368 [pdf, other]

Waveform-Domain Adaptive Matched Filtering for Suppressing Interrupted-Sampling Repeater Jamming

Authors: Hanning Su, Qinglong Bao, Jiameng Pan, Fucheng Guo, Weidong Hu

Abstract: The inadequate adaptability to flexible interference scenarios remains an unresolved challenge in the majority of techniques utilized for mitigating interrupted-sampling repeater jamming (ISRJ). In methods based on matched filtering systems, it is desirable to incorporate anti-ISRJ measures based on prior ISRJ modeling, either preceding or succeeding the matched filtering. Due to the partial matching nature of ISRJ, its characteristics are revealed during the process of matched filtering. Therefore, this paper introduces an extended domain called the waveform domain within the matched filtering process. On this domain, an adaptive matched filtering model, known as the waveform-domain adaptive matched filtering (WD-AMF), is established to tackle the problem of ISRJ suppression without relying on a pre-existing ISRJ model. The output of the WD-AMF encompasses an adaptive filtering term and a compensation term. The adaptive filtering term encompasses the adaptive integration outcomes in the waveform domain, which are determined by an adaptive weighted function. This function, akin to a collection of bandpass filters, decomposes the integrated function into multiple components, some of which contain interference while others do not. The compensation term adheres to an integrated guideline for discerning the presence of signal components or noise within the integrated function. The integration results are then concatenated to reconstruct a compensated matched filter signal output. Simulations are conducted to showcase the exceptional capability of the proposed method in suppressing ISRJ in diverse interference scenarios, even in the absence of a pre-existing ISRJ model.

Submitted 13 November, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

arXiv:2306.08253 [pdf, ps, other]

doi 10.1109/TIFS.2023.3279979

Measures and Optimization for Robustness and Vulnerability in Disconnected Networks

Authors: Liwang Zhu, Qi Bao, Zhongzhi Zhang

Abstract: The function or performance of a network is strongly dependent on its robustness, quantifying the ability of the network to continue functioning under perturbations. While a wide variety of robustness metrics have been proposed, they have their respective limitations. In this paper, we propose to use the forest index as a measure of network robustness, which overcomes the deficiencies of existing metrics. Using such a measure as an optimization criterion, we propose and study the problem of breaking down a network by attacking some key edges. We show that the objective function of the problem is monotonic but not submodular, which makes the problem more challenging. We thus resort to greedy algorithms extended for non-submodular functions by iteratively deleting the most promising edges. We first propose a simple greedy algorithm with a proved bound for the approximation ratio and cubic-time complexity. To confront the computation challenge for large networks, we further propose an improved nearly-linear time greedy algorithm, which significantly speeds up the process for edge selection but sacrifices little accuracy. Extensive experimental results for a large set of real-world networks verify the effectiveness and efficiency of our algorithms, demonstrating that our algorithms outperform several baseline schemes.

Submitted 14 June, 2023; originally announced June 2023.

Comments: 13 pages

Journal ref: IEEE Transactions on Information Forensics and Security, pp. 3350-3362, 2023

arXiv:2306.02850 [pdf, other]

TRACE: 5D Temporal Regression of Avatars with Dynamic Cameras in 3D Environments

Authors: Yu Sun, Qian Bao, Wu Liu, Tao Mei, Michael J. Black

Abstract: Although the estimation of 3D human pose and shape (HPS) is rapidly progressing, current methods still cannot reliably estimate moving humans in global coordinates, which is critical for many applications. This is particularly challenging when the camera is also moving, entangling human and camera motion. To address these issues, we adopt a novel 5D representation (space, time, and identity) that enables end-to-end reasoning about people in scenes. Our method, called TRACE, introduces several novel architectural components. Most importantly, it uses two new "maps" to reason about the 3D trajectory of people over time in camera, and world, coordinates. An additional memory unit enables persistent tracking of people even during long occlusions. TRACE is the first one-stage method to jointly recover and track 3D humans in global coordinates from dynamic cameras. By training it end-to-end, and using full image information, TRACE achieves state-of-the-art performance on tracking and HPS benchmarks. The code and dataset are released for research purposes.

Submitted 20 November, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: Project page: https://www.yusun.work/TRACE/TRACE.html

arXiv:2305.19555 [pdf, ps, other]

Large Language Models Are Not Strong Abstract Reasoners

Authors: Gaël Gendron, Qiming Bao, Michael Witbrock, Gillian Dobbie

Abstract: Large Language Models have shown tremendous performance on a large variety of natural language processing tasks, ranging from text comprehension to common sense reasoning. However, the mechanisms responsible for this success remain opaque, and it is unclear whether LLMs can achieve human-like cognitive capabilities or whether these models are still fundamentally circumscribed. Abstract reasoning is a fundamental task for cognition, consisting of finding and applying a general pattern from few data. Evaluating deep neural architectures on this task could give insight into their potential limitations regarding reasoning and their broad generalisation abilities, yet this is currently an under-explored area. In this paper, we introduce a new benchmark for evaluating language models beyond memorization on abstract reasoning tasks. We perform extensive evaluations of state-of-the-art LLMs, showing that they currently achieve very limited performance in contrast with other natural language tasks, even when applying techniques that have been shown to improve performance on other NLP tasks. We argue that guiding LLM generation to follow causal paths could help improve the generalisation and reasoning abilities of LLMs.

Submitted 2 January, 2024; v1 submitted 31 May, 2023; originally announced May 2023.

Comments: 50 pages, 14 pages for the main paper and 36 pages for the supplement, 35 figures, 17 tables. V3: performed additional experiments

ACM Class: I.2.2; I.2.3; I.2.7; I.5.1

arXiv:2305.12599 [pdf, other]

Abstract Meaning Representation-Based Logic-Driven Data Augmentation for Logical Reasoning

Authors: Qiming Bao, Alex Yuxuan Peng, Zhenyun Deng, Wanjun Zhong, Gael Gendron, Timothy Pistotti, Neset Tan, Nathan Young, Yang Chen, Yonghua Zhu, Paul Denny, Michael Witbrock, Jiamou Liu

Abstract: Combining large language models with logical reasoning enhances their capacity to address problems in a robust and reliable manner. Nevertheless, the intricate nature of logical reasoning poses challenges when gathering reliable data from the web to build comprehensive training datasets, subsequently affecting performance on downstream tasks. To address this, we introduce a novel logic-driven data augmentation approach, AMR-LDA. AMR-LDA converts the original text into an Abstract Meaning Representation (AMR) graph, a structured semantic representation that encapsulates the logical structure of the sentence, upon which operations are performed to generate logically modified AMR graphs. The modified AMR graphs are subsequently converted back into text to create augmented data. Notably, our methodology is architecture-agnostic and enhances both generative large language models, such as GPT-3.5 and GPT-4, through prompt augmentation, and discriminative large language models through contrastive learning with logic-driven data augmentation. Empirical evidence underscores the efficacy of our proposed method with improvement in performance across seven downstream tasks, such as reading comprehension requiring logical reasoning, textual entailment, and natural language inference. Furthermore, our method leads on the ReClor leaderboard at https://eval.ai/web/challenges/challenge-page/503/leaderboard/1347. The source code and data are publicly available at https://github.com/Strong-AI-Lab/Logical-Equivalence-driven-AMR-Data-Augmentation-for-Representation-Learning.

Submitted 17 April, 2025; v1 submitted 21 May, 2023; originally announced May 2023.

Comments: 21 pages, 8 figures, the Findings of ACL 2024

arXiv:2303.07585 [pdf, other]

Input-length-shortening and text generation via attention values

Authors: Neşet Özkan Tan, Alex Yuxuan Peng, Joshua Bensemann, Qiming Bao, Tim Hartill, Mark Gahegan, Michael Witbrock

Abstract: Identifying words that impact a task's performance more than others is a challenge in natural language processing. Transformer models have recently addressed this issue by incorporating an attention mechanism that assigns greater attention (i.e., relevance) scores to some words than others. Because of the attention mechanism's high computational cost, transformer models usually have an input-length limitation caused by hardware constraints. This limitation applies to many transformers, including the well-known bidirectional encoder representations of the transformer (BERT) model. In this paper, we examined BERT's attention assignment mechanism, focusing on two questions: (1) How can attention be employed to reduce input length? (2) How can attention be used as a control mechanism for conditional text generation? We investigated these questions in the context of a text classification task. We discovered that BERT's early layers assign more critical attention scores for text classification tasks compared to later layers. We demonstrated that the first layer's attention sums could be used to filter tokens in a given sequence, considerably decreasing the input length while maintaining good test accuracy. We also applied filtering, which uses a compute-efficient semantic similarities algorithm, and discovered that retaining approximately 6% of the original sequence is sufficient to obtain 86.5% accuracy. Finally, we showed that we could generate data in a stable manner and indistinguishable from the original one by only using a small percentage (10%) of the tokens with high attention scores according to BERT's first layer.

Submitted 13 March, 2023; originally announced March 2023.

Comments: 7 pages, 4 figures. AAAI23-EMC2

arXiv:2210.15618 [pdf, ps, other]

Two $q$-operational equations and Hahn polynomials

Authors: Jing Gu, DunKun Yang, Qi Bao

Abstract: Motivated by Liu's recent work in \cite{Liu2022}, we shall reveal the essential feature of Hahn polynomials by presenting two new $q$-exponential operators. These lead us to use a systematic method to study identities involving Hahn polynomials. As applications, we use the method of $q$-exponential operator to prove the bilinear generating function of Hahn polynomials and Heine's second transformation formula. Moreover, a generalization of $q$-Gaussian summation is given, too.

Submitted 5 November, 2022; v1 submitted 25 October, 2022; originally announced October 2022.

MSC Class: 05A30; 33D90

arXiv:2209.02431 [pdf, other]

DPIT: Dual-Pipeline Integrated Transformer for Human Pose Estimation

Authors: Shuaitao Zhao, Kun Liu, Yuhang Huang, Qian Bao, Dan Zeng, Wu Liu

Abstract: Human pose estimation aims to figure out the keypoints of all people in different scenes. Current approaches still face some challenges despite promising results. Existing top-down methods deal with a single person individually, without the interaction between different people and the scene they are situated in. Consequently, the performance of human detection degrades when serious occlusion happens. On the other hand, existing bottom-up methods consider all people at the same time and capture the global knowledge of the entire image. However, they are less accurate than the top-down methods due to the scale variation. To address these problems, we propose a novel Dual-Pipeline Integrated Transformer (DPIT) by integrating top-down and bottom-up pipelines to explore the visual clues of different receptive fields and achieve their complementarity. Specifically, DPIT consists of two branches: the bottom-up branch deals with the whole image to capture the global visual information, while the top-down branch extracts the feature representation of local vision from the single-human bounding box. Then, the extracted feature representations from bottom-up and top-down branches are fed into the transformer encoder to fuse the global and local knowledge interactively. Moreover, we define the keypoint queries to explore both full-scene and single-human posture visual clues to realize the mutual complementarity of the two pipelines. To the best of our knowledge, this is one of the first works to integrate the bottom-up and top-down pipelines with transformers for human pose estimation. Extensive experiments on COCO and MPII datasets demonstrate that our DPIT achieves comparable performance to the state-of-the-art methods.

Submitted 2 September, 2022; originally announced September 2022.

arXiv:2209.01059 [pdf, other]

In-Place Gestures Classification via Long-term Memory Augmented Network

Authors: Lizhi Zhao, Xuequan Lu, Qianyue Bao, Meili Wang

Abstract: In-place gesture-based virtual locomotion techniques enable users to control their viewpoint and intuitively move in the 3D virtual environment. A key research problem is to accurately and quickly recognize in-place gestures, since they can trigger specific movements of virtual viewpoints and enhance user experience. However, to achieve real-time experience, only short-term sensor sequence data (up to about 300ms, 6 to 10 frames) can be taken as input, which actually affects the classification performance due to limited spatio-temporal information. In this paper, we propose a novel long-term memory augmented network for in-place gestures classification. It takes as input both short-term gesture sequence samples and their corresponding long-term sequence samples that provide extra relevant spatio-temporal information in the training phase. We store long-term sequence features with an external memory queue. In addition, we design a memory augmented loss to help cluster features of the same class and push apart features from different classes, thus enabling our memory queue to memorize more relevant long-term sequence features. In the inference phase, we input only short-term sequence samples to recall the stored features accordingly, and fuse them together to predict the gesture class. We create a large-scale in-place gestures dataset from 25 participants with 11 gestures. Our method achieves a promising accuracy of 95.1% with a latency of 192ms, and an accuracy of 97.3% with a latency of 312ms, and is demonstrated to be superior to recent in-place gesture classification techniques. User study also validates our approach. Our source code and dataset will be made available to the community.

Submitted 2 September, 2022; originally announced September 2022.

Comments: This paper is accepted to IEEE ISMAR2022

arXiv:2209.00776 [pdf, other]

doi 10.1145/3503161.3547743

WOC: A Handy Webcam-based 3D Online Chatroom

Authors: Chuanhang Yan, Yu Sun, Qian Bao, Jinhui Pang, Wu Liu, Tao Mei

Abstract: We develop WOC, a webcam-based 3D virtual online chatroom for multi-person interaction, which captures the 3D motion of users and drives their individual 3D virtual avatars in real-time. Compared to the existing wearable equipment-based solution, WOC offers convenient and low-cost 3D motion capture with a single camera. To promote the immersive chat experience, WOC provides high-fidelity virtual avatar manipulation, which also supports the user-defined characters. With the distributed data flow service, the system delivers highly synchronized motion and voice for all users. The system is deployed on the web and requires no installation; users can freely experience the virtual online chat at https://yanch.cloud.

Submitted 17 March, 2023; v1 submitted 1 September, 2022; originally announced September 2022.

arXiv:2208.03609 [pdf, other]

Continual Learning for Tumor Classification in Histopathology Images

Authors: Veena Kaustaban, Qinle Ba, Ipshita Bhattacharya, Nahil Sobh, Satarupa Mukherjee, Jim Martin, Mohammad Saleh Miri, Christoph Guetter, Amal Chaturvedi

Abstract: Recent years have seen great advancements in the development of deep learning models for histopathology image analysis in digital pathology (DP) applications, evidenced by the increasingly common deployment of these models in both research and clinical settings. Although such models have shown unprecedented performance in solving fundamental computational tasks in DP applications, they suffer from catastrophic forgetting when adapted to unseen data with transfer learning. With an increasing need for deep learning models to handle ever changing data distributions, including evolving patient population and new diagnosis assays, continual learning (CL) models that alleviate model forgetting need to be introduced in DP based analysis. However, to the best of our knowledge, there is no systematic study of such models for DP-specific applications. Here, we propose CL scenarios in DP settings, where histopathology image data from different sources/distributions arrive sequentially, the knowledge of which is integrated into a single model without training all the data from scratch. We then established an augmented dataset for colorectal cancer H&E classification to simulate shifts of image appearance and evaluated CL model performance in the proposed CL scenarios. We leveraged a breast tumor H&E dataset along with the colorectal cancer to evaluate CL from different tumor types. In addition, we evaluated CL methods in an online few-shot setting under the constraints of annotation and computational resources. We revealed promising results of CL in DP applications, potentially paving the way for application of these methods in clinical practice.

Submitted 6 August, 2022; originally announced August 2022.

Comments: Accepted by MOVI, a MICCAI2022 workshop: https://sites.google.com/view/movi2022

arXiv:2207.14000 [pdf, other]

Multi-Step Deductive Reasoning Over Natural Language: An Empirical Study on Out-of-Distribution Generalisation

Authors: Qiming Bao, Alex Yuxuan Peng, Tim Hartill, Neset Tan, Zhenyun Deng, Michael Witbrock, Jiamou Liu

Abstract: Combining deep learning with symbolic logic reasoning aims to capitalize on the success of both fields and is drawing increasing attention. Inspired by DeepLogic, an end-to-end model trained to perform inference on logic programs, we introduce IMA-GloVe-GA, an iterative neural inference network for multi-step reasoning expressed in natural language. In our model, reasoning is performed using an iterative memory neural network based on RNN with a gated attention mechanism. We evaluate IMA-GloVe-GA on three datasets: PARARULES, CONCEPTRULES V1 and CONCEPTRULES V2. Experimental results show DeepLogic with gated attention can achieve higher test accuracy than DeepLogic and other RNN baseline models. Our model achieves better out-of-distribution generalisation than RoBERTa-Large when the rules have been shuffled. Furthermore, to address the issue of unbalanced distribution of reasoning depths in the current multi-step reasoning datasets, we develop PARARULE-Plus, a large dataset with more examples that require deeper reasoning steps. Experimental results show that the addition of PARARULE-Plus can increase the model's performance on examples requiring deeper reasoning depths. The source code and data are available at https://github.com/Strong-AI-Lab/Multi-Step-Deductive-Reasoning-Over-Natural-Language.

Submitted 17 April, 2025; v1 submitted 28 July, 2022; originally announced July 2022.

Comments: 10 pages, 3 figures, The 2nd International Joint Conference on Learning & Reasoning and 16th International Workshop on Neural-Symbolic Learning and Reasoning (IJCLR-NeSy 2022)

arXiv:2207.01442 [pdf, ps, other]

Notes on $q$-partial differential equations for $q$-Laguerre polynomials and little $q$-Jacobi polynomials

Authors: Qi Bao, DunKun Yang

Abstract: We define two common $q$-orthogonal polynomials: homogeneous $q$-Laguerre polynomials and homogeneous little $q$-Jacobi polynomials. They can be viewed separately as solutions to two $q$-partial differential equations. We then prove that an analytic function satisfies a certain system of $q$-partial differential equations if and only if it can be expanded in terms of homogeneous $q$-Laguerre polynomials or homogeneous little $q$-Jacobi polynomials. As applications, we obtain generalizations of the Ramanujan $q$-beta integrals and Andrews-Askey integrals. Additionally, we present an operator representation of $q$-Laguerre polynomials that facilitates the computation of identities involving $q$-Laguerre polynomials.

Submitted 6 May, 2023; v1 submitted 4 July, 2022; originally announced July 2022.

MSC Class: 05A30; 11B65; 32A05; 33D15; 33D45; 39A13

arXiv:2206.13039 [pdf]

Anisotropic polaritons in 2D vdW materials

Authors: Babar Shabbir, Weiliang Ma, Qiaoliang Bao

Abstract: Perhaps the most significant progress in the field of infrared optics and nanophotonics has been made through the real space realisation of polaritons in two-dimensional materials that provide maximum light confinement functionalities. The recent breakthrough discovery of in-plane hyperbolicity in the natural van der Waals material has revealed a most exciting optical property which enables an in-plane anisotropic dispersion. Yet, the most intriguing feature of in-plane anisotropic dispersion is the manipulation of polaritons at the nano scale. This development has opened a new window of opportunity to develop unique nanophotonic devices with unprecedented controls. This chapter will cover these developments with focus on fundamental understandings and progress of real space visualisation of in-plane anisotropic polaritons in the near-field range. The last section will conclude with the future prospects of this rapidly emerging area.

Submitted 27 June, 2022; originally announced June 2022.

arXiv:2205.12945 [pdf, other]

doi 10.1186/s43593-022-00026-y

Conformal optical black hole for cavity

Authors: Qingtao Ba, Yangyang Zhou, Jue Li, Wen Xiao, Longfang Ye, Yineng Liu, Jin-hui Chen, Huanyang Chen

Abstract: The whispering gallery mode (WGM) cavity is important for exploring the physics of strong light-matter interaction. Yet it suffers from the notorious radiation loss universally due to the light tunneling effect through the curved boundary. In this work, we propose and demonstrate an optical black hole (OBH) cavity based on transformation optics. The radiation loss of all WGMs in the OBH cavity is completely inhibited by an infinitely wide potential barrier. Besides, the WGM field outside the cavity is revealed to follow a $1/r^\alpha$ decay rule based on conformal mapping, which is fundamentally different from the conventional Hankel-function distributions in a homogeneous cavity. Experimentally, a truncated OBH cavity is achieved based on the effective medium theory, and both the Q-factor enhancement and tightly confined WGM field are measured in the microwave spectra which agree well with the theoretical results. The circular OBH cavity is further applied to the arbitrary-shaped cavities including single-core and multi-core structures with high-Q factor via the conformal mapping. The OBH cavity design strategy can be generalized to resonant modes of various wave systems, such as acoustic and elastic waves, and finds applications in energy harvesting and optoelectronics.

Submitted 22 May, 2022; originally announced May 2022.

Journal ref: eLight (2022)

arXiv:2204.11625

A Generalization of q-Binomial Theorem

Authors: Qi Bao

Abstract: By using Liu's $q$-partial differential equations theory, we prove that an analytic function in several variables satisfies a system of $q$-partial differential equations if and only if it can be expanded in terms of homogeneous $(q,c)$-Al-Salam-Carlitz polynomials. As an application, we prove that for $c\neq0$ and $\max \{|cq|,|x|\}<1$, \begin{align*} \sum_{n=0}^{\infty} \frac{ (a;q)_n }{(cq;q)_n}x^n=(ax/c;q)_{\infty} \sum_{n=0}^{\infty} \frac{x^n}{(cq;q)_n}, \end{align*} which is a generalization of the famous $q$-binomial theorem, also known as the Cauchy theorem.

Submitted 30 April, 2022; v1 submitted 25 April, 2022; originally announced April 2022.

Comments: The error in this manuscript is that the right side of equation (3.1) is not suitable for formula (1.1). This causes formula (3.1) to be incorrect. Therefore, theorem 1.2 is also incorrect. However, the second part of this manuscript about the theorem of q-partial differential equation theory is still the correct conclusion

MSC Class: 05A30; 32A05

arXiv:2203.12186 [pdf, other]

AbductionRules: Training Transformers to Explain Unexpected Inputs

Authors: Nathan Young, Qiming Bao, Joshua Bensemann, Michael Witbrock

Abstract: Transformers have recently been shown to be capable of reliably performing logical reasoning over facts and rules expressed in natural language, but abductive reasoning - inference to the best explanation of an unexpected observation - has been underexplored despite significant applications to scientific discovery, common-sense reasoning, and model interpretability. We present AbductionRules, a group of natural language datasets designed to train and test generalisable abduction over natural-language knowledge bases. We use these datasets to finetune pretrained Transformers and discuss their performance, finding that our models learned generalisable abductive techniques but also learned to exploit the structure of our data. Finally, we discuss the viability of this approach to abductive reasoning and ways in which it may be improved in future work.

Submitted 23 March, 2022; originally announced March 2022.

Comments: Findings of ACL 2022

arXiv:2202.13877 [pdf]

Negative reflection of polaritons at the nanoscale in a low-loss natural medium

Authors: Gonzalo Alvarez-Perez, Jiahua Duan, Javier Taboada-Gutierrez, Qingdong Ou, Elizaveta Nikulina, Song Liu, James H. Edgar, Qiaoliang Bao, Vincenzo Giannini, Rainer Hillenbrand, J. Martin-Sanchez, Alexey Y. Nikitin, Pablo Alonso-Gonzalez

Abstract: Negative reflection occurs when light is reflected towards the same side of the normal to the boundary from which it is incident. This exotic optical phenomenon, which provides a new avenue towards light manipulation, is not only yet to be visualized in real space but remains largely unexplored both at the nanoscale and in natural media. Here, we directly visualize nanoscale-confined polaritons negatively reflecting on subwavelength mirrors fabricated in a low-loss van der Waals crystal. Our near-field nanoimaging results unveil an unconventional and broad tunability of both the polaritonic wavelength and direction of propagation upon negative reflection. Based on these findings, we introduce a novel device in nano-optics: a hyperbolic nanoresonator, in which hyperbolic polaritons with different momenta reflect back to a common point source, enhancing its intensity. These results pave the way to realize nanophotonics in low-loss natural media, providing a novel and efficient route to confine and control the flow of light at the nanoscale, key for future optical on-chip nanotechnologies.

Submitted 28 February, 2022; originally announced February 2022.

arXiv:2202.09758 [pdf, ps, other]

Notes on Generalized Grötzsch Ring Function and Generalized Hersch-Pfluger Distortion Function

Authors: Qi Bao, MiaoKun Wang

Abstract: For $a\in(0,1)$, $r\in(0,1)$ and $K\in(1,\infty)$, let $\mu_{a}(r)$ and $\varphi_{K}^{a}(r)$ be the generalized Grötzsch ring function and the generalized Hersch-Pfluger distortion function, respectively. In the past few years, the functions $\mu_{a}(r)$ and $\varphi_{K}^{a}(r)$, and their special cases $\mu_{1/2}(r)$ and $\varphi_{K}^{1/2}(r)$, have played a very important role in the theory of quasiconformal mappings and (generalized) Ramanujan's modular equations. In this paper, we present a series expansion of $\mu_{a}(r)$, and thus prove that the function $r\mapsto -[\mu_{a}(r)-\log{(e^{R(a)/2})/r}]$ is absolutely monotonic on $(0,1)$. Here $R(a)$ is the Ramanujan constant. In addition, we investigate the submultiplicative and power submultiplicative properties of $\varphi_{K}^{a}(r)$, and establish some new inequalities for $\varphi_{K}^{a}(r)$ in terms of elementary functions.

Submitted 4 March, 2025; v1 submitted 20 February, 2022; originally announced February 2022.

MSC Class: 11F03; 33C05

Showing 1–50 of 116 results for author: Ba, Q