Search | arXiv e-print repository

A Deep Learning Approach to Anomaly Detection in High-Frequency Trading Data

Authors: Qiuliuyang Bao, Jiawei Wang, Hao Gong, Yiwei Zhang, Xiaojun Guo, Hanrui Feng

Abstract: This paper proposes an algorithm based on a staged sliding window Transformer architecture to detect abnormal behaviors in the microstructure of the foreign exchange market, focusing on high-frequency EUR/USD trading data. The method captures multi-scale temporal features through a staged sliding window, extracts global and local dependencies by combining the self-attention mechanism and weighted… ▽ More This paper proposes an algorithm based on a staged sliding window Transformer architecture to detect abnormal behaviors in the microstructure of the foreign exchange market, focusing on high-frequency EUR/USD trading data. The method captures multi-scale temporal features through a staged sliding window, extracts global and local dependencies by combining the self-attention mechanism and weighted attention mechanism of the Transformer, and uses a classifier to identify abnormal events. Experimental results on a real high-frequency dataset containing order book depth, spread, and trading volume show that the proposed method significantly outperforms traditional machine learning (such as decision trees and random forests) and deep learning methods (such as MLP, CNN, RNN, LSTM) in terms of accuracy (0.93), F1-Score (0.91), and AUC-ROC (0.95). Ablation experiments verify the contribution of each component, and the visualization of order book depth and anomaly detection further reveals the effectiveness of the model under complex market dynamics. Despite the false positive problem, the model still provides important support for market supervision. In the future, noise processing can be optimized and extended to other markets to improve generalization and real-time performance. △ Less

Submitted 31 March, 2025; originally announced April 2025.

arXiv:2503.06396 [pdf, other]

Optimizing Minimum Vertex Cover Solving via a GCN-assisted Heuristic Algorithm

Authors: Enqiang Zhu, Qiqi Bao, Yu Zhang, Chanjuan Liu

Abstract: The problem of finding a minimum vertex cover (MVC) in a graph is a well-known NP-hard problem with significant practical applications in optimization and scheduling. Its complexity, combined with the increasing scale of problems, underscores the need for efficient and effective algorithms. However, existing heuristic algorithms for MVC often rely on simplistic initialization strategies and overlo… ▽ More The problem of finding a minimum vertex cover (MVC) in a graph is a well-known NP-hard problem with significant practical applications in optimization and scheduling. Its complexity, combined with the increasing scale of problems, underscores the need for efficient and effective algorithms. However, existing heuristic algorithms for MVC often rely on simplistic initialization strategies and overlook the impact of edge attributes and neighborhood information on vertex selection. In this paper, we introduce GCNIVC, a novel heuristic search algorithm designed to address the limitations of existing methods for solving MVC problems in large-scale graphs. Our approach features two main innovations. First, it utilizes a Graph Convolutional Network (GCN) to capture the global structure of graphs, which enables the generation of high-quality initial solutions that enhance the efficiency of the subsequent search process. Second, GCNIVC introduces a new heuristic that employs three containers and the concept of double-covered edges (dc-edges), improving search efficiency and providing greater flexibility for adding and removing operations based on edge attributes. Through extensive experiments on benchmark datasets, we demonstrate that GCNIVC outperforms state-of-the-art MVC algorithms in terms of both accuracy and efficiency. Our results highlight the effectiveness of GCNIVC's GCN-assisted initialization and its edge-informed search strategy. This study not only advances the understanding of MVC problem-solving but also contributes a new tool for addressing large-scale graph optimization challenges. △ Less

Submitted 8 March, 2025; originally announced March 2025.

arXiv:2503.03755 [pdf]

Research on evolution and early warning model of network public opinion based on online Latent Dirichlet distribution model and BP neural network

Authors: Qiaozhi Bao, Yanlin Chen, Xusheng Ji

Abstract: Online public opinion is increasingly becoming a significant factor affecting the stability of the internet and society, particularly as the frequency of online public opinion crises has risen in recent years. Enhancing the capability for early warning of online public opinion crises is urgent. The most effective approach is to identify potential crises in their early stages and implement correspo… ▽ More Online public opinion is increasingly becoming a significant factor affecting the stability of the internet and society, particularly as the frequency of online public opinion crises has risen in recent years. Enhancing the capability for early warning of online public opinion crises is urgent. The most effective approach is to identify potential crises in their early stages and implement corresponding management measures. This study establishes a preliminary indicator system for online public opinion early warning, based on the principles of indicator system construction and the characteristics and evolution patterns of online public opinion. Subsequently, data-driven methodologies were employed to collect and preprocess public opinion indicator data. Utilizing grey relational analysis and the K-Means clustering algorithm, we classified online public opinion events into three levels: slight, warning, and severe. Furthermore, we constructed an online topic evolution model using the online Hierarchical Dirichlet Process model to analyze the thematic changes of online public opinion events across different warning levels. Finally, we developed an online public opinion early warning model using a Backpropagation (BP) neural network. The test results of early warning samples show that the model achieves high accuracy. Thus, in practical early warning applications, the BP neural network can be effectively utilized for predicting online public opinion events. △ Less

Submitted 16 February, 2025; originally announced March 2025.

arXiv:2503.03702 [pdf, other]

Developing and Utilizing a Large-Scale Cantonese Dataset for Multi-Tasking in Large Language Models

Authors: Jiyue Jiang, Alfred Kar Yin Truong, Yanyu Chen, Qinghang Bao, Sheng Wang, Pengan Chen, Jiuming Wang, Lingpeng Kong, Yu Li, Chuan Wu

Abstract: High-quality data resources play a crucial role in learning large language models (LLMs), particularly for low-resource languages like Cantonese. Despite having more than 85 million native speakers, Cantonese is still considered a low-resource language in the field of natural language processing (NLP) due to factors such as the dominance of Mandarin, lack of cohesion within the Cantonese-speaking… ▽ More High-quality data resources play a crucial role in learning large language models (LLMs), particularly for low-resource languages like Cantonese. Despite having more than 85 million native speakers, Cantonese is still considered a low-resource language in the field of natural language processing (NLP) due to factors such as the dominance of Mandarin, lack of cohesion within the Cantonese-speaking community, diversity in character encoding and input methods, and the tendency of overseas Cantonese speakers to prefer using English. In addition, rich colloquial vocabulary of Cantonese, English loanwords, and code-switching characteristics add to the complexity of corpus collection and processing. To address these challenges, we collect Cantonese texts from a variety of sources, including open source corpora, Hong Kong-specific forums, Wikipedia, and Common Crawl data. We conduct rigorous data processing through language filtering, quality filtering, content filtering, and de-duplication steps, successfully constructing a high-quality Cantonese corpus of over 2 billion tokens for training large language models. We further refined the model through supervised fine-tuning (SFT) on curated Cantonese tasks, enhancing its ability to handle specific applications. Upon completion of the training, the model achieves state-of-the-art (SOTA) performance on four Cantonese benchmarks. After training on our dataset, the model also exhibits improved performance on other mainstream language tasks. △ Less

Submitted 5 March, 2025; originally announced March 2025.

arXiv:2501.07849 [pdf, ps, other]

The Invisible Hand: Unveiling Provider Bias in Large Language Models for Code Generation

Authors: Xiaoyu Zhang, Juan Zhai, Shiqing Ma, Qingshuang Bao, Weipeng Jiang, Qian Wang, Chao Shen, Yang Liu

Abstract: Large Language Models (LLMs) have emerged as the new recommendation engines, surpassing traditional methods in both capability and scope, particularly in code generation. In this paper, we reveal a novel provider bias in LLMs: without explicit directives, these models show systematic preferences for services from specific providers in their recommendations (e.g., favoring Google Cloud over Microso… ▽ More Large Language Models (LLMs) have emerged as the new recommendation engines, surpassing traditional methods in both capability and scope, particularly in code generation. In this paper, we reveal a novel provider bias in LLMs: without explicit directives, these models show systematic preferences for services from specific providers in their recommendations (e.g., favoring Google Cloud over Microsoft Azure). To systematically investigate this bias, we develop an automated pipeline to construct the dataset, incorporating 6 distinct coding task categories and 30 real-world application scenarios. Leveraging this dataset, we conduct the first comprehensive empirical study of provider bias in LLM code generation across seven state-of-the-art LLMs, utilizing approximately 500 million tokens (equivalent to $5,000+ in computational costs). Our findings reveal that LLMs exhibit significant provider preferences, predominantly favoring services from Google and Amazon, and can autonomously modify input code to incorporate their preferred providers without users' requests. Such a bias holds far-reaching implications for market dynamics and societal equilibrium, potentially contributing to digital monopolies. It may also deceive users and violate their expectations, leading to various consequences. We call on the academic community to recognize this emerging issue and develop effective evaluation and mitigation methods to uphold AI security and fairness. △ Less

Submitted 3 June, 2025; v1 submitted 14 January, 2025; originally announced January 2025.

Comments: 27 pages, 13 figures

arXiv:2410.10874 [pdf]

Optimizing Transformer based on high-performance optimizer for predicting employment sentiment in American social media content

Authors: Feiyang Wang, Qiaozhi Bao, Zixuan Wang, Yanlin Chen

Abstract: This article improves the Transformer model based on swarm intelligence optimization algorithm, aiming to predict the emotions of employment related text content on American social media. Through text preprocessing, feature extraction, and vectorization, the text data was successfully converted into numerical data and imported into the model for training. The experimental results show that during… ▽ More This article improves the Transformer model based on swarm intelligence optimization algorithm, aiming to predict the emotions of employment related text content on American social media. Through text preprocessing, feature extraction, and vectorization, the text data was successfully converted into numerical data and imported into the model for training. The experimental results show that during the training process, the accuracy of the model gradually increased from 49.27% to 82.83%, while the loss value decreased from 0.67 to 0.35, indicating a significant improvement in the performance of the model on the training set. According to the confusion matrix analysis of the training set, the accuracy of the training set is 86.15%. The confusion matrix of the test set also showed good performance, with an accuracy of 82.91%. The accuracy difference between the training set and the test set is only 3.24%, indicating that the model has strong generalization ability. In addition, the evaluation of polygon results shows that the model performs well in classification accuracy, sensitivity, specificity, and area under the curve (AUC), with a Kappa coefficient of 0.66 and an F-measure of 0.80, further verifying the effectiveness of the model in social media sentiment analysis. The improved model proposed in this article not only improves the accuracy of sentiment recognition in employment related texts on social media, but also has important practical significance. This social media based data analysis method can not only capture social dynamics in a timely manner, but also promote decision-makers to pay attention to public concerns and provide data support for improving employment conditions. △ Less

Submitted 8 October, 2024; originally announced October 2024.

Comments: 5 pages, 5 figures

arXiv:2410.01459 [pdf]

A Smart Chair for Health Monitoring in Daily Life

Authors: Nguyen Thi Minh Huong, Vo Quoc Bao, Nguyen Trung Hau, Huynh Quang Linh

Abstract: Recent research has focused on the risks associated with poor sitting posture and the impact of sitting on biological parameters, such as heart rate because prolonged sitting is common across all ages and professions. In this work, we propose a novel approach that can display simultaneously posture and heart rate in real-time. In this device, pressure sensors are embedded into a flexible separate… ▽ More Recent research has focused on the risks associated with poor sitting posture and the impact of sitting on biological parameters, such as heart rate because prolonged sitting is common across all ages and professions. In this work, we propose a novel approach that can display simultaneously posture and heart rate in real-time. In this device, pressure sensors are embedded into a flexible separate cushion easily put on any chair to provide sitting behaviours and a smartwatch-like PPG module is worn on the user's wrist. Regarding posture classification, pressure figures of ten pressure sensors under the seat bottom are inputs of four machine learning models, giving a high accuracy of 99 per cent. Besides, the Electrocardiography recording module is illustrated with the same results as a commercial device called DFRobot. Another advantage of this smart chair is that it not only simultaneously displays both sitting postures and heart rates on external devices like laptops, mobile phones, or televisions through microcontrollers but also offers the relationship between them to help people adjust their sitting behaviours, avoiding influencing heart rate. The smart chair is expected to be useful equipment for people with a sedentary lifestyle, especially office workers. △ Less

Submitted 2 October, 2024; originally announced October 2024.

arXiv:2409.17778 [pdf, other]

Taming Diffusion Prior for Image Super-Resolution with Domain Shift SDEs

Authors: Qinpeng Cui, Yixuan Liu, Xinyi Zhang, Qiqi Bao, Qingmin Liao, Li Wang, Tian Lu, Zicheng Liu, Zhongdao Wang, Emad Barsoum

Abstract: Diffusion-based image super-resolution (SR) models have attracted substantial interest due to their powerful image restoration capabilities. However, prevailing diffusion models often struggle to strike an optimal balance between efficiency and performance. Typically, they either neglect to exploit the potential of existing extensive pretrained models, limiting their generative capacity, or they n… ▽ More Diffusion-based image super-resolution (SR) models have attracted substantial interest due to their powerful image restoration capabilities. However, prevailing diffusion models often struggle to strike an optimal balance between efficiency and performance. Typically, they either neglect to exploit the potential of existing extensive pretrained models, limiting their generative capacity, or they necessitate a dozens of forward passes starting from random noises, compromising inference efficiency. In this paper, we present DoSSR, a Domain Shift diffusion-based SR model that capitalizes on the generative powers of pretrained diffusion models while significantly enhancing efficiency by initiating the diffusion process with low-resolution (LR) images. At the core of our approach is a domain shift equation that integrates seamlessly with existing diffusion models. This integration not only improves the use of diffusion prior but also boosts inference efficiency. Moreover, we advance our method by transitioning the discrete shift process to a continuous formulation, termed as DoS-SDEs. This advancement leads to the fast and customized solvers that further enhance sampling efficiency. Empirical results demonstrate that our proposed method achieves state-of-the-art performance on synthetic and real-world datasets, while notably requiring only 5 sampling steps. Compared to previous diffusion prior based methods, our approach achieves a remarkable speedup of 5-7 times, demonstrating its superior efficiency. Code: https://github.com/QinpengCui/DoSSR. △ Less

Submitted 10 December, 2024; v1 submitted 26 September, 2024; originally announced September 2024.

Comments: This paper is accepted by NeurIPS 2024

arXiv:2409.08588 [pdf]

Improved Unet model for brain tumor image segmentation based on ASPP-coordinate attention mechanism

Authors: Zixuan Wang, Yanlin Chen, Feiyang Wang, Qiaozhi Bao

Abstract: In this paper, we propose an improved Unet model for brain tumor image segmentation, which combines coordinate attention mechanism and ASPP module to improve the segmentation effect. After the data set is divided, we do the necessary preprocessing to the image and use the improved model to experiment. First, we trained and validated the traditional Unet model. By analyzing the loss curve of the tr… ▽ More In this paper, we propose an improved Unet model for brain tumor image segmentation, which combines coordinate attention mechanism and ASPP module to improve the segmentation effect. After the data set is divided, we do the necessary preprocessing to the image and use the improved model to experiment. First, we trained and validated the traditional Unet model. By analyzing the loss curve of the training set and the validation set, we can see that the loss value continues to decline at the first epoch and becomes stable at the eighth epoch. This process shows that the model constantly optimizes its parameters to improve performance. At the same time, the change in the miou (mean Intersection over Union) index shows that the miou value exceeded 0.6 at the 15th epoch, remained above 0.6 thereafter, and reached above 0.7 at the 46th epoch. These results indicate that the basic Unet model is effective in brain tumor image segmentation. Next, we introduce an improved Unet algorithm based on coordinate attention mechanism and ASPP module for experiments. By observing the loss change curves of the training set and the verification set, it is found that the loss value reaches the lowest point at the sixth epoch and then remains relatively stable. At the same time, the miou indicator has stabilized above 0.7 since the 20th epoch and has reached a maximum of 0.76. These results show that the new mechanism introduced significantly improves the segmentation ability of the model. Finally, we apply the trained traditional Unet model and the improved Unet model based on the coordinate attention mechanism and ASPP module to the test set for brain tumor image segmentation prediction. Compared to the traditional Unet, the enhanced model offers superior segmentation and edge accuracy, providing a more reliable method for medical image analysis with the coordinate attention mechanism and ASPP module. △ Less

Submitted 13 September, 2024; originally announced September 2024.

Comments: 5 pages, 8 figures, accepted by ICBASE 2024

arXiv:2409.02119 [pdf, other]

CoRA: Optimizing Low-Rank Adaptation with Common Subspace of Large Language Models

Authors: Xiaojun Xiao, Sen Shen, Qiming Bao, Hongfei Rong, Kairui Liu, Zhongsheng Wang, Jiamou Liu

Abstract: In fine-tuning large language models (LLMs), conserving computational resources while maintaining effectiveness and improving outcomes within the same computational constraints is crucial. The Low-Rank Adaptation (LoRA) strategy balances efficiency and performance in fine-tuning large models by reducing the number of trainable parameters and computational costs. However, current advancements in Lo… ▽ More In fine-tuning large language models (LLMs), conserving computational resources while maintaining effectiveness and improving outcomes within the same computational constraints is crucial. The Low-Rank Adaptation (LoRA) strategy balances efficiency and performance in fine-tuning large models by reducing the number of trainable parameters and computational costs. However, current advancements in LoRA might be focused on its fine-tuning methodologies, with not as much exploration as might be expected into further compression of LoRA. Since most of LoRA's parameters might still be superfluous, this may lead to unnecessary wastage of computational resources. In this paper, we propose \textbf{CoRA}: leveraging shared knowledge to optimize LoRA training by substituting its matrix $B$ with a common subspace from large models. Our two-fold method includes (1) Freezing the substitute matrix $B$ to halve parameters while training matrix $A$ for specific tasks and (2) Using the substitute matrix $B$ as an enhanced initial state for the original matrix $B$, achieving improved results with the same parameters. Our experiments show that the first approach achieves the same efficacy as the original LoRA fine-tuning while being more efficient than halving parameters. At the same time, the second approach has some improvements compared to LoRA's original fine-tuning performance. They generally attest to the effectiveness of our work. △ Less

Submitted 31 August, 2024; originally announced September 2024.

arXiv:2408.16756 [pdf, other]

How Well Do LLMs Handle Cantonese? Benchmarking Cantonese Capabilities of Large Language Models

Authors: Jiyue Jiang, Pengan Chen, Liheng Chen, Sheng Wang, Qinghang Bao, Lingpeng Kong, Yu Li, Chuan Wu

Abstract: The rapid evolution of large language models (LLMs) has transformed the competitive landscape in natural language processing (NLP), particularly for English and other data-rich languages. However, underrepresented languages like Cantonese, spoken by over 85 million people, face significant development gaps, which is particularly concerning given the economic significance of the Guangdong-Hong Kong… ▽ More The rapid evolution of large language models (LLMs) has transformed the competitive landscape in natural language processing (NLP), particularly for English and other data-rich languages. However, underrepresented languages like Cantonese, spoken by over 85 million people, face significant development gaps, which is particularly concerning given the economic significance of the Guangdong-Hong Kong-Macau Greater Bay Area, and in substantial Cantonese-speaking populations in places like Singapore and North America. Despite its wide use, Cantonese has scant representation in NLP research, especially compared to other languages from similarly developed regions. To bridge these gaps, we outline current Cantonese NLP methods and introduce new benchmarks designed to evaluate LLM performance in factual generation, mathematical logic, complex reasoning, and general knowledge in Cantonese, which aim to advance open-source Cantonese LLM technology. We also propose future research directions and recommended models to enhance Cantonese LLM development. △ Less

Submitted 17 February, 2025; v1 submitted 29 August, 2024; originally announced August 2024.

Comments: Accepted by NAACL 2025

arXiv:2408.02922 [pdf, other]

Pose Magic: Efficient and Temporally Consistent Human Pose Estimation with a Hybrid Mamba-GCN Network

Authors: Xinyi Zhang, Qiqi Bao, Qinpeng Cui, Wenming Yang, Qingmin Liao

Abstract: Current state-of-the-art (SOTA) methods in 3D Human Pose Estimation (HPE) are primarily based on Transformers. However, existing Transformer-based 3D HPE backbones often encounter a trade-off between accuracy and computational efficiency. To resolve the above dilemma, in this work, we leverage recent advances in state space models and utilize Mamba for high-quality and efficient long-range modelin… ▽ More Current state-of-the-art (SOTA) methods in 3D Human Pose Estimation (HPE) are primarily based on Transformers. However, existing Transformer-based 3D HPE backbones often encounter a trade-off between accuracy and computational efficiency. To resolve the above dilemma, in this work, we leverage recent advances in state space models and utilize Mamba for high-quality and efficient long-range modeling. Nonetheless, Mamba still faces challenges in precisely exploiting local dependencies between joints. To address these issues, we propose a new attention-free hybrid spatiotemporal architecture named Hybrid Mamba-GCN (Pose Magic). This architecture introduces local enhancement with GCN by capturing relationships between neighboring joints, thus producing new representations to complement Mamba's outputs. By adaptively fusing representations from Mamba and GCN, Pose Magic demonstrates superior capability in learning the underlying 3D structure. To meet the requirements of real-time inference, we also provide a fully causal version. Extensive experiments show that Pose Magic achieves new SOTA results ($\downarrow 0.9 mm$) while saving $74.1\%$ FLOPs. In addition, Pose Magic exhibits optimal motion consistency and the ability to generalize to unseen sequence lengths. △ Less

Submitted 25 February, 2025; v1 submitted 5 August, 2024; originally announced August 2024.

Comments: This work has been accepted by AAAI 2025

arXiv:2407.16341 [pdf, other]

Motion Capture from Inertial and Vision Sensors

Authors: Xiaodong Chen, Wu Liu, Qian Bao, Xinchen Liu, Quanwei Yang, Ruoli Dai, Tao Mei

Abstract: Human motion capture is the foundation for many computer vision and graphics tasks. While industrial motion capture systems with complex camera arrays or expensive wearable sensors have been widely adopted in movie and game production, consumer-affordable and easy-to-use solutions for personal applications are still far from mature. To utilize a mixture of a monocular camera and very few inertial… ▽ More Human motion capture is the foundation for many computer vision and graphics tasks. While industrial motion capture systems with complex camera arrays or expensive wearable sensors have been widely adopted in movie and game production, consumer-affordable and easy-to-use solutions for personal applications are still far from mature. To utilize a mixture of a monocular camera and very few inertial measurement units (IMUs) for accurate multi-modal human motion capture in daily life, we contribute MINIONS in this paper, a large-scale Motion capture dataset collected from INertial and visION Sensors. MINIONS has several featured properties: 1) large scale of over five million frames and 400 minutes duration; 2) multi-modality data of IMUs signals and RGB videos labeled with joint positions, joint rotations, SMPL parameters, etc.; 3) a diverse set of 146 fine-grained single and interactive actions with textual descriptions. With the proposed MINIONS, we conduct experiments on multi-modal motion capture and explore the possibilities of consumer-affordable motion capture using a monocular camera and very few IMUs. The experiment results emphasize the unique advantages of inertial and vision sensors, showcasing the promise of consumer-affordable multi-modal motion capture and providing a valuable resource for further research and development. △ Less

Submitted 23 July, 2024; originally announced July 2024.

Comments: 17 pages,9 figures

arXiv:2407.10162 [pdf, other]

ChatLogic: Integrating Logic Programming with Large Language Models for Multi-Step Reasoning

Authors: Zhongsheng Wang, Jiamou Liu, Qiming Bao, Hongfei Rong, Jingfeng Zhang

Abstract: Large language models (LLMs) such as ChatGPT and GPT-4 have demonstrated impressive capabilities in various generative tasks. However, their performance is often hampered by limitations in accessing and leveraging long-term memory, leading to specific vulnerabilities and biases, especially during long interactions. This paper introduces ChatLogic, an innovative framework specifically targeted at L… ▽ More Large language models (LLMs) such as ChatGPT and GPT-4 have demonstrated impressive capabilities in various generative tasks. However, their performance is often hampered by limitations in accessing and leveraging long-term memory, leading to specific vulnerabilities and biases, especially during long interactions. This paper introduces ChatLogic, an innovative framework specifically targeted at LLM reasoning tasks that can enhance the performance of LLMs in multi-step deductive reasoning tasks by integrating logic programming. In ChatLogic, the language model plays a central role, acting as a controller and participating in every system operation stage. We propose a novel method of converting logic problems into symbolic integration with an inference engine. This approach leverages large language models' situational understanding and imitation skills and uses symbolic memory to enhance multi-step deductive reasoning capabilities. Our results show that the ChatLogic framework significantly improves the multi-step reasoning capabilities of LLMs. The source code and data are available at \url{https://github.com/Strong-AI-Lab/ChatLogic} △ Less

Submitted 14 July, 2024; originally announced July 2024.

Comments: 8 pages, 3 figures. This paper has been accepted by WCCI IJCNN 2024

arXiv:2407.09521 [pdf, other]

Apprenticeship-Inspired Elegance: Synergistic Knowledge Distillation Empowers Spiking Neural Networks for Efficient Single-Eye Emotion Recognition

Authors: Yang Wang, Haiyang Mei, Qirui Bao, Ziqi Wei, Mike Zheng Shou, Haizhou Li, Bo Dong, Xin Yang

Abstract: We introduce a novel multimodality synergistic knowledge distillation scheme tailored for efficient single-eye motion recognition tasks. This method allows a lightweight, unimodal student spiking neural network (SNN) to extract rich knowledge from an event-frame multimodal teacher network. The core strength of this approach is its ability to utilize the ample, coarser temporal cues found in conven… ▽ More We introduce a novel multimodality synergistic knowledge distillation scheme tailored for efficient single-eye motion recognition tasks. This method allows a lightweight, unimodal student spiking neural network (SNN) to extract rich knowledge from an event-frame multimodal teacher network. The core strength of this approach is its ability to utilize the ample, coarser temporal cues found in conventional frames for effective emotion recognition. Consequently, our method adeptly interprets both temporal and spatial information from the conventional frame domain, eliminating the need for specialized sensing devices, e.g., event-based camera. The effectiveness of our approach is thoroughly demonstrated using both existing and our compiled single-eye emotion recognition datasets, achieving unparalleled performance in accuracy and efficiency over existing state-of-the-art methods. △ Less

Submitted 20 June, 2024; originally announced July 2024.

Comments: Accepted by IJCAI 2024

arXiv:2404.08831 [pdf, other]

Structured Model Pruning for Efficient Inference in Computational Pathology

Authors: Mohammed Adnan, Qinle Ba, Nazim Shaikh, Shivam Kalra, Satarupa Mukherjee, Auranuch Lorsakul

Abstract: Recent years have seen significant efforts to adopt Artificial Intelligence (AI) in healthcare for various use cases, from computer-aided diagnosis to ICU triage. However, the size of AI models has been rapidly growing due to scaling laws and the success of foundational models, which poses an increasing challenge to leverage advanced models in practical applications. It is thus imperative to devel… ▽ More Recent years have seen significant efforts to adopt Artificial Intelligence (AI) in healthcare for various use cases, from computer-aided diagnosis to ICU triage. However, the size of AI models has been rapidly growing due to scaling laws and the success of foundational models, which poses an increasing challenge to leverage advanced models in practical applications. It is thus imperative to develop efficient models, especially for deploying AI solutions under resource-constrains or with time sensitivity. One potential solution is to perform model compression, a set of techniques that remove less important model components or reduce parameter precision, to reduce model computation demand. In this work, we demonstrate that model pruning, as a model compression technique, can effectively reduce inference cost for computational and digital pathology based analysis with a negligible loss of analysis performance. To this end, we develop a methodology for pruning the widely used U-Net-style architectures in biomedical imaging, with which we evaluate multiple pruning heuristics on nuclei instance segmentation and classification, and empirically demonstrate that pruning can compress models by at least 70% with a negligible drop in performance. △ Less

Submitted 12 April, 2024; originally announced April 2024.

arXiv:2403.07905 [pdf]

Enhancing Kubernetes Automated Scheduling with Deep Learning and Reinforcement Techniques for Large-Scale Cloud Computing Optimization

Authors: Zheng Xu, Yulu Gong, Yanlin Zhou, Qiaozhi Bao, Wenpin Qian

Abstract: With the continuous expansion of the scale of cloud computing applications, artificial intelligence technologies such as Deep Learning and Reinforcement Learning have gradually become the key tools to solve the automated task scheduling of large-scale cloud computing systems. Aiming at the complexity and real-time requirement of task scheduling in large-scale cloud computing system, this paper pro… ▽ More With the continuous expansion of the scale of cloud computing applications, artificial intelligence technologies such as Deep Learning and Reinforcement Learning have gradually become the key tools to solve the automated task scheduling of large-scale cloud computing systems. Aiming at the complexity and real-time requirement of task scheduling in large-scale cloud computing system, this paper proposes an automatic task scheduling scheme based on deep learning and reinforcement learning. Firstly, the deep learning technology is used to monitor and predict the parameters in the cloud computing system in real time to obtain the system status information. Then, combined with reinforcement learning algorithm, the task scheduling strategy is dynamically adjusted according to the real-time system state and task characteristics to achieve the optimal utilization of system resources and the maximum of task execution efficiency. This paper verifies the effectiveness and performance advantages of the proposed scheme in experiments, and proves the potential and application prospect of deep learning and reinforcement learning in automatic task scheduling in large-scale cloud computing systems. △ Less

Submitted 26 February, 2024; originally announced March 2024.

arXiv:2401.01078 [pdf, other]

Vietnamese Poem Generation & The Prospect Of Cross-Language Poem-To-Poem Translation

Authors: Triet Minh Huynh, Quan Le Bao

Abstract: Poetry generation has been a challenging task in the field of Natural Language Processing, as it requires the model to understand the nuances of language, sentiment, and style. In this paper, we propose using Large Language Models to generate Vietnamese poems of various genres from natural language prompts, thereby facilitating an intuitive process with enhanced content control. Our most efficacio… ▽ More Poetry generation has been a challenging task in the field of Natural Language Processing, as it requires the model to understand the nuances of language, sentiment, and style. In this paper, we propose using Large Language Models to generate Vietnamese poems of various genres from natural language prompts, thereby facilitating an intuitive process with enhanced content control. Our most efficacious model, the GPT-3 Babbage variant, achieves a custom evaluation score of 0.8, specifically tailored to the "luc bat" genre of Vietnamese poetry. Furthermore, we also explore the idea of paraphrasing poems into normal text prompts and yield a relatively high score of 0.781 in the "luc bat" genre. This experiment presents the potential for cross-Language poem-to-poem translation with translated poems as the inputs while concurrently maintaining complete control over the generated content. △ Less

Submitted 4 January, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

arXiv:2310.09430 [pdf, ps, other]

Assessing and Enhancing the Robustness of Large Language Models with Task Structure Variations for Logical Reasoning

Authors: Qiming Bao, Gael Gendron, Alex Yuxuan Peng, Wanjun Zhong, Neset Tan, Yang Chen, Michael Witbrock, Jiamou Liu

Abstract: Large language models (LLMs), such as LLaMA, Alpaca, Vicuna, GPT-3.5 and GPT-4, have advanced the performance of AI systems on various natural language processing tasks to human-like levels. However, their generalisation and robustness when performing logical reasoning has not been sufficiently assessed. To comprehensively evaluate this ability, we develop three new logical reasoning datasets name… ▽ More Large language models (LLMs), such as LLaMA, Alpaca, Vicuna, GPT-3.5 and GPT-4, have advanced the performance of AI systems on various natural language processing tasks to human-like levels. However, their generalisation and robustness when performing logical reasoning has not been sufficiently assessed. To comprehensively evaluate this ability, we develop three new logical reasoning datasets named "ReClor-plus", "LogiQA-plus" and "LogiQAv2-plus" that extend standard logical reasoning datasets to evaluate the robustness of the LLM's reasoning. For each, we create three subsets: the first with randomly shuffled options, the second with the correct choices replaced by "none of the other options is correct", and the third with a combination of shuffling and substitution. Experiments on these datasets show that these simple augmentations greatly hinder the models' performance. Despite their high performance on the original publicly available datasets, we find that all models perform poorly on these newly constructed datasets. We also demonstrate that introducing task variations into the training set can markedly improve the model's performance on both the original and our developed datasets. Finally, we show that applying logic-driven data augmentation for fine-tuning and prompting can enhance generalisation in both discriminative and generative models, offering a path to improving their robustness for tasks involving logical reasoning. Source code and data are made publicly available at https://github.com/Strong-AI-Lab/Logical-and-abstract-reasoning. △ Less

Submitted 16 January, 2025; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: The short version (v3) was accepted for oral presentation at the first LLM@IJCAI 2023 non-archival symposium, and the full version was accepted by ICONIP 2024

arXiv:2309.10444 [pdf, other]

Exploring Iterative Enhancement for Improving Learnersourced Multiple-Choice Question Explanations with Large Language Models

Authors: Qiming Bao, Juho Leinonen, Alex Yuxuan Peng, Wanjun Zhong, Gaël Gendron, Timothy Pistotti, Alice Huang, Paul Denny, Michael Witbrock, Jiamou Liu

Abstract: Large language models exhibit superior capabilities in processing and understanding language, yet their applications in educational contexts remain underexplored. Learnersourcing enhances learning by engaging students in creating their own educational content. When learnersourcing multiple-choice questions, creating explanations for the solution of a question is a crucial step; it helps other stud… ▽ More Large language models exhibit superior capabilities in processing and understanding language, yet their applications in educational contexts remain underexplored. Learnersourcing enhances learning by engaging students in creating their own educational content. When learnersourcing multiple-choice questions, creating explanations for the solution of a question is a crucial step; it helps other students understand the solution and promotes a deeper understanding of related concepts. However, it is often difficult for students to craft effective solution explanations, due to limited subject understanding. To help scaffold the task of automated explanation generation, we present and evaluate a framework called "ILearner-LLM", that iteratively enhances the generated explanations for the given questions with large language models. Comprising an explanation generation model and an explanation evaluation model, the framework generates high-quality student-aligned explanations by iteratively feeding the quality rating score from the evaluation model back into the instruction prompt of the explanation generation model. Experimental results demonstrate the effectiveness of our ILearner-LLM on LLaMA2-13B and GPT-4 to generate higher quality explanations that are closer to those written by students on five PeerWise datasets. Our findings represent a promising path to enrich the learnersourcing experience for students and to enhance the capabilities of large language models for educational applications. △ Less

Submitted 16 January, 2025; v1 submitted 19 September, 2023; originally announced September 2023.

Comments: The short version (v4) has been accepted as a non-archival workshop paper at AGI@ICLR 2024, and the full version has been accepted by the main track of AAAI/EAAI 2025

arXiv:2309.06169 [pdf, other]

Elucidating the solution space of extended reverse-time SDE for diffusion models

Authors: Qinpeng Cui, Xinyi Zhang, Qiqi Bao, Qingmin Liao

Abstract: Sampling from Diffusion Models can alternatively be seen as solving differential equations, where there is a challenge in balancing speed and image visual quality. ODE-based samplers offer rapid sampling time but reach a performance limit, whereas SDE-based samplers achieve superior quality, albeit with longer iterations. In this work, we formulate the sampling process as an Extended Reverse-Time… ▽ More Sampling from Diffusion Models can alternatively be seen as solving differential equations, where there is a challenge in balancing speed and image visual quality. ODE-based samplers offer rapid sampling time but reach a performance limit, whereas SDE-based samplers achieve superior quality, albeit with longer iterations. In this work, we formulate the sampling process as an Extended Reverse-Time SDE (ER SDE), unifying prior explorations into ODEs and SDEs. Theoretically, leveraging the semi-linear structure of ER SDE solutions, we offer exact solutions and approximate solutions for VP SDE and VE SDE, respectively. Based on the approximate solution space of the ER SDE, referred to as one-step prediction errors, we yield mathematical insights elucidating the rapid sampling capability of ODE solvers and the high-quality sampling ability of SDE solvers. Additionally, we unveil that VP SDE solvers stand on par with their VE SDE counterparts. Based on these findings, leveraging the dual advantages of ODE solvers and SDE solvers, we devise efficient high-quality samplers, namely ER-SDE-Solvers. Experimental results demonstrate that ER-SDE-Solvers achieve state-of-the-art performance across all stochastic samplers while maintaining efficiency of deterministic samplers. Specifically, on the ImageNet $128\times128$ dataset, ER-SDE-Solvers obtain 8.33 FID in only 20 function evaluations. Code is available at \href{https://github.com/QinpengCui/ER-SDE-Solver}{https://github.com/QinpengCui/ER-SDE-Solver} △ Less

Submitted 27 February, 2025; v1 submitted 12 September, 2023; originally announced September 2023.

Comments: This paper has been accepted by WACV 2025 (Oral). The official version lacked proper attribution to the co-authors, and this version has been updated accordingly

arXiv:2306.08253 [pdf, ps, other]

doi 10.1109/TIFS.2023.3279979

Measures and Optimization for Robustness and Vulnerability in Disconnected Networks

Authors: Liwang Zhu, Qi Bao, Zhongzhi Zhang

Abstract: The function or performance of a network is strongly dependent on its robustness, quantifying the ability of the network to continue functioning under perturbations. While a wide variety of robustness metrics have been proposed, they have their respective limitations. In this paper, we propose to use the forest index as a measure of network robustness, which overcomes the deficiencies of existing… ▽ More The function or performance of a network is strongly dependent on its robustness, quantifying the ability of the network to continue functioning under perturbations. While a wide variety of robustness metrics have been proposed, they have their respective limitations. In this paper, we propose to use the forest index as a measure of network robustness, which overcomes the deficiencies of existing metrics. Using such a measure as an optimization criterion, we propose and study the problem of breaking down a network by attacking some key edges. We show that the objective function of the problem is monotonic but not submodular, which impose more challenging on the problem. We thus resort to greedy algorithms extended for non-submodular functions by iteratively deleting the most promising edges. We first propose a simple greedy algorithm with a proved bound for the approximation ratio and cubic-time complexity. To confront the computation challenge for large networks, we further propose an improved nearly-linear time greedy algorithm, which significantly speeds up the process for edge selection but sacrifices little accuracy. Extensive experimental results for a large set of real-world networks verify the effectiveness and efficiency of our algorithms, demonstrating that our algorithms outperform several baseline schemes. △ Less

Submitted 14 June, 2023; originally announced June 2023.

Comments: 13 pages

Journal ref: IEEE Transactions on Information Forensics and Security,pp:3350-3362,2023

arXiv:2306.02850 [pdf, other]

TRACE: 5D Temporal Regression of Avatars with Dynamic Cameras in 3D Environments

Authors: Yu Sun, Qian Bao, Wu Liu, Tao Mei, Michael J. Black

Abstract: Although the estimation of 3D human pose and shape (HPS) is rapidly progressing, current methods still cannot reliably estimate moving humans in global coordinates, which is critical for many applications. This is particularly challenging when the camera is also moving, entangling human and camera motion. To address these issues, we adopt a novel 5D representation (space, time, and identity) that… ▽ More Although the estimation of 3D human pose and shape (HPS) is rapidly progressing, current methods still cannot reliably estimate moving humans in global coordinates, which is critical for many applications. This is particularly challenging when the camera is also moving, entangling human and camera motion. To address these issues, we adopt a novel 5D representation (space, time, and identity) that enables end-to-end reasoning about people in scenes. Our method, called TRACE, introduces several novel architectural components. Most importantly, it uses two new "maps" to reason about the 3D trajectory of people over time in camera, and world, coordinates. An additional memory unit enables persistent tracking of people even during long occlusions. TRACE is the first one-stage method to jointly recover and track 3D humans in global coordinates from dynamic cameras. By training it end-to-end, and using full image information, TRACE achieves state-of-the-art performance on tracking and HPS benchmarks. The code and dataset are released for research purposes. △ Less

Submitted 20 November, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: Project page: https://www.yusun.work/TRACE/TRACE.html

arXiv:2305.19555 [pdf, ps, other]

Large Language Models Are Not Strong Abstract Reasoners

Authors: Gaël Gendron, Qiming Bao, Michael Witbrock, Gillian Dobbie

Abstract: Large Language Models have shown tremendous performance on a large variety of natural language processing tasks, ranging from text comprehension to common sense reasoning. However, the mechanisms responsible for this success remain opaque, and it is unclear whether LLMs can achieve human-like cognitive capabilities or whether these models are still fundamentally circumscribed. Abstract reasoning i… ▽ More Large Language Models have shown tremendous performance on a large variety of natural language processing tasks, ranging from text comprehension to common sense reasoning. However, the mechanisms responsible for this success remain opaque, and it is unclear whether LLMs can achieve human-like cognitive capabilities or whether these models are still fundamentally circumscribed. Abstract reasoning is a fundamental task for cognition, consisting of finding and applying a general pattern from few data. Evaluating deep neural architectures on this task could give insight into their potential limitations regarding reasoning and their broad generalisation abilities, yet this is currently an under-explored area. In this paper, we introduce a new benchmark for evaluating language models beyond memorization on abstract reasoning tasks. We perform extensive evaluations of state-of-the-art LLMs, showing that they currently achieve very limited performance in contrast with other natural language tasks, even when applying techniques that have been shown to improve performance on other NLP tasks. We argue that guiding LLM generation to follow causal paths could help improve the generalisation and reasoning abilities of LLMs. △ Less

Submitted 2 January, 2024; v1 submitted 31 May, 2023; originally announced May 2023.

Comments: 50 pages, 14 pages for the main paper and 36 pages for the supplement, 35 figures, 17 tables. V3: performed additional experiments

ACM Class: I.2.2; I.2.3; I.2.7; I.5.1

arXiv:2305.12599 [pdf, other]

Abstract Meaning Representation-Based Logic-Driven Data Augmentation for Logical Reasoning

Authors: Qiming Bao, Alex Yuxuan Peng, Zhenyun Deng, Wanjun Zhong, Gael Gendron, Timothy Pistotti, Neset Tan, Nathan Young, Yang Chen, Yonghua Zhu, Paul Denny, Michael Witbrock, Jiamou Liu

Abstract: Combining large language models with logical reasoning enhances their capacity to address problems in a robust and reliable manner. Nevertheless, the intricate nature of logical reasoning poses challenges when gathering reliable data from the web to build comprehensive training datasets, subsequently affecting performance on downstream tasks. To address this, we introduce a novel logic-driven data… ▽ More Combining large language models with logical reasoning enhances their capacity to address problems in a robust and reliable manner. Nevertheless, the intricate nature of logical reasoning poses challenges when gathering reliable data from the web to build comprehensive training datasets, subsequently affecting performance on downstream tasks. To address this, we introduce a novel logic-driven data augmentation approach, AMR-LDA. AMR-LDA converts the original text into an Abstract Meaning Representation (AMR) graph, a structured semantic representation that encapsulates the logical structure of the sentence, upon which operations are performed to generate logically modified AMR graphs. The modified AMR graphs are subsequently converted back into text to create augmented data. Notably, our methodology is architecture-agnostic and enhances both generative large language models, such as GPT-3.5 and GPT-4, through prompt augmentation, and discriminative large language models through contrastive learning with logic-driven data augmentation. Empirical evidence underscores the efficacy of our proposed method with improvement in performance across seven downstream tasks, such as reading comprehension requiring logical reasoning, textual entailment, and natural language inference. Furthermore, our method leads on the ReClor leaderboard at https://eval.ai/web/challenges/challenge-page/503/leaderboard/1347. The source code and data are publicly available at https://github.com/Strong-AI-Lab/Logical-Equivalence-driven-AMR-Data-Augmentation-for-Representation-Learning. △ Less

Submitted 17 April, 2025; v1 submitted 21 May, 2023; originally announced May 2023.

Comments: 21 pages, 8 figures, the Findings of ACL 2024

arXiv:2303.07585 [pdf, other]

Input-length-shortening and text generation via attention values

Authors: Neşet Özkan Tan, Alex Yuxuan Peng, Joshua Bensemann, Qiming Bao, Tim Hartill, Mark Gahegan, Michael Witbrock

Abstract: Identifying words that impact a task's performance more than others is a challenge in natural language processing. Transformers models have recently addressed this issue by incorporating an attention mechanism that assigns greater attention (i.e., relevance) scores to some words than others. Because of the attention mechanism's high computational cost, transformer models usually have an input-leng… ▽ More Identifying words that impact a task's performance more than others is a challenge in natural language processing. Transformers models have recently addressed this issue by incorporating an attention mechanism that assigns greater attention (i.e., relevance) scores to some words than others. Because of the attention mechanism's high computational cost, transformer models usually have an input-length limitation caused by hardware constraints. This limitation applies to many transformers, including the well-known bidirectional encoder representations of the transformer (BERT) model. In this paper, we examined BERT's attention assignment mechanism, focusing on two questions: (1) How can attention be employed to reduce input length? (2) How can attention be used as a control mechanism for conditional text generation? We investigated these questions in the context of a text classification task. We discovered that BERT's early layers assign more critical attention scores for text classification tasks compared to later layers. We demonstrated that the first layer's attention sums could be used to filter tokens in a given sequence, considerably decreasing the input length while maintaining good test accuracy. We also applied filtering, which uses a compute-efficient semantic similarities algorithm, and discovered that retaining approximately 6\% of the original sequence is sufficient to obtain 86.5\% accuracy. Finally, we showed that we could generate data in a stable manner and indistinguishable from the original one by only using a small percentage (10\%) of the tokens with high attention scores according to BERT's first layer. △ Less

Submitted 13 March, 2023; originally announced March 2023.

Comments: 7 pages, 4 figures. AAAI23-EMC2

arXiv:2209.02431 [pdf, other]

DPIT: Dual-Pipeline Integrated Transformer for Human Pose Estimation

Authors: Shuaitao Zhao, Kun Liu, Yuhang Huang, Qian Bao, Dan Zeng, Wu Liu

Abstract: Human pose estimation aims to figure out the keypoints of all people in different scenes. Current approaches still face some challenges despite promising results. Existing top-down methods deal with a single person individually, without the interaction between different people and the scene they are situated in. Consequently, the performance of human detection degrades when serious occlusion happe… ▽ More Human pose estimation aims to figure out the keypoints of all people in different scenes. Current approaches still face some challenges despite promising results. Existing top-down methods deal with a single person individually, without the interaction between different people and the scene they are situated in. Consequently, the performance of human detection degrades when serious occlusion happens. On the other hand, existing bottom-up methods consider all people at the same time and capture the global knowledge of the entire image. However, they are less accurate than the top-down methods due to the scale variation. To address these problems, we propose a novel Dual-Pipeline Integrated Transformer (DPIT) by integrating top-down and bottom-up pipelines to explore the visual clues of different receptive fields and achieve their complementarity. Specifically, DPIT consists of two branches, the bottom-up branch deals with the whole image to capture the global visual information, while the top-down branch extracts the feature representation of local vision from the single-human bounding box. Then, the extracted feature representations from bottom-up and top-down branches are fed into the transformer encoder to fuse the global and local knowledge interactively. Moreover, we define the keypoint queries to explore both full-scene and single-human posture visual clues to realize the mutual complementarity of the two pipelines. To the best of our knowledge, this is one of the first works to integrate the bottom-up and top-down pipelines with transformers for human pose estimation. Extensive experiments on COCO and MPII datasets demonstrate that our DPIT achieves comparable performance to the state-of-the-art methods. △ Less

Submitted 2 September, 2022; originally announced September 2022.

arXiv:2209.01059 [pdf, other]

In-Place Gestures Classification via Long-term Memory Augmented Network

Authors: Lizhi Zhao, Xuequan Lu, Qianyue Bao, Meili Wang

Abstract: In-place gesture-based virtual locomotion techniques enable users to control their viewpoint and intuitively move in the 3D virtual environment. A key research problem is to accurately and quickly recognize in-place gestures, since they can trigger specific movements of virtual viewpoints and enhance user experience. However, to achieve real-time experience, only short-term sensor sequence data (u… ▽ More In-place gesture-based virtual locomotion techniques enable users to control their viewpoint and intuitively move in the 3D virtual environment. A key research problem is to accurately and quickly recognize in-place gestures, since they can trigger specific movements of virtual viewpoints and enhance user experience. However, to achieve real-time experience, only short-term sensor sequence data (up to about 300ms, 6 to 10 frames) can be taken as input, which actually affects the classification performance due to limited spatio-temporal information. In this paper, we propose a novel long-term memory augmented network for in-place gestures classification. It takes as input both short-term gesture sequence samples and their corresponding long-term sequence samples that provide extra relevant spatio-temporal information in the training phase. We store long-term sequence features with an external memory queue. In addition, we design a memory augmented loss to help cluster features of the same class and push apart features from different classes, thus enabling our memory queue to memorize more relevant long-term sequence features. In the inference phase, we input only short-term sequence samples to recall the stored features accordingly, and fuse them together to predict the gesture class. We create a large-scale in-place gestures dataset from 25 participants with 11 gestures. Our method achieves a promising accuracy of 95.1% with a latency of 192ms, and an accuracy of 97.3% with a latency of 312ms, and is demonstrated to be superior to recent in-place gesture classification techniques. User study also validates our approach. Our source code and dataset will be made available to the community. △ Less

Submitted 2 September, 2022; originally announced September 2022.

Comments: This paper is accepted to IEEE ISMAR2022

arXiv:2209.00776 [pdf, other]

doi 10.1145/3503161.3547743

WOC: A Handy Webcam-based 3D Online Chatroom

Authors: Chuanhang Yan, Yu Sun, Qian Bao, Jinhui Pang, Wu Liu, Tao Mei

Abstract: We develop WOC, a webcam-based 3D virtual online chatroom for multi-person interaction, which captures the 3D motion of users and drives their individual 3D virtual avatars in real-time. Compared to the existing wearable equipment-based solution, WOC offers convenient and low-cost 3D motion capture with a single camera. To promote the immersive chat experience, WOC provides high-fidelity virtual a… ▽ More We develop WOC, a webcam-based 3D virtual online chatroom for multi-person interaction, which captures the 3D motion of users and drives their individual 3D virtual avatars in real-time. Compared to the existing wearable equipment-based solution, WOC offers convenient and low-cost 3D motion capture with a single camera. To promote the immersive chat experience, WOC provides high-fidelity virtual avatar manipulation, which also supports the user-defined characters. With the distributed data flow service, the system delivers highly synchronized motion and voice for all users. Deployed on the website and no installation required, users can freely experience the virtual online chat at https://yanch.cloud. △ Less

Submitted 17 March, 2023; v1 submitted 1 September, 2022; originally announced September 2022.

arXiv:2208.03609 [pdf, other]

Continual Learning for Tumor Classification in Histopathology Images

Authors: Veena Kaustaban, Qinle Ba, Ipshita Bhattacharya, Nahil Sobh, Satarupa Mukherjee, Jim Martin, Mohammad Saleh Miri, Christoph Guetter, Amal Chaturvedi

Abstract: Recent years have seen great advancements in the development of deep learning models for histopathology image analysis in digital pathology applications, evidenced by the increasingly common deployment of these models in both research and clinical settings. Although such models have shown unprecedented performance in solving fundamental computational tasks in DP applications, they suffer from cata… ▽ More Recent years have seen great advancements in the development of deep learning models for histopathology image analysis in digital pathology applications, evidenced by the increasingly common deployment of these models in both research and clinical settings. Although such models have shown unprecedented performance in solving fundamental computational tasks in DP applications, they suffer from catastrophic forgetting when adapted to unseen data with transfer learning. With an increasing need for deep learning models to handle ever changing data distributions, including evolving patient population and new diagnosis assays, continual learning models that alleviate model forgetting need to be introduced in DP based analysis. However, to our best knowledge, there is no systematic study of such models for DP-specific applications. Here, we propose CL scenarios in DP settings, where histopathology image data from different sources/distributions arrive sequentially, the knowledge of which is integrated into a single model without training all the data from scratch. We then established an augmented dataset for colorectal cancer H&E classification to simulate shifts of image appearance and evaluated CL model performance in the proposed CL scenarios. We leveraged a breast tumor H&E dataset along with the colorectal cancer to evaluate CL from different tumor types. In addition, we evaluated CL methods in an online few-shot setting under the constraints of annotation and computational resources. We revealed promising results of CL in DP applications, potentially paving the way for application of these methods in clinical practice. △ Less

Submitted 6 August, 2022; originally announced August 2022.

Comments: Accepted by MOVI, a MICCAI2022 workshop: https://sites.google.com/view/movi2022

arXiv:2207.14000 [pdf, other]

Multi-Step Deductive Reasoning Over Natural Language: An Empirical Study on Out-of-Distribution Generalisation

Authors: Qiming Bao, Alex Yuxuan Peng, Tim Hartill, Neset Tan, Zhenyun Deng, Michael Witbrock, Jiamou Liu

Abstract: Combining deep learning with symbolic logic reasoning aims to capitalize on the success of both fields and is drawing increasing attention. Inspired by DeepLogic, an end-to-end model trained to perform inference on logic programs, we introduce IMA-GloVe-GA, an iterative neural inference network for multi-step reasoning expressed in natural language. In our model, reasoning is performed using an it… ▽ More Combining deep learning with symbolic logic reasoning aims to capitalize on the success of both fields and is drawing increasing attention. Inspired by DeepLogic, an end-to-end model trained to perform inference on logic programs, we introduce IMA-GloVe-GA, an iterative neural inference network for multi-step reasoning expressed in natural language. In our model, reasoning is performed using an iterative memory neural network based on RNN with a gated attention mechanism. We evaluate IMA-GloVe-GA on three datasets: PARARULES, CONCEPTRULES V1 and CONCEPTRULES V2. Experimental results show DeepLogic with gated attention can achieve higher test accuracy than DeepLogic and other RNN baseline models. Our model achieves better out-of-distribution generalisation than RoBERTa-Large when the rules have been shuffled. Furthermore, to address the issue of unbalanced distribution of reasoning depths in the current multi-step reasoning datasets, we develop PARARULE-Plus, a large dataset with more examples that require deeper reasoning steps. Experimental results show that the addition of PARARULE-Plus can increase the model's performance on examples requiring deeper reasoning depths. The source code and data are available at https://github.com/Strong-AI-Lab/Multi-Step-Deductive-Reasoning-Over-Natural-Language. △ Less

Submitted 17 April, 2025; v1 submitted 28 July, 2022; originally announced July 2022.

Comments: 10 pages, 3 figures, The 2nd International Joint Conference on Learning & Reasoning and 16th International Workshop on Neural-Symbolic Learning and Reasoning (IJCLR-NeSy 2022)

arXiv:2203.12186 [pdf, other]

AbductionRules: Training Transformers to Explain Unexpected Inputs

Authors: Nathan Young, Qiming Bao, Joshua Bensemann, Michael Witbrock

Abstract: Transformers have recently been shown to be capable of reliably performing logical reasoning over facts and rules expressed in natural language, but abductive reasoning - inference to the best explanation of an unexpected observation - has been underexplored despite significant applications to scientific discovery, common-sense reasoning, and model interpretability. We present AbductionRules, a… ▽ More Transformers have recently been shown to be capable of reliably performing logical reasoning over facts and rules expressed in natural language, but abductive reasoning - inference to the best explanation of an unexpected observation - has been underexplored despite significant applications to scientific discovery, common-sense reasoning, and model interpretability. We present AbductionRules, a group of natural language datasets designed to train and test generalisable abduction over natural-language knowledge bases. We use these datasets to finetune pretrained Transformers and discuss their performance, finding that our models learned generalisable abductive techniques but also learned to exploit the structure of our data. Finally, we discuss the viability of this approach to abductive reasoning and ways in which it may be improved in future work. △ Less

Submitted 23 March, 2022; originally announced March 2022.

Comments: Findings of ACL 2022

arXiv:2201.04024 [pdf, other]

Smart Director: An Event-Driven Directing System for Live Broadcasting

Authors: Yingwei Pan, Yue Chen, Qian Bao, Ning Zhang, Ting Yao, Jingen Liu, Tao Mei

Abstract: Live video broadcasting normally requires a multitude of skills and expertise with domain knowledge to enable multi-camera productions. As the number of cameras keep increasing, directing a live sports broadcast has now become more complicated and challenging than ever before. The broadcast directors need to be much more concentrated, responsive, and knowledgeable, during the production. To reliev… ▽ More Live video broadcasting normally requires a multitude of skills and expertise with domain knowledge to enable multi-camera productions. As the number of cameras keep increasing, directing a live sports broadcast has now become more complicated and challenging than ever before. The broadcast directors need to be much more concentrated, responsive, and knowledgeable, during the production. To relieve the directors from their intensive efforts, we develop an innovative automated sports broadcast directing system, called Smart Director, which aims at mimicking the typical human-in-the-loop broadcasting process to automatically create near-professional broadcasting programs in real-time by using a set of advanced multi-view video analysis algorithms. Inspired by the so-called "three-event" construction of sports broadcast, we build our system with an event-driven pipeline consisting of three consecutive novel components: 1) the Multi-view Event Localization to detect events by modeling multi-view correlations, 2) the Multi-view Highlight Detection to rank camera views by the visual importance for view selection, 3) the Auto-Broadcasting Scheduler to control the production of broadcasting videos. To our best knowledge, our system is the first end-to-end automated directing system for multi-camera sports broadcasting, completely driven by the semantic understanding of sports events. It is also the first system to solve the novel problem of multi-view joint event detection by cross-view relation modeling. We conduct both objective and subjective evaluations on a real-world multi-camera soccer dataset, which demonstrate the quality of our auto-generated videos is comparable to that of the human-directed. Thanks to its faster response, our system is able to capture more fast-passing and short-duration events which are usually missed by human directors. △ Less

Submitted 11 January, 2022; originally announced January 2022.

Comments: ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)

arXiv:2201.00616 [pdf, other]

doi 10.1142/S2705078521500156

Relating Blindsight and AI: A Review

Authors: Joshua Bensemann, Qiming Bao, Gaël Gendron, Tim Hartill, Michael Witbrock

Abstract: Processes occurring in brains, a.k.a. biological neural networks, can and have been modeled within artificial neural network architectures. Due to this, we have conducted a review of research on the phenomenon of blindsight in an attempt to generate ideas for artificial intelligence models. Blindsight can be considered as a diminished form of visual experience. If we assume that artificial network… ▽ More Processes occurring in brains, a.k.a. biological neural networks, can and have been modeled within artificial neural network architectures. Due to this, we have conducted a review of research on the phenomenon of blindsight in an attempt to generate ideas for artificial intelligence models. Blindsight can be considered as a diminished form of visual experience. If we assume that artificial networks have no form of visual experience, then deficits caused by blindsight give us insights into the processes occurring within visual experience that we can incorporate into artificial neural networks. This article has been structured into three parts. Section 2 is a review of blindsight research, looking specifically at the errors occurring during this condition compared to normal vision. Section 3 identifies overall patterns from Section 2 to generate insights for computational models of vision. Section 4 demonstrates the utility of examining biological research to inform artificial intelligence research by examining computation models of visual attention relevant to one of the insights generated in Section 3. The research covered in Section 4 shows that incorporating one of our insights into computational vision does benefit those models. Future research will be required to determine whether our other insights are as valuable. △ Less

Submitted 8 December, 2021; originally announced January 2022.

Comments: Preprint of an article published in Journal of Artificial Intelligence and Consciousness, 2021 doi.org/10.1142/S2705078521500156 \c{opyright} copyright World Scientific Publishing Company www.worldscientific.com/worldscinet/jaic

Journal ref: Journal of Artificial Intelligence and Consciousness, 1-15 (2021)

arXiv:2201.00466 [pdf, other]

RFormer: Transformer-based Generative Adversarial Network for Real Fundus Image Restoration on A New Clinical Benchmark

Authors: Zhuo Deng, Yuanhao Cai, Lu Chen, Zheng Gong, Qiqi Bao, Xue Yao, Dong Fang, Shaochong Zhang, Lan Ma

Abstract: Ophthalmologists have used fundus images to screen and diagnose eye diseases. However, different equipments and ophthalmologists pose large variations to the quality of fundus images. Low-quality (LQ) degraded fundus images easily lead to uncertainty in clinical screening and generally increase the risk of misdiagnosis. Thus, real fundus image restoration is worth studying. Unfortunately, real cli… ▽ More Ophthalmologists have used fundus images to screen and diagnose eye diseases. However, different equipments and ophthalmologists pose large variations to the quality of fundus images. Low-quality (LQ) degraded fundus images easily lead to uncertainty in clinical screening and generally increase the risk of misdiagnosis. Thus, real fundus image restoration is worth studying. Unfortunately, real clinical benchmark has not been explored for this task so far. In this paper, we investigate the real clinical fundus image restoration problem. Firstly, We establish a clinical dataset, Real Fundus (RF), including 120 low- and high-quality (HQ) image pairs. Then we propose a novel Transformer-based Generative Adversarial Network (RFormer) to restore the real degradation of clinical fundus images. The key component in our network is the Window-based Self-Attention Block (WSAB) which captures non-local self-similarity and long-range dependencies. To produce more visually pleasant results, a Transformer-based discriminator is introduced. Extensive experiments on our clinical benchmark show that the proposed RFormer significantly outperforms the state-of-the-art (SOTA) methods. In addition, experiments of downstream tasks such as vessel segmentation and optic disc/cup detection demonstrate that our proposed RFormer benefits clinical fundus image analysis and applications. The dataset, code, and models are publicly available at https://github.com/dengzhuo-AI/Real-Fundus △ Less

Submitted 3 August, 2022; v1 submitted 2 January, 2022; originally announced January 2022.

Comments: IEEE J-BHI 2022; The First Benchmark and First Transformer-based Method for Real Clinical Fundus Image Restoration

arXiv:2112.08274 [pdf, other]

Putting People in their Place: Monocular Regression of 3D People in Depth

Authors: Yu Sun, Wu Liu, Qian Bao, Yili Fu, Tao Mei, Michael J. Black

Abstract: Given an image with multiple people, our goal is to directly regress the pose and shape of all the people as well as their relative depth. Inferring the depth of a person in an image, however, is fundamentally ambiguous without knowing their height. This is particularly problematic when the scene contains people of very different sizes, e.g. from infants to adults. To solve this, we need several t… ▽ More Given an image with multiple people, our goal is to directly regress the pose and shape of all the people as well as their relative depth. Inferring the depth of a person in an image, however, is fundamentally ambiguous without knowing their height. This is particularly problematic when the scene contains people of very different sizes, e.g. from infants to adults. To solve this, we need several things. First, we develop a novel method to infer the poses and depth of multiple people in a single image. While previous work that estimates multiple people does so by reasoning in the image plane, our method, called BEV, adds an additional imaginary Bird's-Eye-View representation to explicitly reason about depth. BEV reasons simultaneously about body centers in the image and in depth and, by combing these, estimates 3D body position. Unlike prior work, BEV is a single-shot method that is end-to-end differentiable. Second, height varies with age, making it impossible to resolve depth without also estimating the age of people in the image. To do so, we exploit a 3D body model space that lets BEV infer shapes from infants to adults. Third, to train BEV, we need a new dataset. Specifically, we create a "Relative Human" (RH) dataset that includes age labels and relative depth relationships between the people in the images. Extensive experiments on RH and AGORA demonstrate the effectiveness of the model and training scheme. BEV outperforms existing methods on depth reasoning, child shape estimation, and robustness to occlusion. The code and dataset are released for research purposes. △ Less

Submitted 19 April, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

Comments: CVPR 2022; Code https://github.com/Arthur151/ROMP ; Dataset https://github.com/Arthur151/Relative_Human

arXiv:2111.10058 [pdf, other]

DeepQR: Neural-based Quality Ratings for Learnersourced Multiple-Choice Questions

Authors: Lin Ni, Qiming Bao, Xiaoxuan Li, Qianqian Qi, Paul Denny, Jim Warren, Michael Witbrock, Jiamou Liu

Abstract: Automated question quality rating (AQQR) aims to evaluate question quality through computational means, thereby addressing emerging challenges in online learnersourced question repositories. Existing methods for AQQR rely solely on explicitly-defined criteria such as readability and word count, while not fully utilising the power of state-of-the-art deep-learning techniques. We propose DeepQR, a n… ▽ More Automated question quality rating (AQQR) aims to evaluate question quality through computational means, thereby addressing emerging challenges in online learnersourced question repositories. Existing methods for AQQR rely solely on explicitly-defined criteria such as readability and word count, while not fully utilising the power of state-of-the-art deep-learning techniques. We propose DeepQR, a novel neural-network model for AQQR that is trained using multiple-choice-question (MCQ) datasets collected from PeerWise, a widely-used learnersourcing platform. Along with designing DeepQR, we investigate models based on explicitly-defined features, or semantic features, or both. We also introduce a self-attention mechanism to capture semantic correlations between MCQ components, and a contrastive-learning approach to acquire question representations using quality ratings. Extensive experiments on datasets collected from eight university-level courses illustrate that DeepQR has superior performance over six comparative models. △ Less

Submitted 19 November, 2021; originally announced November 2021.

Comments: EAAI 22

arXiv:2110.07872 [pdf, ps, other]

Role Similarity Metric Based on Spanning Rooted Forest

Authors: Qi Bao, Zhongzhi Zhang, Haibin Kan

Abstract: As a fundamental issue in network analysis, structural node similarity has received much attention in academia and is adopted in a wide range of applications. Among these proposed structural node similarity measures, role similarity stands out because of satisfying several axiomatic properties including automorphism conformation. Existing role similarity metrics cannot handle top-k queries on larg… ▽ More As a fundamental issue in network analysis, structural node similarity has received much attention in academia and is adopted in a wide range of applications. Among these proposed structural node similarity measures, role similarity stands out because of satisfying several axiomatic properties including automorphism conformation. Existing role similarity metrics cannot handle top-k queries on large real-world networks due to the high time and space cost. In this paper, we propose a new role similarity metric, namely \textsf{ForestSim}. We prove that \textsf{ForestSim} is an admissible role similarity metric and devise the corresponding top-k similarity search algorithm, namely \textsf{ForestSimSearch}, which is able to process a top-k query in $O(k)$ time once the precomputation is finished. Moreover, we speed up the precomputation by using a fast approximate algorithm to compute the diagonal entries of the forest matrix, which reduces the time and space complexity of the precomputation to $O(ε^{-2}m\log^5{n}\log{\frac{1}ε})$ and $O(m\log^3{n})$, respectively. Finally, we conduct extensive experiments on 26 real-world networks. The results show that \textsf{ForestSim} works efficiently on million-scale networks and achieves comparable performance to the state-of-art methods. △ Less

Submitted 1 April, 2024; v1 submitted 15 October, 2021; originally announced October 2021.

Comments: 10 pages, 5 figures

arXiv:2108.13246 [pdf, other]

LUAI Challenge 2021 on Learning to Understand Aerial Images

Authors: Gui-Song Xia, Jian Ding, Ming Qian, Nan Xue, Jiaming Han, Xiang Bai, Michael Ying Yang, Shengyang Li, Serge Belongie, Jiebo Luo, Mihai Datcu, Marcello Pelillo, Liangpei Zhang, Qiang Zhou, Chao-hui Yu, Kaixuan Hu, Yingjia Bu, Wenming Tan, Zhe Yang, Wei Li, Shang Liu, Jiaxuan Zhao, Tianzhi Ma, Zi-han Gao, Lingqi Wang , et al. (11 additional authors not shown)

Abstract: This report summarizes the results of Learning to Understand Aerial Images (LUAI) 2021 challenge held on ICCV 2021, which focuses on object detection and semantic segmentation in aerial images. Using DOTA-v2.0 and GID-15 datasets, this challenge proposes three tasks for oriented object detection, horizontal object detection, and semantic segmentation of common categories in aerial images. This cha… ▽ More This report summarizes the results of Learning to Understand Aerial Images (LUAI) 2021 challenge held on ICCV 2021, which focuses on object detection and semantic segmentation in aerial images. Using DOTA-v2.0 and GID-15 datasets, this challenge proposes three tasks for oriented object detection, horizontal object detection, and semantic segmentation of common categories in aerial images. This challenge received a total of 146 registrations on the three tasks. Through the challenge, we hope to draw attention from a wide range of communities and call for more efforts on the problems of learning to understand aerial images. △ Less

Submitted 17 September, 2021; v1 submitted 30 August, 2021; originally announced August 2021.

Comments: 7 pages, 2 figures, accepted by ICCVW 2021

arXiv:2104.11536 [pdf, other]

Recent Advances in Monocular 2D and 3D Human Pose Estimation: A Deep Learning Perspective

Authors: Wu Liu, Qian Bao, Yu Sun, Tao Mei

Abstract: Estimation of the human pose from a monocular camera has been an emerging research topic in the computer vision community with many applications. Recently, benefited from the deep learning technologies, a significant amount of research efforts have greatly advanced the monocular human pose estimation both in 2D and 3D areas. Although there have been some works to summarize the different approaches… ▽ More Estimation of the human pose from a monocular camera has been an emerging research topic in the computer vision community with many applications. Recently, benefited from the deep learning technologies, a significant amount of research efforts have greatly advanced the monocular human pose estimation both in 2D and 3D areas. Although there have been some works to summarize the different approaches, it still remains challenging for researchers to have an in-depth view of how these approaches work. In this paper, we provide a comprehensive and holistic 2D-to-3D perspective to tackle this problem. We categorize the mainstream and milestone approaches since the year 2014 under unified frameworks. By systematically summarizing the differences and connections between these approaches, we further analyze the solutions for challenging cases, such as the lack of data, the inherent ambiguity between 2D and 3D, and the complex multi-person scenarios. We also summarize the pose representation styles, benchmarks, evaluation metrics, and the quantitative performance of popular approaches. Finally, we discuss the challenges and give deep thinking of promising directions for future research. We believe this survey will provide the readers with a deep and insightful understanding of monocular human pose estimation. △ Less

Submitted 23 April, 2021; originally announced April 2021.

arXiv:2102.07087 [pdf, other]

Survey on Aerial Radio Access Networks: Toward a Comprehensive 6G Access Infrastructure

Authors: Nhu-Ngoc Dao, Quoc-Viet Pham, Ngo Hoang Tu, Tran Thien Thanh, Vo Nguyen Quoc Bao, Demeke Shumeye Lakew, Sungrae Cho

Abstract: Current network access infrastructures are characterized by heterogeneity, low latency, high throughput, and high computational capability, enabling massive concurrent connections and various services. Unfortunately, this design does not pay significant attention to mobile services in underserved areas. In this context, the use of aerial radio access networks (ARANs) is a promising strategy to com… ▽ More Current network access infrastructures are characterized by heterogeneity, low latency, high throughput, and high computational capability, enabling massive concurrent connections and various services. Unfortunately, this design does not pay significant attention to mobile services in underserved areas. In this context, the use of aerial radio access networks (ARANs) is a promising strategy to complement existing terrestrial communication systems. Involving airborne components such as unmanned aerial vehicles, drones, and satellites, ARANs can quickly establish a flexible access infrastructure on demand. ARANs are expected to support the development of seamless mobile communication systems toward a comprehensive sixth-generation (6G) global access infrastructure. This paper provides an overview of recent studies regarding ARANs in the literature. First, we investigate related work to identify areas for further exploration in terms of recent knowledge advancements and analyses. Second, we define the scope and methodology of this study. Then, we describe ARAN architecture and its fundamental features for the development of 6G networks. In particular, we analyze the system model from several perspectives, including transmission propagation, energy consumption, communication latency, and network mobility. Furthermore, we introduce technologies that enable the success of ARAN implementations in terms of energy replenishment, operational management, and data delivery. Subsequently, we discuss application scenarios envisioned for these technologies. Finally, we highlight ongoing research efforts and trends toward 6G ARANs. △ Less

Submitted 27 February, 2021; v1 submitted 14 February, 2021; originally announced February 2021.

Comments: Accepted by the IEEE Communications Surveys and Tutorials

arXiv:2101.08143 [pdf, other]

doi 10.1145/3442381.3449812

Fast Evaluation for Relevant Quantities of Opinion Dynamics

Authors: Wanyue Xu, Qi Bao, Zhongzhi Zhang

Abstract: One of the main subjects in the field of social networks is to quantify conflict, disagreement, controversy, and polarization, and some quantitative indicators have been developed to quantify these concepts. However, direct computation of these indicators involves the operations of matrix inversion and multiplication, which make it computationally infeasible for large-scale graphs with millions of… ▽ More One of the main subjects in the field of social networks is to quantify conflict, disagreement, controversy, and polarization, and some quantitative indicators have been developed to quantify these concepts. However, direct computation of these indicators involves the operations of matrix inversion and multiplication, which make it computationally infeasible for large-scale graphs with millions of nodes. In this paper, by reducing the problem of computing relevant quantities to evaluating $\ell_2$ norms of some vectors, we present a nearly linear time algorithm to estimate all these quantities. Our algorithm is based on the Laplacian solvers, and has a proved theoretical guarantee of error for each quantity. We execute extensive numerical experiments on a variety of real networks, which demonstrate that our approximation algorithm is efficient and effective, scalable to large graphs having millions of nodes. △ Less

Submitted 12 June, 2021; v1 submitted 20 January, 2021; originally announced January 2021.

Journal ref: Proceedings of The Web Conference 2021, pp.2037-2045

arXiv:2012.13577 [pdf, other]

doi 10.1609/aaai.v36i10.21291

LOREN: Logic-Regularized Reasoning for Interpretable Fact Verification

Authors: Jiangjie Chen, Qiaoben Bao, Changzhi Sun, Xinbo Zhang, Jiaze Chen, Hao Zhou, Yanghua Xiao, Lei Li

Abstract: Given a natural language statement, how to verify its veracity against a large-scale textual knowledge source like Wikipedia? Most existing neural models make predictions without giving clues about which part of a false claim goes wrong. In this paper, we propose LOREN, an approach for interpretable fact verification. We decompose the verification of the whole claim at phrase-level, where the vera… ▽ More Given a natural language statement, how to verify its veracity against a large-scale textual knowledge source like Wikipedia? Most existing neural models make predictions without giving clues about which part of a false claim goes wrong. In this paper, we propose LOREN, an approach for interpretable fact verification. We decompose the verification of the whole claim at phrase-level, where the veracity of the phrases serves as explanations and can be aggregated into the final verdict according to logical rules. The key insight of LOREN is to represent claim phrase veracity as three-valued latent variables, which are regularized by aggregation logical rules. The final claim verification is based on all latent variables. Thus, LOREN enjoys the additional benefit of interpretability -- it is easy to explain how it reaches certain results with claim phrase veracity. Experiments on a public fact verification benchmark show that LOREN is competitive against previous approaches while enjoying the merit of faithful and accurate interpretability. The resources of LOREN are available at: https://github.com/jiangjiechen/LOREN. △ Less

Submitted 9 December, 2021; v1 submitted 25 December, 2020; originally announced December 2020.

Comments: Accepted to AAAI 2022

arXiv:2012.04821 [pdf, ps, other]

Complex Relation Extraction: Challenges and Opportunities

Authors: Haiyun Jiang, Qiaoben Bao, Qiao Cheng, Deqing Yang, Li Wang, Yanghua Xiao

Abstract: Relation extraction aims to identify the target relations of entities in texts. Relation extraction is very important for knowledge base construction and text understanding. Traditional binary relation extraction, including supervised, semi-supervised and distant supervised ones, has been extensively studied and significant results are achieved. In recent years, many complex relation extraction ta… ▽ More Relation extraction aims to identify the target relations of entities in texts. Relation extraction is very important for knowledge base construction and text understanding. Traditional binary relation extraction, including supervised, semi-supervised and distant supervised ones, has been extensively studied and significant results are achieved. In recent years, many complex relation extraction tasks, i.e., the variants of simple binary relation extraction, are proposed to meet the complex applications in practice. However, there is no literature to fully investigate and summarize these complex relation extraction works so far. In this paper, we first report the recent progress in traditional simple binary relation extraction. Then we summarize the existing complex relation extraction tasks and present the definition, recent progress, challenges and opportunities for each task. △ Less

Submitted 8 December, 2020; originally announced December 2020.

Comments: 7 pages

arXiv:2010.14036 [pdf, other]

Synthetic Training for Monocular Human Mesh Recovery

Authors: Yu Sun, Qian Bao, Wu Liu, Wenpeng Gao, Yili Fu, Chuang Gan, Tao Mei

Abstract: Recovering 3D human mesh from monocular images is a popular topic in computer vision and has a wide range of applications. This paper aims to estimate 3D mesh of multiple body parts (e.g., body, hands) with large-scale differences from a single RGB image. Existing methods are mostly based on iterative optimization, which is very time-consuming. We propose to train a single-shot model to achieve th… ▽ More Recovering 3D human mesh from monocular images is a popular topic in computer vision and has a wide range of applications. This paper aims to estimate 3D mesh of multiple body parts (e.g., body, hands) with large-scale differences from a single RGB image. Existing methods are mostly based on iterative optimization, which is very time-consuming. We propose to train a single-shot model to achieve this goal. The main challenge is lacking training data that have complete 3D annotations of all body parts in 2D images. To solve this problem, we design a multi-branch framework to disentangle the regression of different body properties, enabling us to separate each component's training in a synthetic training manner using unpaired data available. Besides, to strengthen the generalization ability, most existing methods have used in-the-wild 2D pose datasets to supervise the estimated 3D pose via 3D-to-2D projection. However, we observe that the commonly used weak-perspective model performs poorly in dealing with the external foreshortening effect of camera projection. Therefore, we propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants for more proper supervision. The proposed method outperforms previous methods on the CMU Panoptic Studio dataset according to the evaluation results and achieves comparable results on the Human3.6M body and STB hand benchmarks. More impressively, the performance in close shot images gets significantly improved using the proposed D2S projection for weak supervision, while maintains obvious superiority in computational efficiency. △ Less

Submitted 26 October, 2020; originally announced October 2020.

arXiv:2008.12272 [pdf, other]

Monocular, One-stage, Regression of Multiple 3D People

Authors: Yu Sun, Qian Bao, Wu Liu, Yili Fu, Michael J. Black, Tao Mei

Abstract: This paper focuses on the regression of multiple 3D people from a single RGB image. Existing approaches predominantly follow a multi-stage pipeline that first detects people in bounding boxes and then independently regresses their 3D body meshes. In contrast, we propose to Regress all meshes in a One-stage fashion for Multiple 3D People (termed ROMP). The approach is conceptually simple, bounding… ▽ More This paper focuses on the regression of multiple 3D people from a single RGB image. Existing approaches predominantly follow a multi-stage pipeline that first detects people in bounding boxes and then independently regresses their 3D body meshes. In contrast, we propose to Regress all meshes in a One-stage fashion for Multiple 3D People (termed ROMP). The approach is conceptually simple, bounding box-free, and able to learn a per-pixel representation in an end-to-end manner. Our method simultaneously predicts a Body Center heatmap and a Mesh Parameter map, which can jointly describe the 3D body mesh on the pixel level. Through a body-center-guided sampling process, the body mesh parameters of all people in the image are easily extracted from the Mesh Parameter map. Equipped with such a fine-grained representation, our one-stage framework is free of the complex multi-stage process and more robust to occlusion. Compared with state-of-the-art methods, ROMP achieves superior performance on the challenging multi-person benchmarks, including 3DPW and CMU Panoptic. Experiments on crowded/occluded datasets demonstrate the robustness under various types of occlusion. The released code is the first real-time implementation of monocular multi-person 3D mesh regression. △ Less

Submitted 16 September, 2021; v1 submitted 27 August, 2020; originally announced August 2020.

Comments: ICCV 2021, Code https://github.com/Arthur151/ROMP

arXiv:2008.06285 [pdf, other]

Rb-PaStaNet: A Few-Shot Human-Object Interaction Detection Based on Rules and Part States

Authors: Shenyu Zhang, Zichen Zhu, Qingquan Bao

Abstract: Existing Human-Object Interaction (HOI) Detection approaches have achieved great progress on nonrare classes while rare HOI classes are still not well-detected. In this paper, we intend to apply human prior knowledge into the existing work. So we add human-labeled rules to PaStaNet and propose Rb-PaStaNet aimed at improving rare HOI classes detection. Our results show a certain improvement of the… ▽ More Existing Human-Object Interaction (HOI) Detection approaches have achieved great progress on nonrare classes while rare HOI classes are still not well-detected. In this paper, we intend to apply human prior knowledge into the existing work. So we add human-labeled rules to PaStaNet and propose Rb-PaStaNet aimed at improving rare HOI classes detection. Our results show a certain improvement of the rare classes, while the non-rare classes and the overall improvement is more considerable. △ Less

Submitted 14 August, 2020; originally announced August 2020.

arXiv:2006.13693 [pdf, other]

PECAIQR: A Model for Infectious Disease Applied to the Covid-19 Epidemic

Authors: Richard Bao, August Chen, Jethin Gowda, Shiva Mudide

Abstract: The Covid-19 pandemic has made clear the need to improve modern multivariate time-series forecasting models. Current state of the art predictions of future daily deaths and, especially, hospital resource usage have confidence intervals that are unacceptably wide. Policy makers and hospitals require accurate forecasts to make informed decisions on passing legislation and allocating resources. We us… ▽ More The Covid-19 pandemic has made clear the need to improve modern multivariate time-series forecasting models. Current state of the art predictions of future daily deaths and, especially, hospital resource usage have confidence intervals that are unacceptably wide. Policy makers and hospitals require accurate forecasts to make informed decisions on passing legislation and allocating resources. We used US county-level data on daily deaths and population statistics to forecast future deaths. We extended the SIR epidemiological model to a novel model we call the PECAIQR model. It adds several new variables and parameters to the naive SIR model by taking into account the ramifications of the partial quarantining implemented in the US. We fitted data to the model parameters with numerical integration. Because of the fit degeneracy in parameter space and non-constant nature of the parameters, we developed several methods to optimize our fit, such as training on the data tail and training on specific policy regimes. We use cross-validation to tune our hyper parameters at the county level and generate a CDF for future daily deaths. For predictions made from training data up to May 25th, we consistently obtained an averaged pinball loss score of 0.096 on a 14 day forecast. We finally present examples of possible avenues for utility from our model. We generate longer-time horizon predictions over various 1-month windows in the past, forecast how many medical resources such as ventilators and ICU beds will be needed in counties, and evaluate the efficacy of our model in other countries. △ Less

Submitted 17 June, 2020; originally announced June 2020.

arXiv:2002.03140 [pdf, other]

doi 10.1145/3373017.3373049

HHH: An Online Medical Chatbot System based on Knowledge Graph and Hierarchical Bi-Directional Attention

Authors: Qiming Bao, Lin Ni, Jiamou Liu

Abstract: This paper proposes a chatbot framework that adopts a hybrid model which consists of a knowledge graph and a text similarity model. Based on this chatbot framework, we build HHH, an online question-and-answer (QA) Healthcare Helper system for answering complex medical questions. HHH maintains a knowledge graph constructed from medical data collected from the Internet. HHH also implements a novel t… ▽ More This paper proposes a chatbot framework that adopts a hybrid model which consists of a knowledge graph and a text similarity model. Based on this chatbot framework, we build HHH, an online question-and-answer (QA) Healthcare Helper system for answering complex medical questions. HHH maintains a knowledge graph constructed from medical data collected from the Internet. HHH also implements a novel text representation and similarity deep learning model, Hierarchical BiLSTM Attention Model (HBAM), to find the most similar question from a large QA dataset. We compare HBAM with other state-of-the-art language models such as bidirectional encoder representation from transformers (BERT) and Manhattan LSTM Model (MaLSTM). We train and test the models with a subset of the Quora duplicate questions dataset in the medical area. The experimental results show that our model is able to achieve a superior performance than these existing methods. △ Less

Submitted 8 February, 2020; originally announced February 2020.

Comments: 10 pages, 9 figures, 3 tables. Proceedings of the Australasian Computer Science Week Multiconference (ACSW 2020)

arXiv:1911.08045 [pdf, ps, other]

The k-Power Domination Number in Some Self-Similar Graphs

Authors: Yulun Xu, Qi Bao, Zhongzhi Zhang

Abstract: The $k$-power domination problem is a problem in graph theory, which has applications in many areas. However, it is hard to calculate the exact $k$-power domination number since determining k-power domination number of a generic graph is a NP-complete problem. We determine the exact $k$-power domination number in two graphs which have the same number of vertices and edges: pseudofractal scale-free… ▽ More The $k$-power domination problem is a problem in graph theory, which has applications in many areas. However, it is hard to calculate the exact $k$-power domination number since determining k-power domination number of a generic graph is a NP-complete problem. We determine the exact $k$-power domination number in two graphs which have the same number of vertices and edges: pseudofractal scale-free web and Sierpiński gasket. The $k$-power domination number becomes 1 for $k\ge2$ in the Sierpiński gasket, while the $k$-power domination number increases at an exponential rate with regard to the number of vertices in the pseudofractal scale-free web. The scale-free property may account for the difference in the behavior of two graphs. △ Less

Submitted 18 November, 2019; originally announced November 2019.

Showing 1–50 of 55 results for author: Ba, Q