-
Scaling Test-time Compute for LLM Agents
Authors:
King Zhu,
Hanhao Li,
Siwei Wu,
Tianshun Xing,
Dehua Ma,
Xiangru Tang,
Minghao Liu,
Jian Yang,
Jiaheng Liu,
Yuchen Eleanor Jiang,
Changwang Zhang,
Chenghua Lin,
Jun Wang,
Ge Zhang,
Wangchunshu Zhou
Abstract:
Scaling test time compute has shown remarkable success in improving the reasoning abilities of large language models (LLMs). In this work, we conduct the first systematic exploration of applying test-time scaling methods to language agents and investigate the extent to which it improves their effectiveness. Specifically, we explore different test-time scaling strategies, including: (1) parallel sa…
▽ More
Scaling test time compute has shown remarkable success in improving the reasoning abilities of large language models (LLMs). In this work, we conduct the first systematic exploration of applying test-time scaling methods to language agents and investigate the extent to which it improves their effectiveness. Specifically, we explore different test-time scaling strategies, including: (1) parallel sampling algorithms; (2) sequential revision strategies; (3) verifiers and merging methods; (4)strategies for diversifying rollouts.We carefully analyze and ablate the impact of different design strategies on applying test-time scaling on language agents, and have follow findings: 1. Scaling test time compute could improve the performance of agents. 2. Knowing when to reflect is important for agents. 3. Among different verification and result merging approaches, the list-wise method performs best. 4. Increasing diversified rollouts exerts a positive effect on the agent's task performance.
△ Less
Submitted 15 June, 2025;
originally announced June 2025.
-
Global Stress Generation and Spatiotemporal Super-Resolution Physics-Informed Operator under Dynamic Loading for Two-Phase Random Materials
Authors:
Tengfei Xing,
Xiaodan Ren,
Jie Li
Abstract:
Material stress analysis is a critical aspect of material design and performance optimization. Under dynamic loading, the global stress evolution in materials exhibits complex spatiotemporal characteristics, especially in two-phase random materials (TRMs). Such kind of material failure is often associated with stress concentration, and the phase boundaries are key locations where stress concentrat…
▽ More
Material stress analysis is a critical aspect of material design and performance optimization. Under dynamic loading, the global stress evolution in materials exhibits complex spatiotemporal characteristics, especially in two-phase random materials (TRMs). Such kind of material failure is often associated with stress concentration, and the phase boundaries are key locations where stress concentration occurs. In practical engineering applications, the spatiotemporal resolution of acquired microstructural data and its dynamic stress evolution is often limited. This poses challenges for deep learning methods in generating high-resolution spatiotemporal stress fields, particularly for accurately capturing stress concentration regions. In this study, we propose a framework for global stress generation and spatiotemporal super-resolution in TRMs under dynamic loading. First, we introduce a diffusion model-based approach, named as Spatiotemporal Stress Diffusion (STS-diffusion), for generating global spatiotemporal stress data. This framework incorporates Space-Time U-Net (STU-net), and we systematically investigate the impact of different attention positions on model accuracy. Next, we develop a physics-informed network for spatiotemporal super-resolution, termed as Spatiotemporal Super-Resolution Physics-Informed Operator (ST-SRPINN). The proposed ST-SRPINN is an unsupervised learning method. The influence of data-driven and physics-informed loss function weights on model accuracy is explored in detail. Benefiting from physics-based constraints, ST-SRPINN requires only low-resolution stress field data during training and can upscale the spatiotemporal resolution of stress fields to arbitrary magnifications.
△ Less
Submitted 26 April, 2025;
originally announced May 2025.
-
Predicting Stress in Two-phase Random Materials and Super-Resolution Method for Stress Images by Embedding Physical Information
Authors:
Tengfei Xing,
Xiaodan Ren,
Jie Li
Abstract:
Stress analysis is an important part of material design. For materials with complex microstructures, such as two-phase random materials (TRMs), material failure is often accompanied by stress concentration. Phase interfaces in two-phase materials are critical for stress concentration. Therefore, the prediction error of stress at phase boundaries is crucial. In practical engineering, the pixels of…
▽ More
Stress analysis is an important part of material design. For materials with complex microstructures, such as two-phase random materials (TRMs), material failure is often accompanied by stress concentration. Phase interfaces in two-phase materials are critical for stress concentration. Therefore, the prediction error of stress at phase boundaries is crucial. In practical engineering, the pixels of the obtained material microstructure images are limited, which limits the resolution of stress images generated by deep learning methods, making it difficult to observe stress concentration regions. Existing Image Super-Resolution (ISR) technologies are all based on data-driven supervised learning. However, stress images have natural physical constraints, which provide new ideas for new ISR technologies. In this study, we constructed a stress prediction framework for TRMs. First, the framework uses a proposed Multiple Compositions U-net (MC U-net) to predict stress in low-resolution material microstructures. By considering the phase interface information of the microstructure, the MC U-net effectively reduces the problem of excessive prediction errors at phase boundaries. Secondly, a Mixed Physics-Informed Neural Network (MPINN) based method for stress ISR (SRPINN) was proposed. By introducing the constraints of physical information, the new method does not require paired stress images for training and can increase the resolution of stress images to any multiple. This enables a multiscale analysis of the stress concentration regions at phase boundaries. Finally, we performed stress analysis on TRMs with different phase volume fractions and loading states through transfer learning. The results show the proposed stress prediction framework has satisfactory accuracy and generalization ability.
△ Less
Submitted 26 April, 2025;
originally announced April 2025.
-
MobiVital: Self-supervised Time-series Quality Estimation for Contactless Respiration Monitoring Using UWB Radar
Authors:
Ziqi Wang,
Derek Hua,
Wenjun Jiang,
Tianwei Xing,
Xun Chen,
Mani Srivastava
Abstract:
Respiration waveforms are increasingly recognized as important biomarkers, offering insights beyond simple respiration rates, such as detecting breathing irregularities for disease diagnosis or monitoring breath patterns to guide rehabilitation training. Previous works in wireless respiration monitoring have primarily focused on estimating respiration rate, where the breath waveforms are often gen…
▽ More
Respiration waveforms are increasingly recognized as important biomarkers, offering insights beyond simple respiration rates, such as detecting breathing irregularities for disease diagnosis or monitoring breath patterns to guide rehabilitation training. Previous works in wireless respiration monitoring have primarily focused on estimating respiration rate, where the breath waveforms are often generated as a by-product. As a result, issues such as waveform deformation and inversion have largely been overlooked, reducing the signal's utility for applications requiring breathing waveforms. To address this problem, we present a novel approach, MobiVital, that improves the quality of respiration waveforms obtained from ultra-wideband (UWB) radar data. MobiVital combines a self-supervised autoregressive model for breathing waveform extraction with a biology-informed algorithm to detect and correct waveform inversions. To encourage reproducible research efforts for developing wireless vital signal monitoring systems, we also release a 12-person, 24-hour UWB radar vital signal dataset, with time-synchronized ground truth obtained from wearable sensors. Our results show that the respiration waveforms produced by our system exhibit a 7-34% increase in fidelity to the ground truth compared to the baselines and can benefit downstream tasks such as respiration rate estimation.
△ Less
Submitted 14 March, 2025;
originally announced March 2025.
-
Instruction-Tuning Llama-3-8B Excels in City-Scale Mobility Prediction
Authors:
Peizhi Tang,
Chuang Yang,
Tong Xing,
Xiaohang Xu,
Renhe Jiang,
Kaoru Sezaki
Abstract:
Human mobility prediction plays a critical role in applications such as disaster response, urban planning, and epidemic forecasting. Traditional methods often rely on designing crafted, domain-specific models, and typically focus on short-term predictions, which struggle to generalize across diverse urban environments. In this study, we introduce Llama-3-8B-Mob, a large language model fine-tuned w…
▽ More
Human mobility prediction plays a critical role in applications such as disaster response, urban planning, and epidemic forecasting. Traditional methods often rely on designing crafted, domain-specific models, and typically focus on short-term predictions, which struggle to generalize across diverse urban environments. In this study, we introduce Llama-3-8B-Mob, a large language model fine-tuned with instruction tuning, for long-term citywide mobility prediction -- in a Q&A manner. We validate our approach using large-scale human mobility data from four metropolitan areas in Japan, focusing on predicting individual trajectories over the next 15 days. The results demonstrate that Llama-3-8B-Mob excels in modeling long-term human mobility -- surpassing the state-of-the-art on multiple prediction metrics. It also displays strong zero-shot generalization capabilities -- effectively generalizing to other cities even when fine-tuned only on limited samples from a single city. Source codes are available at https://github.com/TANGHULU6/Llama3-8B-Mob.
△ Less
Submitted 31 October, 2024;
originally announced October 2024.
-
Multi-source Domain Adaptation for Panoramic Semantic Segmentation
Authors:
Jing Jiang,
Sicheng Zhao,
Jiankun Zhu,
Wenbo Tang,
Zhaopan Xu,
Jidong Yang,
Guoping Liu,
Tengfei Xing,
Pengfei Xu,
Hongxun Yao
Abstract:
Unsupervised domain adaptation methods for panoramic semantic segmentation utilize real pinhole images or low-cost synthetic panoramic images to transfer segmentation models to real panoramic images. However, these methods struggle to understand the panoramic structure using only real pinhole images and lack real-world scene perception with only synthetic panoramic images. Therefore, in this paper…
▽ More
Unsupervised domain adaptation methods for panoramic semantic segmentation utilize real pinhole images or low-cost synthetic panoramic images to transfer segmentation models to real panoramic images. However, these methods struggle to understand the panoramic structure using only real pinhole images and lack real-world scene perception with only synthetic panoramic images. Therefore, in this paper, we propose a new task, Multi-source Domain Adaptation for Panoramic Semantic Segmentation (MSDA4PASS), which leverages both real pinhole and synthetic panoramic images to improve segmentation on unlabeled real panoramic images. There are two key issues in the MSDA4PASS task: (1) distortion gaps between the pinhole and panoramic domains -- panoramic images exhibit global and local distortions absent in pinhole images; (2) texture gaps between the source and target domains -- scenes and styles differ across domains. To address these two issues, we propose a novel framework, Deformation Transform Aligner for Panoramic Semantic Segmentation (DTA4PASS), which converts all pinhole images in the source domains into distorted images and aligns the source distorted and panoramic images with the target panoramic images. Specifically, DTA4PASS consists of two main components: Unpaired Semantic Morphing (USM) and Distortion Gating Alignment (DGA). First, in USM, the Dual-view Discriminator (DvD) assists in training the diffeomorphic deformation network at the image and pixel level, enabling the effective deformation transformation of pinhole images without paired panoramic views, alleviating distortion gaps. Second, DGA assigns pinhole-like (pin-like) and panoramic-like (pan-like) features to each image by gating, and aligns these two features through uncertainty estimation, reducing texture gaps.
△ Less
Submitted 7 January, 2025; v1 submitted 29 August, 2024;
originally announced August 2024.
-
MapVision: CVPR 2024 Autonomous Grand Challenge Mapless Driving Tech Report
Authors:
Zhongyu Yang,
Mai Liu,
Jinluo Xie,
Yueming Zhang,
Chen Shen,
Wei Shao,
Jichao Jiao,
Tengfei Xing,
Runbo Hu,
Pengfei Xu
Abstract:
Autonomous driving without high-definition (HD) maps demands a higher level of active scene understanding. In this competition, the organizers provided the multi-perspective camera images and standard-definition (SD) maps to explore the boundaries of scene reasoning capabilities. We found that most existing algorithms construct Bird's Eye View (BEV) features from these multi-perspective images and…
▽ More
Autonomous driving without high-definition (HD) maps demands a higher level of active scene understanding. In this competition, the organizers provided the multi-perspective camera images and standard-definition (SD) maps to explore the boundaries of scene reasoning capabilities. We found that most existing algorithms construct Bird's Eye View (BEV) features from these multi-perspective images and use multi-task heads to delineate road centerlines, boundary lines, pedestrian crossings, and other areas. However, these algorithms perform poorly at the far end of roads and struggle when the primary subject in the image is occluded. Therefore, in this competition, we not only used multi-perspective images as input but also incorporated SD maps to address this issue. We employed map encoder pre-training to enhance the network's geometric encoding capabilities and utilized YOLOX to improve traffic element detection precision. Additionally, for area detection, we innovatively introduced LDTR and auxiliary tasks to achieve higher precision. As a result, our final OLUS score is 0.58.
△ Less
Submitted 14 June, 2024;
originally announced June 2024.
-
LDTR: Transformer-based Lane Detection with Anchor-chain Representation
Authors:
Zhongyu Yang,
Chen Shen,
Wei Shao,
Tengfei Xing,
Runbo Hu,
Pengfei Xu,
Hua Chai,
Ruini Xue
Abstract:
Despite recent advances in lane detection methods, scenarios with limited- or no-visual-clue of lanes due to factors such as lighting conditions and occlusion remain challenging and crucial for automated driving. Moreover, current lane representations require complex post-processing and struggle with specific instances. Inspired by the DETR architecture, we propose LDTR, a transformer-based model…
▽ More
Despite recent advances in lane detection methods, scenarios with limited- or no-visual-clue of lanes due to factors such as lighting conditions and occlusion remain challenging and crucial for automated driving. Moreover, current lane representations require complex post-processing and struggle with specific instances. Inspired by the DETR architecture, we propose LDTR, a transformer-based model to address these issues. Lanes are modeled with a novel anchor-chain, regarding a lane as a whole from the beginning, which enables LDTR to handle special lanes inherently. To enhance lane instance perception, LDTR incorporates a novel multi-referenced deformable attention module to distribute attention around the object. Additionally, LDTR incorporates two line IoU algorithms to improve convergence efficiency and employs a Gaussian heatmap auxiliary branch to enhance model representation capability during training. To evaluate lane detection models, we rely on Frechet distance, parameterized F1-score, and additional synthetic metrics. Experimental results demonstrate that LDTR achieves state-of-the-art performance on well-known datasets.
△ Less
Submitted 21 March, 2024;
originally announced March 2024.
-
Learning to Compress Prompt in Natural Language Formats
Authors:
Yu-Neng Chuang,
Tianwei Xing,
Chia-Yuan Chang,
Zirui Liu,
Xun Chen,
Xia Hu
Abstract:
Large language models (LLMs) are great at processing multiple natural language processing tasks, but their abilities are constrained by inferior performance with long context, slow inference speed, and the high cost of computing the results. Deploying LLMs with precise and informative context helps users process large-scale datasets more effectively and cost-efficiently. Existing works rely on com…
▽ More
Large language models (LLMs) are great at processing multiple natural language processing tasks, but their abilities are constrained by inferior performance with long context, slow inference speed, and the high cost of computing the results. Deploying LLMs with precise and informative context helps users process large-scale datasets more effectively and cost-efficiently. Existing works rely on compressing long prompt contexts into soft prompts. However, soft prompt compression encounters limitations in transferability across different LLMs, especially API-based LLMs. To this end, this work aims to compress lengthy prompts in the form of natural language with LLM transferability. This poses two challenges: (i) Natural Language (NL) prompts are incompatible with back-propagation, and (ii) NL prompts lack flexibility in imposing length constraints. In this work, we propose a Natural Language Prompt Encapsulation (Nano-Capsulator) framework compressing original prompts into NL formatted Capsule Prompt while maintaining the prompt utility and transferability. Specifically, to tackle the first challenge, the Nano-Capsulator is optimized by a reward function that interacts with the proposed semantics preserving loss. To address the second question, the Nano-Capsulator is optimized by a reward function featuring length constraints. Experimental results demonstrate that the Capsule Prompt can reduce 81.4% of the original length, decrease inference latency up to 4.5x, and save 80.1% of budget overheads while providing transferability across diverse LLMs and different datasets.
△ Less
Submitted 1 April, 2024; v1 submitted 28 February, 2024;
originally announced February 2024.
-
Application of AI in Nutrition
Authors:
Ritu Ramakrishnan,
Tianxiang Xing,
Tianfeng Chen,
Ming-Hao Lee,
Jinzhu Gao
Abstract:
In healthcare, artificial intelligence (AI) has been changing the way doctors and health experts take care of people. This paper will cover how AI is making major changes in the health care system, especially with nutrition. Various machine learning and deep learning algorithms have been developed to extract valuable information from healthcare data which help doctors, nutritionists, and health ex…
▽ More
In healthcare, artificial intelligence (AI) has been changing the way doctors and health experts take care of people. This paper will cover how AI is making major changes in the health care system, especially with nutrition. Various machine learning and deep learning algorithms have been developed to extract valuable information from healthcare data which help doctors, nutritionists, and health experts to make better decisions and make our lifestyle healthy. This paper provides an overview of the current state of AI applications in healthcare with a focus on the utilization of AI-driven recommender systems in nutrition. It will discuss the positive outcomes and challenges that arise when AI is used in this field. This paper addresses the challenges to develop AI recommender systems in healthcare, providing a well-rounded perspective on the complexities. Real-world examples and research findings are presented to underscore the tangible and significant impact AI recommender systems have in the field of healthcare, particularly in nutrition. The ongoing efforts of applying AI in nutrition lay the groundwork for a future where personalized recommendations play a pivotal role in guiding individuals toward healthier lifestyles.
△ Less
Submitted 17 December, 2023;
originally announced December 2023.
-
Dataset Distillation via the Wasserstein Metric
Authors:
Haoyang Liu,
Yijiang Li,
Tiancheng Xing,
Vibhu Dalal,
Luwei Li,
Jingrui He,
Haohan Wang
Abstract:
Dataset Distillation (DD) emerges as a powerful strategy to encapsulate the expansive information of large datasets into significantly smaller, synthetic equivalents, thereby preserving model performance with reduced computational overhead. Pursuing this objective, we introduce the Wasserstein distance, a metric grounded in optimal transport theory, to enhance distribution matching in DD. Our appr…
▽ More
Dataset Distillation (DD) emerges as a powerful strategy to encapsulate the expansive information of large datasets into significantly smaller, synthetic equivalents, thereby preserving model performance with reduced computational overhead. Pursuing this objective, we introduce the Wasserstein distance, a metric grounded in optimal transport theory, to enhance distribution matching in DD. Our approach employs the Wasserstein barycenter to provide a geometrically meaningful method for quantifying distribution differences and capturing the centroid of distribution sets efficiently. By embedding synthetic data in the feature spaces of pretrained classification models, we facilitate effective distribution matching that leverages prior knowledge inherent in these models. Our method not only maintains the computational advantages of distribution matching-based techniques but also achieves new state-of-the-art performance across a range of high-resolution datasets. Extensive testing demonstrates the effectiveness and adaptability of our method, underscoring the untapped potential of Wasserstein metrics in dataset distillation.
△ Less
Submitted 15 March, 2024; v1 submitted 30 November, 2023;
originally announced November 2023.
-
CANet: Curved Guide Line Network with Adaptive Decoder for Lane Detection
Authors:
Zhongyu Yang,
Chen Shen,
Wei Shao,
Tengfei Xing,
Runbo Hu,
Pengfei Xu,
Hua Chai,
Ruini Xue
Abstract:
Lane detection is challenging due to the complicated on road scenarios and line deformation from different camera perspectives. Lots of solutions were proposed, but can not deal with corner lanes well. To address this problem, this paper proposes a new top-down deep learning lane detection approach, CANET. A lane instance is first responded by the heat-map on the U-shaped curved guide line at glob…
▽ More
Lane detection is challenging due to the complicated on road scenarios and line deformation from different camera perspectives. Lots of solutions were proposed, but can not deal with corner lanes well. To address this problem, this paper proposes a new top-down deep learning lane detection approach, CANET. A lane instance is first responded by the heat-map on the U-shaped curved guide line at global semantic level, thus the corresponding features of each lane are aggregated at the response point. Then CANET obtains the heat-map response of the entire lane through conditional convolution, and finally decodes the point set to describe lanes via adaptive decoder. The experimental results show that CANET reaches SOTA in different metrics. Our code will be released soon.
△ Less
Submitted 23 April, 2023;
originally announced April 2023.
-
S4OD: Semi-Supervised learning for Single-Stage Object Detection
Authors:
Yueming Zhang,
Xingxu Yao,
Chao Liu,
Feng Chen,
Xiaolin Song,
Tengfei Xing,
Runbo Hu,
Hua Chai,
Pengfei Xu,
Guoshan Zhang
Abstract:
Single-stage detectors suffer from extreme foreground-background class imbalance, while two-stage detectors do not. Therefore, in semi-supervised object detection, two-stage detectors can deliver remarkable performance by only selecting high-quality pseudo labels based on classification scores. However, directly applying this strategy to single-stage detectors would aggravate the class imbalance w…
▽ More
Single-stage detectors suffer from extreme foreground-background class imbalance, while two-stage detectors do not. Therefore, in semi-supervised object detection, two-stage detectors can deliver remarkable performance by only selecting high-quality pseudo labels based on classification scores. However, directly applying this strategy to single-stage detectors would aggravate the class imbalance with fewer positive samples. Thus, single-stage detectors have to consider both quality and quantity of pseudo labels simultaneously. In this paper, we design a dynamic self-adaptive threshold (DSAT) strategy in classification branch, which can automatically select pseudo labels to achieve an optimal trade-off between quality and quantity. Besides, to assess the regression quality of pseudo labels in single-stage detectors, we propose a module to compute the regression uncertainty of boxes based on Non-Maximum Suppression. By leveraging only 10% labeled data from COCO, our method achieves 35.0% AP on anchor-free detector (FCOS) and 32.9% on anchor-based detector (RetinaNet).
△ Less
Submitted 9 April, 2022;
originally announced April 2022.
-
3rd Place Solution for VisDA 2021 Challenge -- Universally Domain Adaptive Image Recognition
Authors:
Haojin Liao,
Xiaolin Song,
Sicheng Zhao,
Shanghang Zhang,
Xiangyu Yue,
Xingxu Yao,
Yueming Zhang,
Tengfei Xing,
Pengfei Xu,
Qiang Wang
Abstract:
The Visual Domain Adaptation (VisDA) 2021 Challenge calls for unsupervised domain adaptation (UDA) methods that can deal with both input distribution shift and label set variance between the source and target domains. In this report, we introduce a universal domain adaptation (UniDA) method by aggregating several popular feature extraction and domain adaptation schemes. First, we utilize VOLO, a T…
▽ More
The Visual Domain Adaptation (VisDA) 2021 Challenge calls for unsupervised domain adaptation (UDA) methods that can deal with both input distribution shift and label set variance between the source and target domains. In this report, we introduce a universal domain adaptation (UniDA) method by aggregating several popular feature extraction and domain adaptation schemes. First, we utilize VOLO, a Transformer-based architecture with state-of-the-art performance in several visual tasks, as the backbone to extract effective feature representations. Second, we modify the open-set classifier of OVANet to recognize the unknown class with competitive accuracy and robustness. As shown in the leaderboard, our proposed UniDA method ranks the 3rd place with 48.49% ACC and 70.8% AUROC in the VisDA 2021 Challenge.
△ Less
Submitted 27 February, 2025; v1 submitted 27 October, 2021;
originally announced October 2021.
-
Using DeepProbLog to perform Complex Event Processing on an Audio Stream
Authors:
Marc Roig Vilamala,
Tianwei Xing,
Harrison Taylor,
Luis Garcia,
Mani Srivastava,
Lance Kaplan,
Alun Preece,
Angelika Kimmig,
Federico Cerutti
Abstract:
In this paper, we present an approach to Complex Event Processing (CEP) that is based on DeepProbLog. This approach has the following objectives: (i) allowing the use of subsymbolic data as an input, (ii) retaining the flexibility and modularity on the definitions of complex event rules, (iii) allowing the system to be trained in an end-to-end manner and (iv) being robust against noisily labelled…
▽ More
In this paper, we present an approach to Complex Event Processing (CEP) that is based on DeepProbLog. This approach has the following objectives: (i) allowing the use of subsymbolic data as an input, (ii) retaining the flexibility and modularity on the definitions of complex event rules, (iii) allowing the system to be trained in an end-to-end manner and (iv) being robust against noisily labelled data. Our approach makes use of DeepProbLog to create a neuro-symbolic architecture that combines a neural network to process the subsymbolic data with a probabilistic logic layer to allow the user to define the rules for the complex events. We demonstrate that our approach is capable of detecting complex events from an audio stream. We also demonstrate that our approach is capable of training even with a dataset that has a moderate proportion of noisy data.
△ Less
Submitted 15 October, 2021;
originally announced October 2021.
-
2nd Place Solution for Waymo Open Dataset Challenge -- Real-time 2D Object Detection
Authors:
Yueming Zhang,
Xiaolin Song,
Bing Bai,
Tengfei Xing,
Chao Liu,
Xin Gao,
Zhihui Wang,
Yawei Wen,
Haojin Liao,
Guoshan Zhang,
Pengfei Xu
Abstract:
In an autonomous driving system, it is essential to recognize vehicles, pedestrians and cyclists from images. Besides the high accuracy of the prediction, the requirement of real-time running brings new challenges for convolutional network models. In this report, we introduce a real-time method to detect the 2D objects from images. We aggregate several popular one-stage object detectors and train…
▽ More
In an autonomous driving system, it is essential to recognize vehicles, pedestrians and cyclists from images. Besides the high accuracy of the prediction, the requirement of real-time running brings new challenges for convolutional network models. In this report, we introduce a real-time method to detect the 2D objects from images. We aggregate several popular one-stage object detectors and train the models of variety input strategies independently, to yield better performance for accurate multi-scale detection of each category, especially for small objects. For model acceleration, we leverage TensorRT to optimize the inference time of our detection pipeline. As shown in the leaderboard, our proposed detection framework ranks the 2nd place with 75.00% L1 mAP and 69.72% L2 mAP in the real-time 2D detection track of the Waymo Open Dataset Challenges, while our framework achieves the latency of 45.8ms/frame on an Nvidia Tesla V100 GPU.
△ Less
Submitted 16 June, 2021;
originally announced June 2021.
-
Spatially Correlated Patterns in Adversarial Images
Authors:
Nandish Chattopadhyay,
Lionell Yip En Zhi,
Bryan Tan Bing Xing,
Anupam Chattopadhyay
Abstract:
Adversarial attacks have proved to be the major impediment in the progress on research towards reliable machine learning solutions. Carefully crafted perturbations, imperceptible to human vision, can be added to images to force misclassification by an otherwise high performing neural network. To have a better understanding of the key contributors of such structured attacks, we searched for and stu…
▽ More
Adversarial attacks have proved to be the major impediment in the progress on research towards reliable machine learning solutions. Carefully crafted perturbations, imperceptible to human vision, can be added to images to force misclassification by an otherwise high performing neural network. To have a better understanding of the key contributors of such structured attacks, we searched for and studied spatially co-located patterns in the distribution of pixels in the input space. In this paper, we propose a framework for segregating and isolating regions within an input image which are particularly critical towards either classification (during inference), or adversarial vulnerability or both. We assert that during inference, the trained model looks at a specific region in the image, which we call Region of Importance (RoI); and the attacker looks at a region to alter/modify, which we call Region of Attack (RoA). The success of this approach could also be used to design a post-hoc adversarial defence method, as illustrated by our observations. This uses the notion of blocking out (we call neutralizing) that region of the image which is highly vulnerable to adversarial attacks but is not important for the task of classification. We establish the theoretical setup for formalising the process of segregation, isolation and neutralization and substantiate it through empirical analysis on standard benchmarking datasets. The findings strongly indicate that mapping features into the input space preserves the significant patterns typically observed in the feature-space while adding major interpretability and therefore simplifies potential defensive mechanisms.
△ Less
Submitted 21 November, 2020;
originally announced November 2020.
-
An Experimentation Platform for Explainable Coalition Situational Understanding
Authors:
Katie Barrett-Powell,
Jack Furby,
Liam Hiley,
Marc Roig Vilamala,
Harrison Taylor,
Federico Cerutti,
Alun Preece,
Tianwei Xing,
Luis Garcia,
Mani Srivastava,
Dave Braines
Abstract:
We present an experimentation platform for coalition situational understanding research that highlights capabilities in explainable artificial intelligence/machine learning (AI/ML) and integration of symbolic and subsymbolic AI/ML approaches for event processing. The Situational Understanding Explorer (SUE) platform is designed to be lightweight, to easily facilitate experiments and demonstrations…
▽ More
We present an experimentation platform for coalition situational understanding research that highlights capabilities in explainable artificial intelligence/machine learning (AI/ML) and integration of symbolic and subsymbolic AI/ML approaches for event processing. The Situational Understanding Explorer (SUE) platform is designed to be lightweight, to easily facilitate experiments and demonstrations, and open. We discuss our requirements to support coalition multi-domain operations with emphasis on asset interoperability and ad hoc human-machine teaming in a dense urban terrain setting. We describe the interface functionality and give examples of SUE applied to coalition situational understanding tasks.
△ Less
Submitted 9 November, 2020; v1 submitted 27 October, 2020;
originally announced October 2020.
-
A Hybrid Neuro-Symbolic Approach for Complex Event Processing
Authors:
Marc Roig Vilamala,
Harrison Taylor,
Tianwei Xing,
Luis Garcia,
Mani Srivastava,
Lance Kaplan,
Alun Preece,
Angelika Kimmig,
Federico Cerutti
Abstract:
Training a model to detect patterns of interrelated events that form situations of interest can be a complex problem: such situations tend to be uncommon, and only sparse data is available. We propose a hybrid neuro-symbolic architecture based on Event Calculus that can perform Complex Event Processing (CEP). It leverages both a neural network to interpret inputs and logical rules that express the…
▽ More
Training a model to detect patterns of interrelated events that form situations of interest can be a complex problem: such situations tend to be uncommon, and only sparse data is available. We propose a hybrid neuro-symbolic architecture based on Event Calculus that can perform Complex Event Processing (CEP). It leverages both a neural network to interpret inputs and logical rules that express the pattern of the complex event. Our approach is capable of training with much fewer labelled data than a pure neural network approach, and to learn to classify individual events even when training in an end-to-end manner. We demonstrate this comparing our approach against a pure neural network approach on a dataset based on Urban Sounds 8K.
△ Less
Submitted 13 October, 2020; v1 submitted 7 September, 2020;
originally announced September 2020.
-
An End-to-End Visual-Audio Attention Network for Emotion Recognition in User-Generated Videos
Authors:
Sicheng Zhao,
Yunsheng Ma,
Yang Gu,
Jufeng Yang,
Tengfei Xing,
Pengfei Xu,
Runbo Hu,
Hua Chai,
Kurt Keutzer
Abstract:
Emotion recognition in user-generated videos plays an important role in human-centered computing. Existing methods mainly employ traditional two-stage shallow pipeline, i.e. extracting visual and/or audio features and training classifiers. In this paper, we propose to recognize video emotions in an end-to-end manner based on convolutional neural networks (CNNs). Specifically, we develop a deep Vis…
▽ More
Emotion recognition in user-generated videos plays an important role in human-centered computing. Existing methods mainly employ traditional two-stage shallow pipeline, i.e. extracting visual and/or audio features and training classifiers. In this paper, we propose to recognize video emotions in an end-to-end manner based on convolutional neural networks (CNNs). Specifically, we develop a deep Visual-Audio Attention Network (VAANet), a novel architecture that integrates spatial, channel-wise, and temporal attentions into a visual 3D CNN and temporal attentions into an audio 2D CNN. Further, we design a special classification loss, i.e. polarity-consistent cross-entropy loss, based on the polarity-emotion hierarchy constraint to guide the attention generation. Extensive experiments conducted on the challenging VideoEmotion-8 and Ekman-6 datasets demonstrate that the proposed VAANet outperforms the state-of-the-art approaches for video emotion recognition. Our source code is released at: https://github.com/maysonma/VAANet.
△ Less
Submitted 12 February, 2020;
originally announced March 2020.
-
Time-Series Anomaly Detection Service at Microsoft
Authors:
Hansheng Ren,
Bixiong Xu,
Yujing Wang,
Chao Yi,
Congrui Huang,
Xiaoyu Kou,
Tony Xing,
Mao Yang,
Jie Tong,
Qi Zhang
Abstract:
Large companies need to monitor various metrics (for example, Page Views and Revenue) of their applications and services in real time. At Microsoft, we develop a time-series anomaly detection service which helps customers to monitor the time-series continuously and alert for potential incidents on time. In this paper, we introduce the pipeline and algorithm of our anomaly detection service, which…
▽ More
Large companies need to monitor various metrics (for example, Page Views and Revenue) of their applications and services in real time. At Microsoft, we develop a time-series anomaly detection service which helps customers to monitor the time-series continuously and alert for potential incidents on time. In this paper, we introduce the pipeline and algorithm of our anomaly detection service, which is designed to be accurate, efficient and general. The pipeline consists of three major modules, including data ingestion, experimentation platform and online compute. To tackle the problem of time-series anomaly detection, we propose a novel algorithm based on Spectral Residual (SR) and Convolutional Neural Network (CNN). Our work is the first attempt to borrow the SR model from visual saliency detection domain to time-series anomaly detection. Moreover, we innovatively combine SR and CNN together to improve the performance of SR model. Our approach achieves superior experimental results compared with state-of-the-art baselines on both public datasets and Microsoft production data.
△ Less
Submitted 10 June, 2019;
originally announced June 2019.
-
Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration
Authors:
Jeng-Hau Lin,
Tianwei Xing,
Ritchie Zhao,
Zhiru Zhang,
Mani Srivastava,
Zhuowen Tu,
Rajesh K. Gupta
Abstract:
State-of-the-art convolutional neural networks are enormously costly in both compute and memory, demanding massively parallel GPUs for execution. Such networks strain the computational capabilities and energy available to embedded and mobile processing platforms, restricting their use in many important applications. In this paper, we push the boundaries of hardware-effective CNN design by proposin…
▽ More
State-of-the-art convolutional neural networks are enormously costly in both compute and memory, demanding massively parallel GPUs for execution. Such networks strain the computational capabilities and energy available to embedded and mobile processing platforms, restricting their use in many important applications. In this paper, we push the boundaries of hardware-effective CNN design by proposing BCNN with Separable Filters (BCNNw/SF), which applies Singular Value Decomposition (SVD) on BCNN kernels to further reduce computational and storage complexity. To enable its implementation, we provide a closed form of the gradient over SVD to calculate the exact gradient with respect to every binarized weight in backward propagation. We verify BCNNw/SF on the MNIST, CIFAR-10, and SVHN datasets, and implement an accelerator for CIFAR-10 on FPGA hardware. Our BCNNw/SF accelerator realizes memory savings of 17% and execution time reduction of 31.3% compared to BCNN with only minor accuracy sacrifices.
△ Less
Submitted 15 July, 2017;
originally announced July 2017.
-
Personalized Course Sequence Recommendations
Authors:
Jie Xu,
Tianwei Xing,
Mihaela van der Schaar
Abstract:
Given the variability in student learning it is becoming increasingly important to tailor courses as well as course sequences to student needs. This paper presents a systematic methodology for offering personalized course sequence recommendations to students. First, a forward-search backward-induction algorithm is developed that can optimally select course sequences to decrease the time required f…
▽ More
Given the variability in student learning it is becoming increasingly important to tailor courses as well as course sequences to student needs. This paper presents a systematic methodology for offering personalized course sequence recommendations to students. First, a forward-search backward-induction algorithm is developed that can optimally select course sequences to decrease the time required for a student to graduate. The algorithm accounts for prerequisite requirements (typically present in higher level education) and course availability. Second, using the tools of multi-armed bandits, an algorithm is developed that can optimally recommend a course sequence that both reduces the time to graduate while also increasing the overall GPA of the student. The algorithm dynamically learns how students with different contextual backgrounds perform for given course sequences and then recommends an optimal course sequence for new students. Using real-world student data from the UCLA Mechanical and Aerospace Engineering department, we illustrate how the proposed algorithms outperform other methods that do not include student contextual information when making course sequence recommendations.
△ Less
Submitted 11 January, 2016; v1 submitted 30 December, 2015;
originally announced December 2015.