Triadic Novelty: A Typology and Measurement Framework for Recognizing Novel Contributions in Science

Authors: Jin Ai, Richard S. Steinberg, Chao Guo, Filipi Nascimento Silva

Abstract: Scientific progress depends on novel ideas, but current reward systems often fail to recognize them. Many existing metrics conflate novelty with popularity, privileging ideas that fit existing paradigms over those that challenge them. This study develops a theory-driven framework to better understand how different types of novelty emerge, take hold, and receive recognition. Drawing on network science and theories of discovery, we introduce a triadic typology: Pioneers, who introduce entirely new topics; Mavericks, who recombine distant concepts; and Vanguards, who reinforce weak but promising connections. We apply this typology to a dataset of 41,623 articles in the interdisciplinary field of philanthropy and nonprofit studies, linking novelty types to five-year citation counts using mixed-effects negative binomial regression. Results show that novelty is not uniformly rewarded. Pioneer efforts are foundational but often overlooked. Maverick novelty shows consistent citation benefits, particularly rewarded when it displaces prior focus. Vanguard novelty is more likely to gain recognition when it strengthens weakly connected topics, but its citation advantage diminishes as those reinforced nodes become more central. To enable fair comparison across time and domains, we introduce a simulated baseline model. These findings improve the evaluation of innovations, affecting science policy, funding, and institutional assessment practices.

Submitted 25 June, 2025; v1 submitted 21 June, 2025; originally announced June 2025.

Comments: 27 pages, 3 figures, 5 tables

arXiv:2506.15675 [pdf, ps, other]

Sekai: A Video Dataset towards World Exploration

Authors: Zhen Li, Chuanhao Li, Xiaofeng Mao, Shaoheng Lin, Ming Li, Shitian Zhao, Zhaopan Xu, Xinyue Li, Yukang Feng, Jianwen Sun, Zizhen Li, Fanrui Zhang, Jiaxin Ai, Zhixiang Wang, Yuwei Wu, Tong He, Jiangmiao Pang, Yu Qiao, Yunde Jia, Kaipeng Zhang

Abstract: Video generation techniques have made remarkable progress, promising to be the foundation of interactive world exploration. However, existing video generation datasets are not well-suited for world exploration training, as they suffer from several limitations: limited locations, short duration, static scenes, and a lack of annotations about exploration and the world. In this paper, we introduce Sekai (meaning "world" in Japanese), a high-quality first-person view worldwide video dataset with rich annotations for world exploration. It consists of over 5,000 hours of walking or drone view (FPV and UAV) videos from over 100 countries and regions across 750 cities. We develop an efficient and effective toolbox to collect, pre-process and annotate videos with location, scene, weather, crowd density, captions, and camera trajectories. Experiments demonstrate the quality of the dataset. We also use a subset to train an interactive video world exploration model, named YUME (meaning "dream" in Japanese). We believe Sekai will benefit the area of video generation and world exploration, and motivate valuable applications. The project page is https://lixsp11.github.io/sekai-project/.

Submitted 20 June, 2025; v1 submitted 18 June, 2025; originally announced June 2025.

Comments: 12 pages, 6 figures

arXiv:2506.09427 [pdf, other]

A High-Quality Dataset and Reliable Evaluation for Interleaved Image-Text Generation

Authors: Yukang Feng, Jianwen Sun, Chuanhao Li, Zizhen Li, Jiaxin Ai, Fanrui Zhang, Yifan Chang, Sizhuo Zhou, Shenglin Zhang, Yu Dai, Kaipeng Zhang

Abstract: Recent advancements in Large Multimodal Models (LMMs) have significantly improved multimodal understanding and generation. However, these models still struggle to generate tightly interleaved image-text outputs, primarily due to the limited scale, quality and instructional richness of current training datasets. To address this, we introduce InterSyn, a large-scale multimodal dataset constructed using our Self-Evaluation with Iterative Refinement (SEIR) method. InterSyn features multi-turn, instruction-driven dialogues with tightly interleaved image-text responses, providing rich object diversity and rigorous automated quality refinement, making it well-suited for training next-generation instruction-following LMMs. Furthermore, to address the lack of reliable evaluation tools capable of assessing interleaved multimodal outputs, we introduce SynJudge, an automatic evaluation model designed to quantitatively assess multimodal outputs along four dimensions: text content, image content, image quality, and image-text synergy. Experimental studies show that the SEIR method leads to substantially higher dataset quality compared to an otherwise identical process without refinement. Moreover, LMMs trained on InterSyn achieve uniform performance gains across all evaluation metrics, confirming InterSyn's utility for advancing multimodal systems.

Submitted 11 June, 2025; originally announced June 2025.

arXiv:2506.08849 [pdf, ps, other]

Adapting Vision-Language Foundation Model for Next Generation Medical Ultrasound Image Analysis

Authors: Jingguo Qu, Xinyang Han, Tonghuan Xiao, Jia Ai, Juan Wu, Tong Zhao, Jing Qin, Ann Dorothy King, Winnie Chiu-Wing Chu, Jing Cai, Michael Tin-Cheung Ying

Abstract: Medical ultrasonography is an essential imaging technique for examining superficial organs and tissues, including lymph nodes, breast, and thyroid. It employs high-frequency ultrasound waves to generate detailed images of the internal structures of the human body. However, manually contouring regions of interest in these images is a labor-intensive task that demands expertise and often results in inconsistent interpretations among individuals. Vision-language foundation models, which have excelled in various computer vision applications, present new opportunities for enhancing ultrasound image analysis. Yet, their performance is hindered by the significant differences between natural and medical imaging domains. This research seeks to overcome these challenges by developing domain adaptation methods for vision-language foundation models. In this study, we explore the fine-tuning pipeline for vision-language foundation models by utilizing a large language model as a text refiner with specially designed adaptation strategies and task-driven heads. Our approach has been extensively evaluated on six ultrasound datasets and two tasks: segmentation and classification. The experimental results show that our method can effectively improve the performance of vision-language foundation models for ultrasound image analysis, and outperforms the existing state-of-the-art vision-language and pure foundation models. The source code of this study is available at https://github.com/jinggqu/NextGen-UIA.

Submitted 10 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

arXiv:2505.22126 [pdf, ps, other]

SridBench: Benchmark of Scientific Research Illustration Drawing of Image Generation Model

Authors: Yifan Chang, Yukang Feng, Jianwen Sun, Jiaxin Ai, Chuanhao Li, S. Kevin Zhou, Kaipeng Zhang

Abstract: Recent years have seen rapid advances in AI-driven image generation. Early diffusion models emphasized perceptual quality, while newer multimodal models like GPT-4o-image integrate high-level reasoning, improving semantic understanding and structural composition. Scientific illustration generation exemplifies this evolution: unlike general image synthesis, it demands accurate interpretation of technical content and transformation of abstract ideas into clear, standardized visuals. This task is significantly more knowledge-intensive and laborious, often requiring hours of manual work and specialized tools. Automating it in a controllable, intelligent manner would provide substantial practical value. Yet, no benchmark currently exists to evaluate AI on this front. To fill this gap, we introduce SridBench, the first benchmark for scientific figure generation. It comprises 1,120 instances curated from leading scientific papers across 13 natural and computer science disciplines, collected via human experts and MLLMs. Each sample is evaluated along six dimensions, including semantic fidelity and structural accuracy. Experimental results reveal that even top-tier models like GPT-4o-image lag behind human performance, with common issues in text/visual clarity and scientific correctness. These findings highlight the need for more advanced reasoning-driven visual generation capabilities.

Submitted 28 May, 2025; originally announced May 2025.

arXiv:2504.14952 [pdf, other]

PIV-FlowDiffuser: Transfer-learning-based denoising diffusion models for PIV

Authors: Qianyu Zhu, Junjie Wang, Jeremiah Hu, Jia Ai, Yong Lee

Abstract: Deep learning algorithms have significantly reduced the computational time and improved the spatial resolution of particle image velocimetry (PIV). However, models trained on synthetic datasets might show degraded performance on practical particle images due to domain gaps. As a result, special residual patterns are often observed in the vector fields of deep learning-based estimators. To reduce this special noise step by step, we employ a denoising diffusion model (FlowDiffuser) for PIV analysis. The data-hungry iterative denoising diffusion model is trained via a transfer learning strategy, resulting in our PIV-FlowDiffuser method. Specifically, we (1) pre-train a FlowDiffuser model on multiple optical flow datasets from the computer vision community, such as Sintel and KITTI, and (2) fine-tune the pre-trained model on synthetic PIV datasets. Note that the PIV images are upsampled by a factor of two to resolve the small-scale turbulent flow structures. The visualized results indicate that our PIV-FlowDiffuser effectively suppresses the noise patterns. The denoising diffusion model reduces the average end-point error (AEE) by 59.4% over the RAFT256-PIV baseline on the classic Cai's dataset. Besides, PIV-FlowDiffuser exhibits enhanced generalization performance on unseen particle images due to transfer learning. Overall, this study highlights transfer-learning-based denoising diffusion models for PIV. A detailed implementation is available for interested readers in the repository https://github.com/Zhu-Qianyu/PIV-FlowDiffuser.

Submitted 21 April, 2025; originally announced April 2025.
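
Note on the metric in the PIV-FlowDiffuser abstract above: the listing does not define the average end-point error, so the following is the definition commonly used in optical flow and PIV work, given here as an assumption rather than as the authors' exact formula. The symbols $N$ (number of evaluated vectors), $\hat{\mathbf{u}}_i$ (estimated displacement) and $\mathbf{u}_i$ (ground-truth displacement) are introduced only for illustration. The second line assumes the reported 59.4% is a relative reduction with respect to the baseline.

```latex
\mathrm{AEE} = \frac{1}{N} \sum_{i=1}^{N} \left\lVert \hat{\mathbf{u}}_i - \mathbf{u}_i \right\rVert_2
\qquad
\mathrm{AEE}_{\text{PIV-FlowDiffuser}} \approx (1 - 0.594)\,\mathrm{AEE}_{\text{RAFT256-PIV}} = 0.406\,\mathrm{AEE}_{\text{RAFT256-PIV}}
```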

arXiv:2504.07634 [pdf, other]

Agent That Debugs: Dynamic State-Guided Vulnerability Repair

Authors: Zhengyao Liu, Yunlong Ma, Jingxuan Xu, Junchen Ai, Xiang Gao, Hailong Sun, Abhik Roychoudhury

Abstract: In recent years, more vulnerabilities have been discovered every day, while manual vulnerability repair requires specialized knowledge and is time-consuming. As a result, many detected or even published vulnerabilities remain unpatched, thereby increasing the exposure of software systems to attacks. Recent advancements in agents based on Large Language Models have demonstrated their increasing capabilities in code understanding and generation, which can be promising to achieve automated vulnerability repair. However, the effectiveness of agents based on static information retrieval is still not sufficient for patch generation. To address the challenge, we propose a program repair agent called VulDebugger that fully utilizes both static and dynamic context, and it debugs programs in a manner akin to humans. The agent inspects the actual state of the program via the debugger and infers expected states via constraints that need to be satisfied. By continuously comparing the actual state with the expected state, it deeply understands the root causes of the vulnerabilities and ultimately accomplishes repairs. We experimentally evaluated VulDebugger on 50 real-life projects. With 60.00% successfully fixed, VulDebugger significantly outperforms state-of-the-art approaches for vulnerability repair.

Submitted 10 April, 2025; originally announced April 2025.

arXiv:2504.05782 [pdf, other]

MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models

Authors: Pengfei Zhou, Fanrui Zhang, Xiaopeng Peng, Zhaopan Xu, Jiaxin Ai, Yansheng Qiu, Chuanhao Li, Zhen Li, Ming Li, Yukang Feng, Jianwen Sun, Haoquan Zhang, Zizhen Li, Xiaofeng Mao, Wangbo Zhao, Kai Wang, Xiaojun Chang, Wenqi Shao, Yang You, Kaipeng Zhang

Abstract: Multimodal reasoning, which integrates language and visual cues into problem solving and decision making, is a fundamental aspect of human intelligence and a crucial step toward artificial general intelligence. However, the evaluation of multimodal reasoning capabilities in Multimodal Large Language Models (MLLMs) remains inadequate. Most existing reasoning benchmarks are constrained by limited data size, narrow domain coverage, and unstructured knowledge distribution. To close these gaps, we introduce MDK12-Bench, a multi-disciplinary benchmark assessing the reasoning capabilities of MLLMs via real-world K-12 examinations. Spanning six disciplines (math, physics, chemistry, biology, geography, and information science), our benchmark comprises 140K reasoning instances across diverse difficulty levels from primary school to 12th grade. It features 6,827 instance-level knowledge point annotations based on a well-organized knowledge structure, detailed answer explanations, difficulty labels and cross-year partitions, providing a robust platform for comprehensive evaluation. Additionally, we present a novel dynamic evaluation framework to mitigate data contamination issues by bootstrapping question forms, question types, and image styles during evaluation. Extensive experiments on MDK12-Bench reveal significant limitations of current MLLMs in multimodal reasoning. The findings on our benchmark provide insights into the development of next-generation models. Our data and code are available at https://github.com/LanceZPF/MDK12.

Submitted 8 April, 2025; originally announced April 2025.

Comments: 11 pages, 8 figures

arXiv:2503.12545 [pdf, other]

PEBench: A Fictitious Dataset to Benchmark Machine Unlearning for Multimodal Large Language Models

Authors: Zhaopan Xu, Pengfei Zhou, Weidong Tang, Jiaxin Ai, Wangbo Zhao, Xiaojiang Peng, Kai Wang, Yang You, Wenqi Shao, Hongxun Yao, Kaipeng Zhang

Abstract: In recent years, Multimodal Large Language Models (MLLMs) have demonstrated remarkable advancements in tasks such as visual question answering, visual understanding, and reasoning. However, this impressive progress relies on vast amounts of data collected from the internet, raising significant concerns about privacy and security. To address these issues, machine unlearning (MU) has emerged as a promising solution, enabling the removal of specific knowledge from an already trained model without requiring retraining from scratch. Although MU for MLLMs has gained attention, current evaluations of its efficacy remain incomplete, and the underlying problem is often poorly defined, which hinders the development of strategies for creating more secure and trustworthy systems. To bridge this gap, we introduce a benchmark, named PEBench, which includes a dataset of personal entities and corresponding general event scenes, designed to comprehensively assess the performance of MU for MLLMs. Through PEBench, we aim to provide a standardized and robust framework to advance research in secure and privacy-preserving multimodal models. We benchmarked 6 MU methods, revealing their strengths and limitations, and shedding light on key challenges and opportunities for MU in MLLMs.

Submitted 16 March, 2025; originally announced March 2025.

arXiv:2503.12505 [pdf, other]

MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification

Authors: Zhaopan Xu, Pengfei Zhou, Jiaxin Ai, Wangbo Zhao, Kai Wang, Xiaojiang Peng, Wenqi Shao, Hongxun Yao, Kaipeng Zhang

Abstract: Reasoning is an essential capacity for large language models (LLMs) to address complex tasks, where the identification of process errors is vital for improving this ability. Recently, process-level reward models (PRMs) were proposed to provide step-wise rewards that facilitate reinforcement learning and data production during training and guide LLMs toward correct steps during inference, thereby improving reasoning accuracy. However, existing benchmarks of PRMs are text-based and focus on error detection, neglecting other scenarios like reasoning search. To address this gap, we introduce MPBench, a comprehensive, multi-task, multimodal benchmark designed to systematically assess the effectiveness of PRMs in diverse scenarios. MPBench employs three evaluation paradigms, each targeting a specific role of PRMs in the reasoning process: (1) Step Correctness, which assesses the correctness of each intermediate reasoning step; (2) Answer Aggregation, which aggregates multiple solutions and selects the best one; and (3) Reasoning Process Search, which guides the search for optimal reasoning steps during inference. Through these paradigms, MPBench makes comprehensive evaluations and provides insights into the development of multimodal PRMs.

Submitted 16 March, 2025; originally announced March 2025.

arXiv:2503.06553 [pdf, other]

ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges

Authors: Jiaxin Ai, Pengfei Zhou, Zhaopan Xu, Ming Li, Fanrui Zhang, Zizhen Li, Jianwen Sun, Yukang Feng, Baojin Huang, Zhongyuan Wang, Kaipeng Zhang

Abstract: As multi-modal large language models (MLLMs) frequently exhibit errors when solving scientific problems, evaluating the validity of their reasoning processes is critical for ensuring reliability and uncovering fine-grained model weaknesses. Since human evaluation is laborious and costly, prompting MLLMs as automated process judges has become a common practice. However, the reliability of these model-based judges remains uncertain. To address this, we introduce ProJudgeBench, the first comprehensive benchmark specifically designed for evaluating abilities of MLLM-based process judges. ProJudgeBench comprises 2,400 test cases and 50,118 step-level labels, spanning four scientific disciplines with diverse difficulty levels and multi-modal content. In ProJudgeBench, each step is meticulously annotated by human experts for correctness, error type, and explanation, enabling a systematic evaluation of judges' capabilities to detect, classify and diagnose errors. Evaluation on ProJudgeBench reveals a significant performance gap between open-source and proprietary models. To bridge this gap, we further propose ProJudge-173k, a large-scale instruction-tuning dataset, and a Dynamic Dual-Phase fine-tuning strategy that encourages models to explicitly reason through problem-solving before assessing solutions. Both contributions significantly enhance the process evaluation capabilities of open-source models. All the resources will be released to foster future research of reliable multi-modal process evaluation.

Submitted 9 March, 2025; originally announced March 2025.

arXiv:2503.06542 [pdf, ps, other]

ARMOR: Empowering Multimodal Understanding Model with Interleaved Multimodal Generation Capability

Authors: Jianwen Sun, Yukang Feng, Chuanhao Li, Fanrui Zhang, Zizhen Li, Jiaxin Ai, Sizhuo Zhou, Yu Dai, Shenglin Zhang, Kaipeng Zhang

Abstract: Unified multimodal understanding and generation have recently received much attention in the area of vision and language. Existing UniMs are designed to simultaneously learn both multimodal understanding and generation capabilities, demanding substantial computational resources, and often struggle to generate interleaved text-image. We present ARMOR, a resource-efficient and pure autoregressive framework that achieves both understanding and generation by fine-tuning existing multimodal large language models (MLLMs). Specifically, ARMOR extends existing MLLMs from three perspectives: (1) For model architecture, an asymmetric encoder-decoder architecture with a forward-switching mechanism is introduced to unify embedding space integrating textual and visual modalities for enabling natural text-image interleaved generation with minimal computational overhead. (2) For training data, a meticulously curated, high-quality interleaved dataset is collected for fine-tuning MLLMs. (3) For the training algorithm, we propose a "what or how to generate" algorithm to empower existing MLLMs with multimodal generation capabilities while preserving their multimodal understanding capabilities, through three progressive training stages based on the collected dataset. Experimental results demonstrate that ARMOR upgrades existing MLLMs to UniMs with promising image generation capabilities, using limited training resources. Our code will be released soon at https://github.com/finyorko/armor.

Submitted 6 June, 2025; v1 submitted 9 March, 2025; originally announced March 2025.

arXiv:2411.11396 [pdf, other]

Stacking Brick by Brick: Aligned Feature Isolation for Incremental Face Forgery Detection

Authors: Jikang Cheng, Zhiyuan Yan, Ying Zhang, Li Hao, Jiaxin Ai, Qin Zou, Chen Li, Zhongyuan Wang

Abstract: The rapid advancement of face forgery techniques has introduced a growing variety of forgeries. Incremental Face Forgery Detection (IFFD), involving gradually adding new forgery data to fine-tune the previously trained model, has been introduced as a promising strategy to deal with evolving forgery methods. However, a naively trained IFFD model is prone to catastrophic forgetting when new forgeries are integrated, as treating all forgeries as a single "Fake" class in the Real/Fake classification can cause different forgery types to override one another, thereby resulting in the forgetting of unique characteristics from earlier tasks and limiting the model's effectiveness in learning forgery specificity and generality. In this paper, we propose to stack the latent feature distributions of previous and new tasks brick by brick, i.e., achieving aligned feature isolation. In this manner, we aim to preserve learned forgery information and accumulate new knowledge by minimizing distribution overriding, thereby mitigating catastrophic forgetting. To achieve this, we first introduce Sparse Uniform Replay (SUR) to obtain the representative subsets that could be treated as the uniformly sparse versions of the previous global distributions. We then propose a Latent-space Incremental Detector (LID) that leverages SUR data to isolate and align distributions. For evaluation, we construct a more advanced and comprehensive benchmark tailored for IFFD. The leading experimental results validate the superiority of our method.

Submitted 28 March, 2025; v1 submitted 18 November, 2024; originally announced November 2024.

Journal ref: The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2025

arXiv:2408.06635 [pdf, other]

IDRetracor: Towards Visual Forensics Against Malicious Face Swapping

Authors: Jikang Cheng, Jiaxin Ai, Zhen Han, Chao Liang, Qin Zou, Zhongyuan Wang, Qian Wang

Abstract: The face swapping technique based on deepfake methods poses significant social risks to personal identity security. While numerous deepfake detection methods have been proposed as countermeasures against malicious face swapping, they can only output binary labels (Fake/Real) for distinguishing fake content without reliable and traceable evidence. To achieve visual forensics and target face attribution, we propose a novel task named face retracing, which considers retracing the original target face from the given fake one via inverse mapping. Toward this goal, we propose an IDRetracor that can retrace arbitrary original target identities from fake faces generated by multiple face swapping methods. Specifically, we first adopt a mapping resolver to perceive the possible solution space of the original target face for the inverse mappings. Then, we propose mapping-aware convolutions to retrace the original target face from the fake one. Such convolutions contain multiple kernels that can be combined under the control of the mapping resolver to tackle different face swapping mappings dynamically. Extensive experiments demonstrate that the IDRetracor exhibits promising retracing performance from both quantitative and qualitative perspectives.

Submitted 13 August, 2024; originally announced August 2024.

arXiv:2407.17051 [pdf, ps, other]

Number of Subgraphs and Their Converses in Tournaments and New Digraph Polynomials

Authors: Jiangdong Ai, Gregory Gutin, Hui Lei, Anders Yeo, Yacong Zhou

Abstract: An oriented graph $D$ is converse invariant if, for any tournament $T$, the number of copies of $D$ in $T$ is equal to that of its converse $-D$. El Sahili and Ghazo Hanna [J. Graph Theory 102 (2023), 684-701] showed that any oriented graph $D$ with maximum degree at most 2 is converse invariant. They proposed a question: Can we characterize all converse invariant oriented graphs? In this paper, we introduce a digraph polynomial and employ it to give a necessary condition for an oriented graph to be converse invariant. This polynomial serves as a cornerstone in proving all the results presented in this paper. In particular, we characterize all orientations of trees with diameter at most 3 that are converse invariant. We also show that all orientations of regular graphs are not converse invariant if $D$ and $-D$ have different degree sequences. In addition, in contrast to the findings of El Sahili and Ghazo Hanna, we prove that every connected graph $G$ with maximum degree at least $3$, admits an orientation $D$ of $G$ such that $D$ is not converse invariant. We pose one conjecture.

Submitted 24 July, 2024; originally announced July 2024.

arXiv:2404.08285 [pdf]

A Survey of Neural Network Robustness Assessment in Image Recognition

Authors: Jie Wang, Jun Ai, Minyan Lu, Haoran Su, Dan Yu, Yutao Zhang, Junda Zhu, Jingyu Liu

Abstract: In recent years, there has been significant attention given to the robustness assessment of neural networks. Robustness plays a critical role in ensuring reliable operation of artificial intelligence (AI) systems in complex and uncertain environments. Deep learning's robustness problem is particularly significant, highlighted by the discovery of adversarial attacks on image classification models. Researchers have dedicated efforts to evaluate robustness in diverse perturbation conditions for image recognition tasks. Robustness assessment encompasses two main techniques: robustness verification/certification for deliberate adversarial attacks and robustness testing for random data corruptions. In this survey, we present a detailed examination of both adversarial robustness (AR) and corruption robustness (CR) in neural network assessment. Analyzing current research papers and standards, we provide an extensive overview of robustness assessment in image recognition. Three essential aspects are analyzed: concepts, metrics, and assessment methods. We investigate the perturbation metrics and range representations used to measure the degree of perturbations on images, as well as the robustness metrics specifically for the robustness conditions of classification models. The strengths and limitations of the existing methods are also discussed, and some potential directions for future research are provided.

Submitted 15 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

Comments: Corrected typos and grammatical errors in Section 5

arXiv:2403.19943 [pdf, other]

TDANet: A Novel Temporal Denoise Convolutional Neural Network With Attention for Fault Diagnosis

Authors: Zhongzhi Li, Rong Fan, Jingqi Tu, Jinyi Ma, Jianliang Ai, Yiqun Dong

Abstract: Fault diagnosis plays a crucial role in maintaining the operational integrity of mechanical systems, preventing significant losses due to unexpected failures. As intelligent manufacturing and data-driven approaches evolve, Deep Learning (DL) has emerged as a pivotal technique in fault diagnosis research, recognized for its ability to autonomously extract complex features. However, the practical application of current fault diagnosis methods is challenged by the complexity of industrial environments. This paper proposes the Temporal Denoise Convolutional Neural Network With Attention (TDANet), designed to improve fault diagnosis performance in noisy environments. This model transforms one-dimensional signals into two-dimensional tensors based on their periodic properties, employing multi-scale 2D convolution kernels to extract signal information both within and across periods. This method enables effective identification of signal characteristics that vary over multiple time scales. The TDANet incorporates a Temporal Variable Denoise (TVD) module with residual connections and a Multi-head Attention Fusion (MAF) module, enhancing the saliency of information within noisy data and maintaining effective fault diagnosis performance. Evaluation on two datasets, CWRU (single sensor) and Real aircraft sensor fault (multiple sensors), demonstrates that the TDANet model significantly outperforms existing deep learning approaches in terms of diagnostic accuracy under noisy environments.

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.19737 [pdf, ps, other]

Piercing independent sets in graphs without large induced matching

Authors: Jiangdong Ai, Hong Liu, Zixiang Xu, Qiang Zhou

Abstract: Given a graph $G$, denote by $h(G)$ the smallest size of a subset of $V(G)$ which intersects every maximum independent set of $G$. We prove that any graph $G$ without induced matching of size $t$ satisfies $h(G)\le ω(G)^{3t-3+o(1)}$. This resolves a conjecture of Hajebi, Li and Spirkl (Hitting all maximum stable sets in $P_{5}$-free graphs, JCTB 2024).
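To make the definition of $h(G)$ concrete, the following is a minimal brute-force sketch, not taken from the paper. It enumerates all maximum independent sets of a small graph and then searches for the smallest vertex subset meeting each of them. The function names `maximum_independent_sets` and `piercing_number` are hypothetical, and the exponential search is only practical for graphs with a handful of vertices.

```python
from itertools import combinations


def maximum_independent_sets(vertices, edges):
    """Return every independent set of maximum size (brute force)."""
    vs = list(vertices)
    edge_set = {frozenset(e) for e in edges}
    # Try sizes from largest to smallest; the first size with a hit is the maximum.
    for k in range(len(vs), -1, -1):
        found = [
            set(c)
            for c in combinations(vs, k)
            if all(frozenset(p) not in edge_set for p in combinations(c, 2))
        ]
        if found:
            return found
    return [set()]


def piercing_number(vertices, edges):
    """h(G): smallest size of a vertex set meeting every maximum independent set.

    Assumes the graph has at least one vertex.
    """
    vs = list(vertices)
    mis = maximum_independent_sets(vs, edges)
    for k in range(len(vs) + 1):
        for c in combinations(vs, k):
            s = set(c)
            if all(s & m for m in mis):
                return k


# Path on 3 vertices: the only maximum independent set is {0, 2}.
path3 = [(0, 1), (1, 2)]
# Cycles on 4 and 5 vertices.
cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
cycle5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]

print(piercing_number(range(3), path3))
print(piercing_number(range(4), cycle4))
print(piercing_number(range(5), cycle5))
```

On the 5-cycle the five maximum independent sets themselves form a 5-cycle on the vertex set, so any set meeting all of them is a vertex cover of that cycle, which needs three vertices.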

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2402.12729 [pdf, other]

Scalable and reliable deep transfer learning for intelligent fault detection via multi-scale neural processes embedded with knowledge

Authors: Zhongzhi Li, Jingqi Tu, Jiacheng Zhu, Jianliang Ai, Yiqun Dong

Abstract: Deep transfer learning (DTL) is a fundamental method in the field of Intelligent Fault Detection (IFD). It aims to mitigate the degradation of method performance that arises from the discrepancies in data distribution between training set (source domain) and testing set (target domain). Considering the fact that fault data collection is challenging and certain faults are scarce, DTL-based methods face the limitation of available observable data, which reduces the detection performance of the methods in the target domain. Furthermore, DTL-based methods lack comprehensive uncertainty analysis that is essential for building reliable IFD systems. To address the aforementioned problems, this paper proposes a novel DTL-based method known as Neural Processes-based deep transfer learning with graph convolution network (GTNP). The feature-based transfer strategy of GTNP bridges the data distribution discrepancies of source domain and target domain in high-dimensional space. Both the joint modeling based on global and local latent variables and the sparse sampling strategy reduce the demand for observable data in the target domain. The multi-scale uncertainty analysis is obtained by using the distribution characteristics of global and local latent variables. Global analysis of uncertainty enables GTNP to provide quantitative values that reflect the complexity of methods and the difficulty of tasks. Local analysis of uncertainty allows GTNP to model uncertainty (confidence of the fault detection result) at each sample affected by noise and bias. The proposed method is validated across 3 IFD tasks, consistently showing the superior detection performance of GTNP compared to the other DTL-based methods.

Submitted 20 February, 2024; originally announced February 2024.

arXiv:2401.05362 [pdf, other]

DualTeacher: Bridging Coexistence of Unlabelled Classes for Semi-supervised Incremental Object Detection

Authors: Ziqi Yuan, Liyuan Wang, Wenbo Ding, Xingxing Zhang, Jiachen Zhong, Jianyong Ai, Jianmin Li, Jun Zhu

Abstract: In real-world applications, an object detector often encounters object instances from new classes and needs to accommodate them effectively. Previous work formulated this critical problem as incremental object detection (IOD), which assumes the object instances of new classes to be fully annotated in incremental data. However, as supervisory signals are usually rare and expensive, the supervised IOD may not be practical for implementation. In this work, we consider a more realistic setting named semi-supervised IOD (SSIOD), where the object detector needs to learn new classes incrementally from a few labelled data and massive unlabelled data without catastrophic forgetting of old classes. A commonly-used strategy for supervised IOD is to encourage the current model (as a student) to mimic the behavior of the old model (as a teacher), but it generally fails in SSIOD because a dominant number of object instances from old and new classes are coexisting and unlabelled, with the teacher only recognizing a fraction of them. Observing that learning only the classes of interest tends to preclude detection of other classes, we propose to bridge the coexistence of unlabelled classes by constructing two teacher models respectively for old and new classes, and using the concatenation of their predictions to instruct the student. This approach is referred to as DualTeacher, which can serve as a strong baseline for SSIOD with limited resource overhead and no extra hyperparameters. We build various benchmarks for SSIOD and perform extensive experiments to demonstrate the superiority of our approach (e.g., the performance lead is up to 18.28 AP on MS-COCO). Our code is available at https://github.com/chuxiuhong/DualTeacher.

Submitted 13 December, 2023; originally announced January 2024.

arXiv:2401.04330 [pdf, other]

doi 10.1109/JSTARS.2024.3392917

BD-MSA: Body decouple VHR Remote Sensing Image Change Detection method guided by multi-scale feature information aggregation

Authors: Yonghui Tan, Xiaolong Li, Yishu Chen, Jinquan Ai

Abstract: The purpose of remote sensing image change detection (RSCD) is to detect differences between bi-temporal images taken at the same place. Deep learning has been extensively applied to RSCD tasks, yielding significant results in terms of result recognition. However, due to the shooting angle of the satellite, the impacts of thin clouds, and certain lighting conditions, the problem of fuzzy edges in the change region in some remote sensing photographs cannot be properly handled using current RSCD algorithms. To solve this issue, we propose Body Decouple Multi-Scale by feature Aggregation change detection (BD-MSA), a novel model that collects both global and local feature map information in the channel and space dimensions of the feature map during the training and prediction phases. This approach allows us to successfully extract the change region's boundary information while also separating the change region's main body from its boundary. Numerous studies have shown that the assessment metrics and evaluation effects of the model described in this paper on the publicly available datasets DSIFN-CD, S2Looking and WHU-CD are the best when compared to other models.

Submitted 3 March, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

arXiv:2309.12708 [pdf, other]

PointSSC: A Cooperative Vehicle-Infrastructure Point Cloud Benchmark for Semantic Scene Completion

Authors: Yuxiang Yan, Boda Liu, Jianfei Ai, Qinbu Li, Ru Wan, Jian Pu

Abstract: Semantic Scene Completion (SSC) aims to jointly generate space occupancies and semantic labels for complex 3D scenes. Most existing SSC models focus on volumetric representations, which are memory-inefficient for large outdoor spaces. Point clouds provide a lightweight alternative but existing benchmarks lack outdoor point cloud scenes with semantic labels. To address this, we introduce PointSSC, the first cooperative vehicle-infrastructure point cloud benchmark for semantic scene completion. These scenes exhibit long-range perception and minimal occlusion. We develop an automated annotation pipeline leveraging Semantic Segment Anything to efficiently assign semantics. To benchmark progress, we propose a LiDAR-based model with a Spatial-Aware Transformer for global and local feature extraction and a Completion and Segmentation Cooperative Module for joint completion and segmentation. PointSSC provides a challenging testbed to drive advances in semantic point cloud completion for real-world navigation. The code and datasets are available at https://github.com/yyxssm/PointSSC.

Submitted 6 March, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

Comments: ICRA2024, oral & poster

arXiv:2304.10202 [pdf, ps, other]

Bounds on Maximum Weight Directed Cut

Authors: Jiangdong Ai, Stefanie Gerke, Gregory Gutin, Anders Yeo, Yacong Zhou

Abstract: We obtain lower and upper bounds for the maximum weight of a directed cut in the classes of weighted digraphs and weighted acyclic digraphs as well as in some of their subclasses. We compare our results with those obtained for the maximum size of a directed cut in unweighted digraphs. In particular, we show that a lower bound obtained by Alon, Bollobás, Gyárfás, Lehel and Scott (J Graph Th 55(1) (2007)) for unweighted acyclic digraphs can be extended to weighted digraphs with the maximum length of a cycle being bounded by a constant and the weight of every arc being at least one. We state a number of open problems.

Submitted 20 April, 2023; originally announced April 2023.

arXiv:2209.11523 [pdf, other]

WS-3D-Lane: Weakly Supervised 3D Lane Detection With 2D Lane Labels

Authors: Jianyong Ai, Wenbo Ding, Jiuhua Zhao, Jiachen Zhong

Abstract: Compared to 2D lanes, real 3D lane data is difficult to collect accurately. In this paper, we propose a novel method for training 3D lanes with only 2D lane labels, called weakly supervised 3D lane detection WS-3D-Lane. By assuming constant lane width and equal height on adjacent lanes, we indirectly supervise 3D lane heights in the training. To overcome the problem of the dynamic change of the camera pitch during data collection, a camera pitch self-calibration method is proposed. In anchor representation, we propose a double-layer anchor with an improved non-maximum suppression (NMS) method, which enables the anchor-based method to predict two lane lines that are close. Experiments are conducted on the basis of 3D-LaneNet under two supervision methods. Under the weakly supervised setting, our WS-3D-Lane outperforms the previous 3D-LaneNet: F-score rises to 92.3% on the Apollo 3D synthetic dataset, and F1 rises to 74.5% on ONCE-3DLanes. Meanwhile, WS-3D-Lane in the purely supervised setting yields further gains and outperforms the state of the art. To the best of our knowledge, WS-3D-Lane is the first attempt at 3D lane detection under a weakly supervised setting.

Submitted 17 January, 2023; v1 submitted 23 September, 2022; originally announced September 2022.

Comments: 7 pages, 8 figures. Accepted by ICRA 2023

arXiv:2207.13267 [pdf, other]

Fault Detection and Classification of Aerospace Sensors using a VGG16-based Deep Neural Network

Authors: Zhongzhi Li, Yunmei Zhao, Jinyi Ma, Jianliang Ai, Yiqun Dong

Abstract: Compared with traditional model-based fault detection and classification (FDC) methods, deep neural networks (DNN) prove to be effective for the aerospace sensors FDC problems. However, the time consumed in training the DNN is excessive, and explainability analysis for the FDC neural network is still underwhelming. A concept known as imagefication-based intelligent FDC has been studied in recent years. This concept advocates stacking the sensor measurement data into an image format; the sensor FDC issue is then transformed into an abnormal-region detection problem on the stacked image, which may well borrow the recent advances in the machine vision realm. Although promising results have been claimed in imagefication-based intelligent FDC research, due to the small size of the stacked image, small convolutional kernels and shallow DNN layers were used, which hinders the FDC performance. In this paper, we first propose a data augmentation method which inflates the stacked image to a larger size (corresponding to the VGG16 net developed in the machine vision realm). The FDC neural network is then trained via fine-tuning the VGG16 directly. To truncate and compress the FDC net size (hence its running time), we perform model pruning on the fine-tuned net. Class activation mapping (CAM) method is also adopted for explainability analysis of the FDC net to verify its internal operations. Via data augmentation, fine-tuning from VGG16, and model pruning, the FDC net developed in this paper claims an FDC accuracy of 98.90% across 4 aircraft at 5 flight conditions (running time 26 ms). The CAM results also verify the FDC net w.r.t. its internal operations.

Submitted 26 July, 2022; originally announced July 2022.

arXiv:2207.12157 [pdf, ps, other]

Results on the Small Quasi-Kernel Conjecture

Authors: Jiangdong Ai, Stefanie Gerke, Gregory Gutin, Anders Yeo, Yacong Zhou

Abstract: A {\em quasi-kernel} of a digraph $D$ is an independent set $Q\subseteq V(D)$ such that for every vertex $v\in V(D)\backslash Q$, there exists a directed path with one or two arcs from $v$ to a vertex $u\in Q$. In 1974, Chvátal and Lovász proved that every digraph has a quasi-kernel. In 1976, Erdős and Székely conjectured that every sink-free digraph $D=(V(D),A(D))$ has a quasi-kernel of size at most $|V(D)|/2$. In this paper, we give a new method to show that the conjecture holds for a generalization of anti-claw-free digraphs. For any sink-free one-way split digraph $D$ of order $n$, when $n\geq 3$, we show a stronger result that $D$ has a quasi-kernel of size at most $\frac{n+3}{2} - \sqrt{n}$, and the bound is sharp.
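The quasi-kernel definition above can be checked directly on small digraphs. The following is a minimal sketch, not taken from the paper: the adjacency-dictionary representation and the function name `is_quasi_kernel` are assumptions made for illustration, and the digraph is assumed to have no loops.

```python
from typing import Dict, Hashable, Iterable, Set

# Hypothetical representation: each vertex maps to its set of out-neighbours.
Digraph = Dict[Hashable, Set[Hashable]]


def is_quasi_kernel(d: Digraph, candidate: Iterable[Hashable]) -> bool:
    """Check whether `candidate` is a quasi-kernel of the loopless digraph `d`."""
    q = set(candidate)
    # Independence: no arc may join two vertices of Q (in either direction).
    for u in q:
        if d[u] & q:
            return False
    # Every vertex outside Q must reach Q by a directed path of one or two arcs.
    for v in d:
        if v in q:
            continue
        if d[v] & q:  # path with one arc
            continue
        if any(d[w] & q for w in d[v]):  # path with two arcs
            continue
        return False
    return True


# Directed triangle 0 -> 1 -> 2 -> 0 (sink-free).
triangle: Digraph = {0: {1}, 1: {2}, 2: {0}}
# Directed path 0 -> 1 -> 2 -> 3 (vertex 3 is a sink).
path: Digraph = {0: {1}, 1: {2}, 2: {3}, 3: set()}

print(is_quasi_kernel(triangle, {0}))     # vertex 1 reaches 0 via 2, vertex 2 directly
print(is_quasi_kernel(triangle, {0, 1}))  # not independent because of the arc 0 -> 1
print(is_quasi_kernel(path, {3}))         # vertex 0 is three arcs away from 3
print(is_quasi_kernel(path, {1, 3}))
```

In the triangle, the single vertex $\{0\}$ is a quasi-kernel of size $1 \le 3/2$, consistent with the conjectured bound $|V(D)|/2$ for sink-free digraphs.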

Submitted 25 July, 2022; originally announced July 2022.

Comments: 14 pages

arXiv:2207.03666 [pdf, other]

Deepfake Face Traceability with Disentangling Reversing Network

Authors: Jiaxin Ai, Zhongyuan Wang, Baojin Huang, Zhen Han

Abstract: Deepfake face not only violates the privacy of personal identity, but also confuses the public and causes huge social harm. The current deepfake detection only stays at the level of distinguishing true and false, and cannot trace the original genuine face corresponding to the fake face, that is, it does not have the ability to trace the source of evidence. The deepfake countermeasure technology for judicial forensics urgently calls for deepfake traceability. This paper pioneers an interesting question about face deepfake, active forensics that "know it and how it happened". Given that deepfake faces do not completely discard the features of original faces, especially facial expressions and poses, we argue that original faces can be approximately speculated from their deepfake counterparts. Correspondingly, we design a disentangling reversing network that decouples latent space features of deepfake faces under the supervision of fake-original face pair samples to infer original faces in reverse.

Submitted 7 July, 2022; originally announced July 2022.

Comments: 5 pages, 4 figures

arXiv:2206.09055

Augmented Imagefication: A Data-driven Fault Detection Method for Aircraft Air Data Sensors

Authors: Hang Zhao, Jinyi Ma, Zhongzhi Li, Yiqun Dong, Jianliang Ai

Abstract: In this paper, a novel data-driven approach named Augmented Imagefication for Fault detection (FD) of aircraft air data sensors (ADS) is proposed. Exemplifying the FD problem of aircraft air data sensors, an online FD scheme on edge device based on deep neural network (DNN) is developed. First, the aircraft inertial reference unit measurements are adopted as equivalent inputs, which is scalable to different aircraft/flight cases. Data associated with 6 different aircraft/flight conditions are collected to provide diversity (scalability) in the training/testing database. Then Augmented Imagefication is proposed for the DNN-based prediction of flying conditions. The raw data are reshaped as a grayscale image for convolutional operation, and the necessity of augmentation is analyzed and pointed out. Different augmentation methods, i.e. Flip, Repeat, Tile and their combinations, are discussed; the result shows that the All Repeat operation in both axes of the image matrix leads to the best performance of DNN. The interpretability of DNN is studied based on Grad-CAM, which provides a better understanding and further solidifies the robustness of DNN. Next the DNN model, VGG-16 with augmented imagefication data, is optimized for mobile hardware deployment. After pruning of DNN, a lightweight model (98.79% smaller than original VGG-16) with high accuracy (slightly up by 0.27%) and fast speed (time delay is reduced by 87.54%) is obtained. The hyperparameter optimization of DNN based on TPE is implemented and the best combination of hyperparameters is determined (learning rate 0.001, iterative epochs 600, and batch size 100 yields the highest accuracy at 0.987). Finally, an online FD deployment based on edge device, Jetson Nano, is developed and the real time monitoring of aircraft is achieved. We believe that this method is instructive for addressing the FD problems in other similar fields.

Submitted 28 June, 2022; v1 submitted 17 June, 2022; originally announced June 2022.

Comments: a crucial design defect to acquire flying data by simulation

arXiv:2206.05751 [pdf, other]

doi 10.1016/j.patrec.2023.03.001

Consistent Attack: Universal Adversarial Perturbation on Embodied Vision Navigation

Authors: Chengyang Ying, You Qiaoben, Xinning Zhou, Hang Su, Wenbo Ding, Jianyong Ai

Abstract: Embodied agents in vision navigation coupled with deep neural networks have attracted increasing attention. However, deep neural networks have been shown vulnerable to malicious adversarial noises, which may potentially cause catastrophic failures in Embodied Vision Navigation. Among different adversarial noises, universal adversarial perturbations (UAP), i.e., a constant image-agnostic perturbation applied on every input frame of the agent, play a critical role in Embodied Vision Navigation since they are computation-efficient and application-practical during the attack. However, existing UAP methods ignore the system dynamics of Embodied Vision Navigation and might be sub-optimal. In order to extend UAP to the sequential decision setting, we formulate the disturbed environment under the universal noise $δ$, as a $δ$-disturbed Markov Decision Process ($δ$-MDP). Based on the formulation, we analyze the properties of $δ$-MDP and propose two novel Consistent Attack methods, named Reward UAP and Trajectory UAP, for attacking Embodied agents, which consider the dynamic of the MDP and calculate universal noises by estimating the disturbed distribution and the disturbed Q function. For various victim models, our Consistent Attack can cause a significant drop in their performance in the PointGoal task in Habitat with different datasets and different scenes. Extensive experimental results indicate that there exist serious potential risks for applying Embodied Vision Navigation methods to the real world.

Submitted 25 March, 2023; v1 submitted 12 June, 2022; originally announced June 2022.

Journal ref: Pattern Recognition Letters (PRL), 2023

arXiv:2202.13898

DistAD: Software Anomaly Detection Based on Execution Trace Distribution

Authors: Shiyi Kong, Jun Ai, Minyan Lu, Shuguang Wang, W. Eric Wong

Abstract: Modern software systems have become increasingly complex, which makes them difficult to test and validate. Detecting software partial anomalies in complex systems at runtime can assist with handling unintended software behaviors, avoiding catastrophic software failures and improving software runtime availability. These detection techniques aim to identify the manifestation of faults (anomalies) before they ultimately lead to unavoidable failures, thus supporting subsequent runtime fault-tolerant techniques. In this work, we propose a novel anomaly detection method named DistAD, which is based on the distribution of software runtime dynamic execution traces. Unlike other existing works using key performance indicators, the execution trace is collected during runtime via intrusive instrumentation. Instrumentation is controlled by a sampling mechanism to avoid excessive overheads. Bi-directional Long Short-Term Memory (Bi-LSTM), an architecture of Recurrent Neural Network (RNN), is used to achieve the anomaly detection. The whole framework is constructed under a One-Class Neural Network (OCNN) learning mode, which can help mitigate the lack of labeled samples and the data imbalance issues. A series of controlled experiments are conducted on a widely used database system named Cassandra to prove the validity and feasibility of the proposed method. Overheads brought about by the intrusive probing are also evaluated. The results show that DistAD can achieve more than 70% accuracy and 90% recall (in normal states) with no more than 2 times the overhead of unmonitored executions.

Submitted 26 April, 2022; v1 submitted 28 February, 2022; originally announced February 2022.

Comments: need modification, the experiment results need carefully check

arXiv:2111.09633 [pdf, ps, other]

Extended Path Partition Conjecture for Semicomplete and Acyclic Compositions

Authors: Jiangdong Ai, Stefanie Gerke, Gregory Gutin, Yacong Zhou

Abstract: Let $D$ be a digraph and let $λ(D)$ denote the number of vertices in a longest path of $D$. For a pair of vertex-disjoint induced subdigraphs $A$ and $B$ of $D$, we say that $(A,B)$ is a partition of $D$ if $V(A)\cup V(B)=V(D).$ The Path Partition Conjecture (PPC) states that for every digraph, $D$, and every integer $q$ with $1\leq q\leq λ(D)-1$, there exists a partition $(A,B)$ of $D$ such that $λ(A)\leq q$ and $λ(B)\leq λ(D)-q.$ Let $T$ be a digraph with vertex set $\{u_1,\dots, u_t\}$ and for every $i\in [t]$, let $H_i$ be a digraph with vertex set $\{u_{i,j_i}\colon\, j_i\in [n_i]\}$. The {\em composition} $Q=T[H_1,\dots , H_t]$ of $T$ and $H_1,\ldots, H_t$ is a digraph with vertex set $\{u_{i,j_i}\colon\, i\in [t], j_i\in [n_i]\}$ and arc set $$A(Q)=\cup^t_{i=1}A(H_i)\cup \{u_{i,j_i}u_{p,q_p}\colon\, u_iu_p\in A(T), j_i\in [n_i], q_p\in [n_p]\}.$$ We say that $Q$ is acyclic {(semicomplete, respectively)} if $T$ is acyclic {(semicomplete, respectively)}. In this paper, we introduce a conjecture stronger than PPC using a property first studied by Bang-Jensen, Nielsen and Yeo (2006) and show that the stronger conjecture holds for wide families of acyclic and semicomplete compositions.

Submitted 18 November, 2021; originally announced November 2021.

Comments: 9 pages

arXiv:2110.06629 [pdf]

Detection Software Content Failures Using Dynamic Execution Information

Authors: Shiyi Kong, Minyan Lu, Jun Ai, Shuguang Wang

Abstract: Modern software systems become too complex to be tested and validated. Detecting software partial failures in complex systems at runtime assist to handle software unintended behaviors, avoiding catastrophic software failures and improving software runtime availability. These detection techniques aim to find the manifestation of faults before they finally lead to unavoidable failures, thus supporting following runtime fault tolerant techniques. We review the state of the art articles and find that the content failures account for the majority of all kinds of software failures, but its detection methods are rarely studied. In this work, we propose a novel failure detection indicator based on the software runtime dynamic execution information for software content failures. The runtime information is recorded during software execution, then transformed to a measure named runtime entropy and finally fed into machine learning models. The machine learning models are built to classify the intended and unintended behaviors of the objected software systems. A series of controlled experiments on several open source projects are conducted to prove the feasibility of the method. We also evaluate the accuracy of machine learning models built in this work.

Submitted 13 October, 2021; originally announced October 2021.

arXiv:2104.01318 [pdf, other]

Efficient DETR: Improving End-to-End Object Detector with Dense Prior

Authors: Zhuyu Yao, Jiangbo Ai, Boxun Li, Chi Zhang

Abstract: The recently proposed end-to-end transformer detectors, such as DETR and Deformable DETR, have a cascade structure of stacking 6 decoder layers to update object queries iteratively, without which their performance degrades seriously. In this paper, we investigate that the random initialization of object containers, which include object queries and reference points, is mainly responsible for the requirement of multiple iterations. Based on our findings, we propose Efficient DETR, a simple and efficient pipeline for end-to-end object detection. By taking advantage of both dense detection and sparse set detection, Efficient DETR leverages dense prior to initialize the object containers and brings the gap of the 1-decoder structure and 6-decoder structure. Experiments conducted on MS COCO show that our method, with only 3 encoder layers and 1 decoder layer, achieves competitive performance with state-of-the-art object detection methods. Efficient DETR is also robust in crowded scenes. It outperforms modern detectors on CrowdHuman dataset by a large margin.

Submitted 3 April, 2021; originally announced April 2021.

Comments: 10 pages, 5 figures, 10 tables

arXiv:2011.05878 [pdf, ps, other]

Kings in Multipartite Hypertournaments

Authors: Jiangdong Ai, Stefanie Gerke, Gregory Gutin

Abstract: In his paper "Kings in Bipartite Hypertournaments" (Graphs $\&$ Combinatorics 35, 2019), Petrovic stated two conjectures on 4-kings in multipartite hypertournaments. We prove one of these conjectures and give counterexamples for the other.

Submitted 16 July, 2021; v1 submitted 11 November, 2020; originally announced November 2020.

arXiv:2005.06184 [pdf, ps, other]

Attribute-guided Feature Extraction and Augmentation Robust Learning for Vehicle Re-identification

Authors: Chaoran Zhuge, Yujie Peng, Yadong Li, Jiangbo Ai, Junru Chen

Abstract: Vehicle re-identification is one of the core technologies of intelligent transportation systems and smart cities, but large intra-class diversity and inter-class similarity poses great challenges for existing method. In this paper, we propose a multi-guided learning approach which utilizing the information of attributes and meanwhile introducing two novel random augments to improve the robustness during training. What's more, we propose an attribute constraint method and group re-ranking strategy to refine matching results. Our method achieves mAP of 66.83% and rank-1 accuracy 76.05% in the CVPR 2020 AI City Challenge.

Submitted 13 May, 2020; originally announced May 2020.

arXiv:2001.10253 [pdf, other]

Proximity and Remoteness in Directed and Undirected Graphs

Authors: Jiangdong Ai, Stefanie Gerke, Gregory Gutin, Sonwabile Mafunda

Abstract: Let $D$ be a strongly connected digraph. The average distance $\barσ(v)$ of a vertex $v$ of $D$ is the arithmetic mean of the distances from $v$ to all other vertices of $D$. The remoteness $ρ(D)$ and proximity $π(D)$ of $D$ are the maximum and the minimum of the average distances of the vertices of $D$, respectively. We obtain sharp upper and lower bounds on $π(D)$ and $ρ(D)$ as a function of the order $n$ of $D$ and describe the extreme digraphs for all the bounds. We also obtain such bounds for strong tournaments. We show that for a strong tournament $T$, we have $π(T)=ρ(T)$ if and only if $T$ is regular. Due to this result, one may conjecture that every strong digraph $D$ with $π(D)=ρ(D)$ is regular. We present an infinite family of non-regular strong digraphs $D$ such that $π(D)=ρ(D).$ We describe such a family for undirected graphs as well.

Submitted 28 January, 2020; originally announced January 2020.
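
Note: the definitions of proximity and remoteness in the abstract above can be illustrated with a small computation. The following is a minimal sketch, not taken from the paper: it assumes the digraph is given as an adjacency dictionary (a representation chosen here for illustration), uses breadth-first search for distances, and the names `average_distances` and `proximity_and_remoteness` are hypothetical.

```python
from collections import deque


def average_distances(adj):
    """Map each vertex v to the mean distance from v to all other vertices.

    `adj` is an adjacency dict {vertex: [out-neighbours]} (an assumed input
    format). The digraph must be strongly connected and have >= 2 vertices.
    """
    n = len(adj)
    result = {}
    for source in adj:
        # Breadth-first search gives shortest directed distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) != n:
            raise ValueError("digraph is not strongly connected")
        result[source] = sum(dist.values()) / (n - 1)
    return result


def proximity_and_remoteness(adj):
    """Return (proximity, remoteness): the min and max average distance."""
    avg = average_distances(adj)
    return min(avg.values()), max(avg.values())


# A directed 4-cycle is regular, so proximity equals remoteness.
directed_cycle = {0: [1], 1: [2], 2: [3], 3: [0]}
# A small non-regular strong digraph where the two values differ.
small_digraph = {0: [1], 1: [2], 2: [0, 1]}

print(proximity_and_remoteness(directed_cycle))
print(proximity_and_remoteness(small_digraph))
```

In the directed 4-cycle every vertex sees the others at distances 1, 2 and 3, so all average distances coincide; in the second digraph vertex 2 reaches both other vertices in one step, which lowers the proximity below the remoteness.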

arXiv:2001.04665 [pdf]

Face Attribute Invertion

Authors: X G Tu, Y Luo, H S Zhang, W J Ai, Z Ma, M Xie

Abstract: Manipulating human facial images between two domains is an important and interesting problem. Most of the existing methods address this issue by applying two generators or one generator with extra conditional inputs. In this paper, we proposed a novel self-perception method based on GANs for automatical face attribute inverse. The proposed method takes face images as inputs and employs only one single generator without being conditioned on other inputs. Profiting from the multi-loss strategy and modified U-net structure, our model is quite stable in training and capable of preserving finer details of the original face images.

Submitted 14 January, 2020; originally announced January 2020.

Comments: 8 pages, 3 figures

arXiv:1907.06379 [pdf, other]

Proper Orientation Number of Triangle-free Bridgeless Outerplanar Graphs

Authors: J. Ai, S. Gerke, G. Gutin, Y. Shi, Z. Taoqiu

Abstract: An orientation of $G$ is a digraph obtained from $G$ by replacing each edge by exactly one of two possible arcs with the same endpoints. We call an orientation \emph{proper} if neighbouring vertices have different in-degrees. The proper orientation number of a graph $G$, denoted by $\vecχ(G)$, is the minimum maximum in-degree of a proper orientation of G. Araujo et al. (Theor. Comput. Sci. 639 (2016) 14--25) asked whether there is a constant $c$ such that $\vecχ(G)\leq c$ for every outerplanar graph $G$ and showed that $\vecχ(G)\leq 7$ for every cactus $G.$ We prove that $\vecχ(G)\leq 3$ if $G$ is a triangle-free $2$-connected outerplanar graph and $\vecχ(G)\leq 4$ if $G$ is a triangle-free bridgeless outerplanar graph.

Submitted 17 March, 2020; v1 submitted 15 July, 2019; originally announced July 2019.

arXiv:1812.08809 [pdf, ps, other]

Arc-disjoint strong spanning subdigraphs in compositions and products of digraphs

Authors: Yuefang Sun, Gregory Gutin, Jiangdong Ai

Abstract: A digraph $D=(V,A)$ has a good decomposition if $A$ has two disjoint sets $A_1$ and $A_2$ such that both $(V,A_1)$ and $(V,A_2)$ are strong. Let $T$ be a digraph with $t$ vertices $u_1,\dots , u_t$ and let $H_1,\dots H_t$ be digraphs such that $H_i$ has vertices $u_{i,j_i},\ 1\le j_i\le n_i.$ Then the composition $Q=T[H_1,\dots , H_t]$ is a digraph with vertex set $\{u_{i,j_i}\mid 1\le i\le t, 1\le j_i\le n_i\}$ and arc set $$A(Q)=\cup^t_{i=1}A(H_i)\cup \{u_{ij_i}u_{pq_p}\mid u_iu_p\in A(T), 1\le j_i\le n_i, 1\le q_p\le n_p\}.$$ For digraph compositions $Q=T[H_1,\dots H_t]$, we obtain sufficient conditions for $Q$ to have a good decomposition and a characterization of $Q$ with a good decomposition when $T$ is a strong semicomplete digraph and each $H_i$ is an arbitrary digraph with at least two vertices. For digraph products, we prove the following: (a) if $k\geq 2$ is an integer and $G$ is a strong digraph which has a collection of arc-disjoint cycles covering all vertices, then the Cartesian product digraph $G^{\square k}$ (the $k$th powers with respect to Cartesian product) has a good decomposition; (b) for any strong digraphs $G, H$, the strong product $G\boxtimes H$ has a good decomposition.

Submitted 20 December, 2018; originally announced December 2018.
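
Note: the composition $Q=T[H_1,\dots , H_t]$ defined in the abstract above can be built directly from its vertex-set and arc-set formulas. The sketch below is only an illustration under stated assumptions: indices are 0-based rather than 1-based, each $H_i$ is passed as a pair (number of vertices, list of arcs), and the function name `compose` is hypothetical.

```python
def compose(t_arcs, parts):
    """Build the composition Q = T[H_1, ..., H_t].

    t_arcs: iterable of arcs (i, p) of T over vertices 0..t-1 (0-based here).
    parts:  list where parts[i] = (n_i, arcs_i); arcs_i are pairs (j, k) with
            0 <= j, k < n_i describing the arcs of H_i.
    Returns (vertices, arcs) where a vertex (i, j) stands for u_{i,j}.
    """
    vertices = [(i, j) for i, (n_i, _) in enumerate(parts) for j in range(n_i)]
    arcs = set()
    # First term of A(Q): the arcs inside each H_i.
    for i, (_, h_arcs) in enumerate(parts):
        for j, k in h_arcs:
            arcs.add(((i, j), (i, k)))
    # Second term: for every arc u_i u_p of T, all arcs from H_i to H_p.
    for i, p in t_arcs:
        for j in range(parts[i][0]):
            for q in range(parts[p][0]):
                arcs.add(((i, j), (p, q)))
    return vertices, arcs


# T is a directed 2-cycle; H_1 has two isolated vertices, H_2 has one arc.
vertices, arcs = compose(
    t_arcs=[(0, 1), (1, 0)],
    parts=[(2, []), (2, [(0, 1)])],
)
print(len(vertices), len(arcs))
```

Here the single arc of $T$ in each direction contributes $2 \cdot 2 = 4$ arcs between the two parts, and the one arc inside the second part is kept, while the first part stays arcless because composition never adds arcs within a part.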

arXiv:1601.01502 [pdf, ps, other]

The Expurgation-Augmentation Method for Constructing Good Plane Subspace Codes

Authors: Jingmei Ai, Thomas Honold, Haiteng Liu

Abstract: As shown in [28], one of the five isomorphism types of optimal binary subspace codes of size 77 for packet length v=6, constant dimension k=3 and minimum subspace distance d=4 can be constructed by first expurgating and then augmenting the corresponding lifted Gabidulin code in a fairly simple way. The method was refined in [32,26] to yield an essentially computer-free construction of a currently best-known plane subspace code of size 329 for (v,k,d)=(7,3,4). In this paper we generalize the expurgation-augmentation approach to arbitrary packet length v, providing both a detailed theoretical analysis of our method and computational results for small parameters. As it turns out, our method is capable of producing codes larger than those obtained by the echelon-Ferrers construction and its variants. We are able to prove this observation rigorously for packet lengths v = 3 mod 4.

Submitted 17 January, 2016; v1 submitted 7 January, 2016; originally announced January 2016.

Comments: 44 pages, 3 tables, 1 figure; part of the results was presented at the International Workshop on Algebraic Combinatorics at Zhejiang University, Hangzhou, September 2015; Version 2 contains minor corrections

MSC Class: 94B05; 05B25; 51E20 (Primary); 51E14; 51E22; 51E23 (Secondary)

Showing 1–40 of 40 results for author: Ai, J